NAME

XML::Tape::Index - an XMLtape indexer

SYNOPSIS

use strict;
use warnings;
use XML::Tape::Index qw(:all);

# Build the index once if it does not exist yet
unless (indexexists('ex/tape.xml')) {
    my $x = indexopen('ex/tape.xml', 'w') or die "cannot create index";
    $x->reindex();
    $x->indexclose();
}

my $x = indexopen('ex/tape.xml', 'r') or die "cannot open index";

# Walk all index records, passing each token back to fetch the next one
for (my $rec = $x->list_identifiers();
     defined($rec);
     $rec = $x->list_identifiers($rec->{token})) {
    printf "id     : %s\n", $rec->{identifier};
    printf "date   : %s\n", $rec->{date};
    printf "start  : %s\n", $rec->{start};
    printf "length : %s\n", $rec->{len};
}

my $rec = $x->get_identifier('oai:arXiv.org:hep-th:0208183');
my $xml = $x->get_record('oai:arXiv.org:hep-th:0208183');
$x->indexclose();

DESCRIPTION

This module creates an index on XMLtapes to enable fast retrieval of XML documents from the archive. The index files are stored next to the XMLtape.

METHODS

$x = indexopen($tape_file, $flag)

This function opens an index for reading or writing. The parameter $tape_file is the location of an XMLtape archive. The flag is "w" when creating a new index or "r" when reading an existing index. An XML::Tape::Index instance is returned on success, undef on failure.

$x->reindex()

This method reads the XMLtape, extracts all identifiers and datestamps from it, and stores the byte position of every record in the index.
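A minimal sketch of the kind of scan reindex performs, not the module's actual implementation. It assumes, purely for illustration, a simplified tape in which each record is a <record> element with <identifier> and <date> children; these placeholder element names are not the real XMLtape schema, and the helper scan_tape is hypothetical. The tape is treated as a byte string, so the match offsets are byte positions.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of XML::Tape::Index: scan a tape held in
# memory as a byte string and build one index entry per record.
# Assumption: records are <record> elements with <identifier> and <date>
# children; these placeholder names are not the real XMLtape schema.
sub scan_tape {
    my ($tape) = @_;
    my @index;
    while ($tape =~ m{<record>.*?</record>}gs) {
        my $start = $-[0];           # offset (in bytes, for a byte string) where the record starts
        my $len   = $+[0] - $-[0];   # length of the whole record
        my $rec   = substr($tape, $start, $len);
        my ($id)   = $rec =~ m{<identifier>(.*?)</identifier>}s;
        my ($date) = $rec =~ m{<date>(.*?)</date>}s;
        push @index, { identifier => $id, date => $date,
                       start => $start, len => $len };
    }
    return \@index;
}

# Small in-memory tape for illustration (ASCII, so characters == bytes)
my $tape = '<tape>'
  . '<record><identifier>a</identifier><date>2001-01-01T00:00:00Z</date></record>'
  . '<record><identifier>b</identifier><date>2003-06-01T00:00:00Z</date></record>'
  . '</tape>';

for my $e (@{ scan_tape($tape) }) {
    printf "%s %s start=%d len=%d\n", @$e{qw(identifier date start len)};
}
```

Storing only start and len per identifier keeps the index small; the record itself stays in the tape and is read back on demand.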

$x->list_identifiers([$token])
$x->list_identifiers($from,$until)

Use this method to iterate through the index and return all records. It returns an index record on success, or undef when no more records are available. Each index record is a HASH reference containing the fields 'identifier', 'date', 'start' (the starting byte of the XML document in the XMLtape), 'len' (the length of the XML document in the XMLtape) and 'token'. Pass the 'token' field of the current record to list_identifiers to fetch the next index record. To filter the returned index records, pass two arguments, $from and $until, on the first list_identifiers call. Subsequent list_identifiers calls then return only index records with dates greater than or equal to $from and less than $until. For example:

# Return all index records...
for (my $r = $x->list_identifiers();
     defined($r);
     $r = $x->list_identifiers($r->{token})) {
    # process $r here
}

# Return all index records with dates between 2001-01-01 and 2005-12-31...
for (my $r = $x->list_identifiers(
                 '2001-01-01T00:00:00Z',
                 '2005-12-31T23:59:59Z'
             );
     defined($r);
     $r = $x->list_identifiers($r->{token})) {
    # process $r here
}
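The filter rule above keeps a record when from <= date < until. Because the dates in these examples all use the same fixed-width UTC form (YYYY-MM-DDThh:mm:ssZ), plain string comparison orders them chronologically. The hypothetical helper in_range below only illustrates that documented rule under this assumption; it is not code taken from the module.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the documented filter rule:
# keep a record when  from <= date < until.
# Assumes all dates use the fixed-width UTC form YYYY-MM-DDThh:mm:ssZ,
# for which string comparison (ge/lt) matches chronological order.
sub in_range {
    my ($date, $from, $until) = @_;
    return ($date ge $from && $date lt $until) ? 1 : 0;
}

my ($from, $until) = ('2001-01-01T00:00:00Z', '2005-12-31T23:59:59Z');
for my $d ('2000-12-31T23:59:59Z', '2001-01-01T00:00:00Z',
           '2005-12-31T23:59:59Z') {
    printf "%s => %s\n", $d, in_range($d, $from, $until) ? 'kept' : 'skipped';
}
```

Since until is exclusive, a record stamped exactly 2005-12-31T23:59:59Z is skipped by the second loop above; passing '2006-01-01T00:00:00Z' as until would include the whole of 2005-12-31.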

$x->get_earlist_date()

This method returns the earliest date in the index file.

$x->get_tape_file()

This method returns the name of the tape file associated with this index.

$x->get_num_of_records()

This method returns the number of records in the index.

$x->get_identifier($identifier)

This method returns an index record given an identifier as argument. When no matching index record can be found, undef is returned. The index record is a HASH reference containing the fields 'identifier', 'date', 'start' and 'len' (see above).

$x->get_record($identifier)

This method returns an XML document from the XMLtape given an identifier as argument. When no matching record can be found, undef is returned.
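Given an index record, the XML document can be cut out of the tape with its 'start' and 'len' fields. The sketch below shows that idea with plain seek and read on a file opened in raw mode; it assumes 'start' and 'len' are byte counts, as the field descriptions above suggest, and the helper read_slice is hypothetical rather than the module's own code.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Hypothetical helper: return $len bytes starting at byte $start of $file,
# or undef if the file cannot be read or is too short.
sub read_slice {
    my ($file, $start, $len) = @_;
    open(my $fh, '<:raw', $file) or return;   # raw: no newline or encoding layers
    seek($fh, $start, SEEK_SET) or return;
    my $got = read($fh, my $buf, $len);
    close($fh);
    return (defined $got && $got == $len) ? $buf : undef;
}

# Write a tiny illustrative tape to a temporary file and extract one record
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} '<tape><record>x</record></tape>';
close $fh;
print read_slice($path, 6, 18), "\n";   # prints <record>x</record>
```

Reading only len bytes from start avoids parsing the whole tape, which is the point of keeping byte positions in the index.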

$x->indexclose()

Closes the XMLtape index.

indexexists($tape_file)

This class method returns true when an index exists for the XMLtape at location $tape_file, and false otherwise.

indexdrop($tape_file)

This class method deletes the index associated with the XMLtape with location $tape_file.

BUGS

XML::Tape::Index doesn't lock the XMLtape before writing. It is possible to overwrite an index while another process is reading it.

CREDITS

XMLtape archives were developed by the Digital Library Research & Prototyping team at Los Alamos National Laboratory.

SEE ALSO

XML::Tape

AUTHOR

Patrick Hochstenbach <Patrick.Hochstenbach@UGent.be>
