NAME

WARC::Collection - Interface to a group of WARC files

SYNOPSIS

use WARC::Collection;

$collection = assemble WARC::Collection ($index_1, $index_2, ...);
$collection = assemble WARC::Collection from => ($index_1, ...);

$record = $collection->search(url => $url, time => $when);

DESCRIPTION

The WARC::Collection class is the primary means by which user code is expected to use the WARC library. This class uses indexes to efficiently search for records in one or more WARC files.

Search Keys

The search method accepts a list of parameters as key => value pairs. Each pair narrows the search, sorts the results, or both; the list below marks these cases with "[N ]", "[ S]", and "[NS]", respectively.

The same search keys documented here are used for searching indexes, since WARC::Collection is a wrapper around one or more indexes.

The keys supported are:

[N ] url

An exact match for a URL.

[NS] url_prefix

A prefix match for a URL. Prefers records with shorter URLs.

[ S] time

Prefer records collected nearer to the requested time.
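The narrowing and sorting roles of the keys above can be modelled in plain Perl. This is a sketch of the documented semantics only, not the library's implementation. The hash-based records, the numeric time values, the model_search name, and the rule that URL length is compared before time distance are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in records; real searches return WARC::Record objects.
# The numeric "time" values are an assumption made for this sketch.
my @records = (
    { url => 'http://example.test/a/b/c', time => 300 },
    { url => 'http://example.test/a',     time => 100 },
    { url => 'http://example.test/a/b',   time => 200 },
    { url => 'http://example.test/z',     time => 150 },
);

# Model of the documented key semantics (hypothetical helper).
sub model_search {
    my ($records, %key) = @_;
    my @hits = @$records;

    # [N ] url: keep exact matches only.
    @hits = grep { $_->{url} eq $key{url} } @hits
        if defined $key{url};

    # [NS] url_prefix: keep URLs that start with the prefix.
    if (defined $key{url_prefix}) {
        my $prefix = $key{url_prefix};
        @hits = grep { index($_->{url}, $prefix) == 0 } @hits;
    }

    # Sorting. Assumed order of comparison: shorter URL first (only when
    # url_prefix was given), then smaller distance from the requested time
    # (only when time was given).
    my @sorted = sort {
        (defined $key{url_prefix}
            ? length($a->{url}) <=> length($b->{url}) : 0)
        ||
        (defined $key{time}
            ? abs($a->{time} - $key{time}) <=> abs($b->{time} - $key{time}) : 0)
    } @hits;
    return @sorted;
}

my @by_prefix = model_search(\@records, url_prefix => 'http://example.test/a');
print $by_prefix[0]{url}, "\n";    # prints http://example.test/a
```

Note that time alone never removes a record in this model; it only reorders the candidates, which is what the "[ S]" marker indicates.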

Methods

$collection = assemble WARC::Collection ($index_1, $index_2, ...);
$collection = assemble WARC::Collection from => ($index_1, ...);

Assemble a collection of WARC files from one or more indexes, each specified either as an object derived from WARC::Index or as a filename.

While multiple indexes can be used in a collection, note that searching a collection requires individually searching every index in the collection.
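The cost noted above can be illustrated with stub indexes. This is a minimal sketch, assuming each index is modelled as a code reference over a list of URLs; make_stub_index and search_all are hypothetical names, and real collections hold WARC::Index objects instead.

```perl
use strict;
use warnings;

# Hypothetical stub index: a code reference that returns the URLs it
# holds that equal the requested URL.
sub make_stub_index {
    my @urls = @_;
    return sub { my ($url) = @_; return grep { $_ eq $url } @urls };
}

my @indexes = (
    make_stub_index('http://example.test/a', 'http://example.test/b'),
    make_stub_index('http://example.test/b'),
    make_stub_index('http://example.test/c'),
);

# Every index is consulted once per search, so the number of lookups
# grows with the number of indexes in the collection.
my $lookups = 0;
sub search_all {
    my ($url) = @_;
    my @found;
    for my $index (@indexes) {
        $lookups++;
        push @found, $index->($url);
    }
    return @found;
}

my @found = search_all('http://example.test/b');
printf "%d matches after %d index lookups\n", scalar @found, $lookups;
# prints: 2 matches after 3 index lookups
```

The third index holds no match, yet it is still consulted; under this model, fewer indexes means fewer lookups per search.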

$record = $collection->search( ... )
@records = $collection->search( ... )

Search the indexes in the collection for records matching the parameters. In scalar context, return the best match; in list context, return a list of all matches. The returned values are WARC::Record objects.

See "Search Keys" for more information about the parameters.
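Because the result depends on calling context, the form of the assignment matters. The sketch below uses search_stub, a hypothetical stand-in built on Perl's wantarray, rather than the real search method; the strings stand in for WARC::Record objects, and the best-first ordering of the list is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for search(); matches are assumed to be ordered
# best-first.
sub search_stub {
    my @matches = ('best', 'second', 'third');
    return wantarray ? @matches : $matches[0];
}

my $record  = search_stub();    # scalar context: the best match only
my @records = search_stub();    # list context: every match
my ($first) = search_stub();    # parentheses make this list context;
                                # $first receives the first element

print "$record\n";              # prints best
print scalar(@records), "\n";   # prints 3
```

An assignment of the form my ($first) = ... requests the full list and then discards all but one element, so the scalar form is the direct way to ask for a single best match.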

AUTHOR

Jacob Bachmeyer, <jcb@cpan.org>

SEE ALSO

WARC

COPYRIGHT AND LICENSE

Copyright (C) 2019 by Jacob Bachmeyer

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.