NAME
WARC::Collection - Interface to a group of WARC files
SYNOPSIS
use WARC::Collection;
$collection = assemble WARC::Collection ($index_1, $index_2, ...);
$collection = assemble WARC::Collection from => ($index_1, ...);
$yes_or_no = $collection->searchable( $key );
$record = $collection->search(url => $url, time => $when);
@records = $collection->search(url => $url, time => $when);
DESCRIPTION
The WARC::Collection class is the primary means by which user code is expected to use the WARC library. This class uses indexes to efficiently search for records in one or more WARC files.
Search Keys
The search method accepts a list of parameters as key => value pairs. Each pair narrows the search, sorts the results, or both; these roles are marked in the following list with "[N ]", "[ S]", or "[NS]", respectively.
Supplying an array reference as a value indicates a search where any of the values in the array are acceptable. This does not affect sorting.
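The "any of these values" rule can be pictured with a small stand-alone sketch. It does not use the WARC library: value_matches is a hypothetical helper, and the entries are made-up hashes standing in for index entries.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the WARC library: true when $actual
# satisfies $wanted, where $wanted is either a plain value or an array
# reference listing acceptable alternatives.
sub value_matches {
    my ($wanted, $actual) = @_;
    my @acceptable = ref($wanted) eq 'ARRAY' ? @$wanted : ($wanted);
    return scalar grep { $_ eq $actual } @acceptable;
}

# Made-up entries, modeled as plain hashes for illustration only.
my @entries = (
    { url => 'http://example.com/' },
    { url => 'http://example.org/' },
    { url => 'http://example.net/' },
);

# An array reference means "any of these URLs is acceptable".
my $wanted = [ 'http://example.com/', 'http://example.net/' ];
my @hits = grep { value_matches($wanted, $_->{url}) } @entries;

print "$_->{url}\n" for @hits;
```

With the real class, the equivalent request would presumably pass the array reference directly, as in url => [$url_1, $url_2].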
The same search keys documented here are used for searching indexes, since WARC::Collection is a wrapper around one or more indexes, but index support modules do not sort their results. Only WARC::Collection sorts the returned entries, so keys marked below as sort-only ("[ S]") are ignored by the index support modules.
The keys supported are:
[N ] url
An exact match for a URL.
[NS] url_prefix
A prefix match for a URL. Prefers records with shorter URLs.
[ S] time
Prefer records collected nearer to the requested time.
[N ] record_id
An exact match for a (presumably unique) WARC-Record-ID.
[N ] segment_origin_id
Exact match for continuation records for a WARC-Record-ID that identifies a logical record stored using WARC record segmentation. Searching on this key returns only the continuation records.
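The difference between narrowing and sorting can be sketched without the library. The entries below are made-up hashes, the time values are epoch seconds chosen only for illustration, and the tie-breaking order (URL length first, then distance in time) is an assumption of this sketch rather than documented behavior.

```perl
use strict;
use warnings;

# Made-up entries; "time" is epoch seconds here purely for illustration.
my @entries = (
    { url => 'http://example.com/a/b', time => 1_500_000_000 },
    { url => 'http://example.com/a',   time => 1_600_000_000 },
    { url => 'http://example.com/a',   time => 1_550_000_000 },
    { url => 'http://example.org/',    time => 1_550_000_000 },
);

my $prefix = 'http://example.com/';
my $when   = 1_540_000_000;

# Narrow ("N"): keep only entries whose URL starts with the prefix.
my @narrowed = grep { index($_->{url}, $prefix) == 0 } @entries;

# Sort ("S"): shorter URLs first, then entries nearest the requested time.
# Combining the two preferences in this order is an assumption.
my @sorted = sort {
    length($a->{url}) <=> length($b->{url})
        or abs($a->{time} - $when) <=> abs($b->{time} - $when)
} @narrowed;

printf "%s %d\n", $_->{url}, $_->{time} for @sorted;
```

A sort-only key such as time never removes entries; it only changes which entry comes first.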
Methods
- $collection = assemble WARC::Collection ($index_1, $index_2, ...);
- $collection = assemble WARC::Collection from => ($index_1, ...);
Assemble a collection of WARC files from one index or multiple indexes, specified either as objects derived from WARC::Index or filenames. While multiple indexes can be used in a collection, note that searching a collection requires individually searching every index in the collection.
- $yes_or_no = $collection->searchable( $key )
Return true if any index in the collection can search for the requested key, and false otherwise.
- $record = $collection->search( ... )
- @records = $collection->search( ... )
Search the indexes for records matching the parameters and return the best match in scalar context or a list of all matches in list context. The returned values are WARC::Record objects. See "Search Keys" for more information about the parameters.
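The behavior of the methods above can be modeled in a few lines. This is only a sketch and not the WARC::Collection source: Sketch::Collection is a hypothetical class, each "index" is a plain hash listing the keys it supports and its entries, and time values are bare numbers. It shows searchable answering for any index, search visiting every index in turn, and wantarray selecting between the best match and the full list.

```perl
use strict;
use warnings;

package Sketch::Collection {
    use List::Util qw(any);

    # Each index is a hash: { keys => { name => 1, ... }, entries => [...] }.
    sub new {
        my ($class, @indexes) = @_;
        return bless { indexes => [@indexes] }, $class;
    }

    # True if at least one index claims support for the key.
    sub searchable {
        my ($self, $key) = @_;
        return any { $_->{keys}{$key} } @{ $self->{indexes} };
    }

    # Query every index individually, merge the hits, sort by distance
    # from the requested time, then honor the calling context.
    sub search {
        my ($self, %query) = @_;
        my @found;
        for my $index (@{ $self->{indexes} }) {
            push @found,
                grep { $_->{url} eq $query{url} } @{ $index->{entries} };
        }
        @found = sort {
            abs($a->{time} - $query{time}) <=> abs($b->{time} - $query{time})
        } @found;
        return wantarray ? @found : $found[0];
    }
}

my $collection = Sketch::Collection->new(
    {   keys    => { url => 1 },
        entries => [ { url => 'http://example.com/', time => 100 } ],
    },
    {   keys    => { url => 1, record_id => 1 },
        entries => [
            { url => 'http://example.com/', time => 200 },
            { url => 'http://example.org/', time => 150 },
        ],
    },
);

my $best = $collection->search(url => 'http://example.com/', time => 190);
my @all  = $collection->search(url => 'http://example.com/', time => 190);

printf "best: %d, matches: %d\n", $best->{time}, scalar @all;
```

Because every index is visited on each search, the cost of a query in this model grows with the number of indexes, which matches the caution given for assemble.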
AUTHOR
Jacob Bachmeyer, <jcb@cpan.org>
SEE ALSO
COPYRIGHT AND LICENSE
Copyright (C) 2019 by Jacob Bachmeyer
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.