NAME
WWW::ContentRetrieval::Extract - Content Extractor
SYNOPSIS
use WWW::ContentRetrieval::Extract;
use Data::Dumper;

$e = WWW::ContentRetrieval::Extract->new({
    TEXT    => $t,                      # webpage text
    DESC    => $desc->{foo},            # description for site foo
    THISURL => 'http://bazz.buzz.org/', # URL of TEXT
});
print Dumper $e->extract;
DESCRIPTION
WWW::ContentRetrieval::Extract extracts data according to a given description file.
METHODS
new
$e = WWW::ContentRetrieval::Extract->new({
    TEXT    => $text,   # page's content
    THISURL => $url,    # URL of the text
    DESC    => $desc,   # data description
});
extract
$e->extract returns an array of hashes. You may use Data::Dumper to inspect the result.
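As a rough illustration of working with an array of hashes, the sketch below formats each record as one line. It does not call the module: $result is a stand-in for the value of $e->extract, the keys title and url are hypothetical (the real keys depend on DESC), and it assumes the result is a reference to an array of hash references. format_records is a helper invented for this example.

```perl
use strict;
use warnings;

# Stand-in for the value returned by $e->extract (hypothetical data).
# Assumption: a reference to an array of hash references.
my $result = [
    { title => 'First item',  url => 'http://bazz.buzz.org/1' },
    { title => 'Second item', url => 'http://bazz.buzz.org/2' },
];

# Render each record as one line of sorted "key=value" pairs.
sub format_records {
    my ($records) = @_;
    my @lines;
    for my $rec (@$records) {
        push @lines, join ', ', map { "$_=$rec->{$_}" } sort keys %$rec;
    }
    return @lines;
}

print "$_\n" for format_records($result);
```

If extract turns out to return a plain list rather than a reference, wrapping the call as [ $e->extract ] should give the same shape.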
STANDALONES
WWW::ContentRetrieval::Extract::lookup( text, node_identifier )
WWW::ContentRetrieval::Extract::lookup( WWW::ContentRetrieval::bldTree($t), "0.0.0");
Looks up the node with the given identifier in the given text and returns an anonymous hash with the entries "tag" and "text".
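One plausible reading of a dotted identifier such as "0.0.0" is a path of zero-based child indexes from the root. The sketch below shows that idea on a hand-built tree. It is not the module's implementation: the tree layout (tag, text, children) and the helper lookup_by_path are assumptions made for this example only.

```perl
use strict;
use warnings;

# Hypothetical tree: each node is { tag, text, children => [ ... ] }.
my $tree = [
    { tag => 'html', text => '', children => [
        { tag => 'body', text => '', children => [
            { tag => 'h1', text => 'Hello', children => [] },
            { tag => 'p',  text => 'World', children => [] },
        ] },
    ] },
];

# Follow a dotted path such as "0.0.1"; each component is a zero-based
# index into the current list of siblings. Returns { tag, text } or undef.
sub lookup_by_path {
    my ($nodes, $path) = @_;
    my $node;
    for my $i (split /\./, $path) {
        return undef unless $i =~ /^\d+$/ && $i < @$nodes;
        $node  = $nodes->[$i];
        $nodes = $node->{children} || [];
    }
    return $node ? { tag => $node->{tag}, text => $node->{text} } : undef;
}

my $hit = lookup_by_path($tree, '0.0.1');
print "$hit->{tag}: $hit->{text}\n";   # prints "p: World"
```

A path that runs past the available children, such as "0.5" here, yields undef rather than an error.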
SEE ALSO
WWW::ContentRetrieval, WWW::ContentRetrieval::Spider, HTML::TreeBuilder
COPYRIGHT
xern <xern@cpan.org>
This module is free software; you can redistribute it or modify it under the same terms as Perl itself.