NAME
WWW::SpiTract::Extract - Text Extraction Module
SYNOPSIS
use WWW::SpiTract::Extract;
use Data::Dumper;

my $e = WWW::SpiTract::Extract->new({
    TEXT    => $t,                        # webpage text
    DESC    => $desc->{foo},              # description for site foo
    THISURL => 'http://bazz.buzz.org/',   # URL of TEXT
});
print Dumper $e->extract;
DESCRIPTION
WWW::SpiTract::Extract extracts data from web page text according to a given description file.
METHODS
new
$e = WWW::SpiTract::Extract->new({
    TEXT    => 'webpage text, to be parsed by HTML::TreeBuilder',
    THISURL => 'URL of the text',
    DESC    => 'data description',
});
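The constructor is called as a class method with a single hash reference. The sketch below shows that calling convention in isolation. My::Extract is a hypothetical stand-in, not WWW::SpiTract::Extract, and the check that TEXT, THISURL and DESC are all present is an assumption; the real module may accept or require a different set of keys.

```perl
use strict;
use warnings;

package My::Extract;    # hypothetical stand-in, NOT WWW::SpiTract::Extract
use Carp qw(croak);

# Class-method constructor taking a single hash reference, mirroring the
# calling convention of WWW::SpiTract::Extract->new.
sub new {
    my ($class, $args) = @_;
    croak 'new expects a hash reference' unless ref $args eq 'HASH';
    # Assumption: all three keys are required; the real module may differ.
    for my $key (qw(TEXT THISURL DESC)) {
        croak "missing required argument $key" unless exists $args->{$key};
    }
    return bless { %$args }, $class;    # shallow copy of the arguments
}

package main;

my $obj = My::Extract->new({
    TEXT    => '<p>hi</p>',
    THISURL => 'http://bazz.buzz.org/',
    DESC    => {},
});
print ref($obj), "\n";    # prints "My::Extract"
```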
extract
$e->extract returns an array of hashes; use Data::Dumper to inspect it.
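Besides dumping the result, you can walk it directly. The sketch below assumes extract returns a list of hash references (if it returns an array reference instead, dereference it first with @{ ... }). The field names title and url are hypothetical; the real keys depend on the DESC you pass in. Setting $Data::Dumper::Sortkeys makes the dump order stable.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for the value returned by $e->extract:
# one hash reference per extracted record. Real keys depend on DESC.
my @records = (
    { title => 'First item',  url => 'http://bazz.buzz.org/1' },
    { title => 'Second item', url => 'http://bazz.buzz.org/2' },
);

# Turn each record into a "key=value, ..." line with keys in sorted order.
sub format_records {
    return map {
        my $rec = $_;
        join ', ', map { "$_=$rec->{$_}" } sort keys %$rec;
    } @_;
}

print "$_\n" for format_records(@records);

# Data::Dumper with sorted keys, for a reproducible dump.
local $Data::Dumper::Sortkeys = 1;
print Dumper(\@records);
```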
STANDALONES
WWW::SpiTract::Extract::lookup(parsed_text, node_identifier)
WWW::SpiTract::Extract::lookup($t, "0.0.0");
It looks up the node identified by node_identifier (a dotted path such as "0.0.0") in the parsed text and returns an anonymous hash with the entries "tag" and "text".
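How a dotted identifier such as "0.0.0" might be resolved can be sketched with a plain Perl tree. This assumes, by analogy with the address format of HTML::Element, that the first component denotes the root and each following number is a zero-based child index. The hash-based tree and the names lookup_path and node_text are hypothetical; the real lookup works on HTML::TreeBuilder output and may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical tree: each node is a hash ref with a 'tag' and either
# 'children' (array ref of nodes) or 'text' (a string).
my $tree = {
    tag => 'html', children => [
        { tag => 'body', children => [
            { tag => 'p', text => 'Hello' },
            { tag => 'p', text => 'World' },
        ] },
    ],
};

# Concatenate all text below a node.
sub node_text {
    my ($node) = @_;
    return $node->{text} if defined $node->{text};
    return join '', map { node_text($_) } @{ $node->{children} || [] };
}

# Resolve "0.i.j..." to a node; returns { tag, text } or undef.
sub lookup_path {
    my ($root, $id) = @_;
    return undef unless defined $id && $id =~ /^\d+(?:\.\d+)*$/;
    my ($first, @idx) = split /\./, $id;
    return undef unless $first == 0;    # leading "0" addresses the root
    my $node = $root;
    for my $i (@idx) {
        my $kids = $node->{children} or return undef;
        return undef if $i >= @$kids;
        $node = $kids->[$i];
    }
    return { tag => $node->{tag}, text => node_text($node) };
}

my $r = lookup_path($tree, '0.0.1');
print "$r->{tag}: $r->{text}\n";    # prints "p: World"
```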
AUTHOR
xern <xern@cpan.org>
LICENSE
Released under the Artistic License.
SEE ALSO
WWW::SpiTract, WWW::SpiTract::Spider, HTML::TreeBuilder