NAME

WWW::SpiTract::Extract - Text Extraction Module

SYNOPSIS

 use WWW::SpiTract::Extract;
 use Data::Dumper;                       # provides Dumper, used below

 $e = WWW::SpiTract::Extract->new({
     TEXT    => $t,                      # webpage text
     DESC    => $desc->{foo},            # site foo
     THISURL => 'http://bazz.buzz.org/', # url of TEXT
 });

 print Dumper $e->extract;

DESCRIPTION

WWW::SpiTract::Extract extracts data from the text of a webpage according to a given data description.

METHODS

new

 $e = WWW::SpiTract::Extract->new({
     TEXT    => 'string parsed by HTML::Tree',
     THISURL => 'URL of the text',
     DESC    => 'data description',
 });

extract

$e->extract returns an array of hashes. You can inspect the result with Data::Dumper.

STANDALONES

WWW::SpiTract::Extract::lookup(parsed_text, node_identifier)

WWW::SpiTract::Extract::lookup($t, "0.0.0");

It searches the given parsed text for the node with the given identifier and returns an anonymous hash with the entries "tag" and "text".
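
The following is a minimal sketch of how a dotted node identifier such as "0.0.0" could be resolved, not the module's actual implementation. It assumes the identifier follows the convention of the address() method in HTML::Element, where the leading "0" names the root and each later number is a zero-based child index. The nested hash $tree and the function lookup_sketch are hypothetical stand-ins invented for illustration; the real lookup works on parsed text and its internals are not documented here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed page: every node has a tag,
# optional text, and a list of children. NOT the module's own format.
my $tree = {
    tag      => 'html',
    children => [
        {
            tag      => 'body',
            children => [
                { tag => 'h1', text => 'Headline',        children => [] },
                { tag => 'p',  text => 'First paragraph', children => [] },
            ],
        },
    ],
};

# Resolve a dotted identifier against the tree.
# Assumption: "0" is the root; each further number is a child index.
sub lookup_sketch {
    my ($root, $id) = @_;
    my @path  = split /\./, $id;
    my $first = shift @path;
    return unless defined $first && $first eq '0';

    my $node = $root;
    for my $i (@path) {
        return unless $i =~ /^\d+$/;          # reject malformed parts
        my $kids = $node->{children} || [];
        $node = $kids->[$i];
        return unless defined $node;          # no such child
    }

    # Same shape as described above: entries "tag" and "text".
    return { tag => $node->{tag}, text => $node->{text} // '' };
}

my $hit = lookup_sketch($tree, '0.0.1');
print "$hit->{tag}: $hit->{text}\n";          # p: First paragraph
```

In this sketch an identifier that points past the end of a child list yields undef rather than an error; whether the real lookup behaves the same way is not stated in this document.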

AUTHOR

xern <xern@cpan.org>

LICENSE

Released under The Artistic License.

SEE ALSO

WWW::SpiTract, WWW::SpiTract::Spider, HTML::TreeBuilder