NAME

WWW::ContentRetrieval::Extract - Content Extractor

SYNOPSIS

 use WWW::ContentRetrieval::Extract;
 use Data::Dumper;

 $e = WWW::ContentRetrieval::Extract->new({
     TEXT    => $t,                      # webpage text
     DESC    => $desc->{foo},            # site foo
     THISURL => 'http://bazz.buzz.org/', # url of TEXT
 });

 print Dumper $e->extract;

DESCRIPTION

WWW::ContentRetrieval::Extract extracts data according to a given description file.

METHODS

new

$e = WWW::ContentRetrieval::Extract->new({
   TEXT    => $text,   # the page's content
   THISURL => $url,    # URL of the text
   DESC    => $desc,   # data description
});

extract

$e->extract returns an array of hashes. You may use Data::Dumper to inspect it.
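
As a rough illustration of working with such a result, the sketch below dumps and iterates a list of hashes. The @records data is a hypothetical stand-in for what extract might return; the keys "title" and "url" are made up for the example and are not defined by this module.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for the result of $e->extract.
# The keys below are invented for illustration only.
my @records = (
    { title => 'First item',  url => 'http://bazz.buzz.org/1' },
    { title => 'Second item', url => 'http://bazz.buzz.org/2' },
);

# Sort hash keys so the dump is stable between runs.
$Data::Dumper::Sortkeys = 1;
print Dumper(\@records);

# Walk the records one by one and print each field.
for my $rec (@records) {
    print join(', ', map { "$_=$rec->{$_}" } sort keys %$rec), "\n";
}
```

If extract turns out to return an array reference rather than a list, dereference it first (for example, @{ $e->extract }).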

STANDALONES

WWW::ContentRetrieval::Extract::lookup( text, node_identifier )

WWW::ContentRetrieval::Extract::lookup( WWW::ContentRetrieval::bldTree($t), "0.0.0");

It searches the given text for the node with the given identifier and returns an anonymous hash with entries "tag" and "text".
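
The sketch below illustrates one way a dotted identifier such as "0.0.0" could address a node. It assumes the identifier is a dot-separated path of child indices with a leading 0 for the root, similar to the addresses produced by HTML::Element's address method; this document does not confirm that convention. The tree is hand-built and sketch_lookup is a hypothetical helper, not the module's own lookup.

```perl
use strict;
use warnings;

# Hypothetical hand-built tree: each node has a tag, text, and children.
my $tree = {
    tag      => 'html',
    text     => '',
    children => [
        {
            tag      => 'body',
            text     => '',
            children => [
                { tag => 'h1', text => 'Hello', children => [] },
                { tag => 'p',  text => 'World', children => [] },
            ],
        },
    ],
};

# Hypothetical helper (not the module's lookup): follow a dotted path
# of child indices, where the leading "0" stands for the root node.
sub sketch_lookup {
    my ($root, $id) = @_;
    my @path  = split /\./, $id;
    my $first = shift @path;
    return unless defined $first && $first eq '0';

    my $node = $root;
    for my $i (@path) {
        return unless $i =~ /^\d+$/;          # reject non-numeric components
        $node = $node->{children}[$i] or return;   # no such child
    }
    return { tag => $node->{tag}, text => $node->{text} };
}

my $hit = sketch_lookup($tree, '0.0.0');
print "$hit->{tag}: $hit->{text}\n";
```

Under these assumptions, "0.0.0" resolves to the first child of the first child of the root, and the helper returns a hash of the same shape as described above (entries "tag" and "text").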

SEE ALSO

WWW::ContentRetrieval, WWW::ContentRetrieval::Spider, HTML::TreeBuilder

COPYRIGHT

xern <xern@cpan.org>

This module is free software; you can redistribute it or modify it under the same terms as Perl itself.