NAME
OurNet::Site - Extract web pages via templates
SYNOPSIS
    use LWP::Simple;
    use OurNet::Site;

    my ($query, $hits) = ('autrijus', 10);
    my $found;

    # Create a bot
    my $bot = OurNet::Site->new('google');

    # Parse the result fetched by LWP::Simple
    $bot->callme($self, 0, get($bot->geturl($query, $hits)), \&callmeback);

    print '*** ' . ($found ? $found : 'No') . " match(es) found.\n";

    # Callback routine
    sub callmeback {
        my ($self, $himself) = @_;

        foreach my $entry (@{$himself->{response}}) {
            if ($entry->{url}) {
                print "*** [$entry->{title}]" .
                      " ($entry->{score})" .
                      " - [$entry->{id}]\n" .
                      "    URL: [$entry->{url}]\n" .
                      "    $entry->{preview}\n";
                $found++;
                delete($entry->{url});
            }
        }
    }
DESCRIPTION
This module emulates a typical search engine by reading an XML script that defines its aspects, and parses the results on the fly accordingly.
Note that it also takes Inforia Quest .fmt scripts, available at http://www.inforian.com/. The author of course cannot support this usage.
As of v1.2, Site.pm also accepts Template Toolkit templates with the extension '.tt2' as site descriptors, provided that each contains at least one [% FOREACH entry %] block, with [% SET url.start %] set accordingly.
Note that tt2 support is *highly* experimental and should not be relied upon until a more stable release is available.
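The general idea of descriptor-driven extraction described above can be illustrated with a small standalone sketch. It does not use OurNet::Site and is not the module's actual implementation or descriptor format: the %descriptor hash, the extract_entries routine, and the sample HTML are all hypothetical, made up only to show how a per-site pattern can turn a results page into a list of entries with url, title, and preview fields.

```perl
use strict;
use warnings;

# Hypothetical descriptor: a single pattern describing one result entry.
# Named captures become the fields of each extracted entry.
my %descriptor = (
    entry => qr{<li><a href="(?<url>[^"]+)">(?<title>[^<]+)</a>\s*<i>(?<preview>[^<]*)</i></li>},
);

# Apply the descriptor to a page and return a reference to a list of entries.
sub extract_entries {
    my ($descriptor, $html) = @_;
    my @entries;

    while ($html =~ /$descriptor->{entry}/g) {
        push @entries, { %+ };    # copy the named captures of this match
    }
    return \@entries;
}

# Made-up results page standing in for a fetched search result.
my $page = <<'HTML';
<ul>
<li><a href="http://example.com/a">First</a> <i>first preview</i></li>
<li><a href="http://example.com/b">Second</a> <i>second preview</i></li>
</ul>
HTML

my $response = extract_entries(\%descriptor, $page);

foreach my $entry (@$response) {
    print "*** [$entry->{title}]\n" .
          "    URL: [$entry->{url}]\n" .
          "    $entry->{preview}\n";
}
```

Keeping the site-specific pattern in data rather than in code means a new site only needs a new descriptor, which is the same motivation behind the XML and tt2 site descriptors.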
BUGS
Probably lots. Most notably, the 'More' facility is lacking. Also, there are no template-generating abilities. This is a must, but I couldn't find enough motivation to do it. Maybe you could.
Currently, tt2 does not (quite) support incremental parsing in conjunction with OurNet::Query.
SEE ALSO
AUTHORS
Autrijus Tang <autrijus@autrijus.org>
COPYRIGHT
Copyright 2001 by Autrijus Tang <autrijus@autrijus.org>.
All rights reserved. You can redistribute and/or modify this module under the same terms as Perl itself.