NAME
Web::Scraper::Config - Run Web::Scraper From Config Files
SYNOPSIS
---
scraper:
  - process:
      - td>ul>li
      - trailers[]
      - scraper:
          - process_first:
              - li>b
              - title
              - TEXT
          - process_first:
              - ul>li>a[href]
              - url
              - '@href'
          - process:
              - ul>li>ul>li>a
              - movies[]
              - __callback(process_movie)__
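use strict;
use warnings;
use URI;
use Web::Scraper::Config;

# $config may be a hashref or the path to a config file such as the YAML above
my $config = 'scraper.yml';    # hypothetical filename
my $uri    = URI->new($ARGV[0]);

my $scraper = Web::Scraper::Config->new(
    $config,
    {
        callbacks => {
            # replaces __callback(process_movie)__ in the config
            process_movie => sub {
                my $elem = shift;    # the matched element
                return {
                    text => $elem->as_text,
                    href => $elem->attr('href'),
                };
            },
        },
    }
);
my $res = $scraper->scrape($uri);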
DESCRIPTION
Web::Scraper::Config lets you describe a Web::Scraper scraper in a config file instead of writing it in Perl code.
The config file can be written in any format that Config::Any understands, as long as the data it loads follows this module's structure, as shown in the SYNOPSIS.
METHODS
new
Creates a new Web::Scraper::Config instance.
The first argument is either a hashref that represents a config, or the path to a config file. A config file can be in any format that Config::Any understands, as long as it produces a hash that conforms to the Web::Scraper::Config rules.
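For example, the YAML configuration from the SYNOPSIS could instead be given directly as a Perl hashref. The sketch below is a hand translation, assuming the usual mapping of YAML sequences to array refs and YAML mappings to hash refs; it only builds the data structure and does not load Web::Scraper::Config itself.

```perl
use strict;
use warnings;

# Hand translation of the SYNOPSIS YAML into a Perl hashref.
# Each command is a one-key hash whose key names the command
# (process, process_first, scraper) and whose value lists its arguments.
my $config = {
    scraper => [
        {
            process => [
                'td>ul>li',
                'trailers[]',
                {
                    scraper => [
                        { process_first => [ 'li>b', 'title', 'TEXT' ] },
                        { process_first => [ 'ul>li>a[href]', 'url', '@href' ] },
                        {
                            process => [
                                'ul>li>ul>li>a', 'movies[]',
                                '__callback(process_movie)__',
                            ],
                        },
                    ],
                },
            ],
        },
    ],
};
```

With the module installed, $config could then be passed as the first argument to new, together with the callbacks option shown in the SYNOPSIS.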
The second argument (options) is an optional hashref. Currently its only use is to provide callbacks, under the callbacks key, that the scraper can call. Wherever Web::Scraper::Config encounters an element of the form:
__callback(function_name)__
it replaces that element with the code reference stored under function_name in the callbacks hash.
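The substitution can be pictured with the following core-Perl sketch. It is only an illustration under assumptions: resolve_callbacks is a hypothetical helper, not part of the module's documented API, and Web::Scraper::Config's actual internals may differ. It walks a config structure recursively and swaps each __callback(name)__ string for the matching code reference.

```perl
use strict;
use warnings;

# Hypothetical helper: recursively copy a config structure, replacing
# every string of the form __callback(name)__ with $callbacks->{name}.
sub resolve_callbacks {
    my ($node, $callbacks) = @_;
    if (ref($node) eq 'HASH') {
        my %out;
        $out{$_} = resolve_callbacks($node->{$_}, $callbacks) for keys %$node;
        return \%out;
    }
    if (ref($node) eq 'ARRAY') {
        return [ map { resolve_callbacks($_, $callbacks) } @$node ];
    }
    if (defined($node) && !ref($node)
        && $node =~ /\A__callback\((\w+)\)__\z/) {
        my $cb = $callbacks->{$1}
            or die "No callback named '$1' was provided\n";
        return $cb;
    }
    return $node;    # plain strings and other values pass through unchanged
}

my $callbacks = {
    process_movie => sub { my $elem = shift; return { text => $elem->as_text } },
};

# One "process" argument list from the SYNOPSIS config
my $resolved = resolve_callbacks(
    [ 'ul>li>ul>li>a', 'movies[]', '__callback(process_movie)__' ],
    $callbacks,
);
```

After the call, $resolved->[2] holds the process_movie code reference, while the selector and result key are left as plain strings. Dying on an unknown name is a design choice in this sketch so that a typo in the config fails loudly.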
scrape
Runs the scraper. It behaves exactly like Web::Scraper's scrape method: it accepts the same arguments and returns the same results.
AUTHOR
Daisuke Maki <daisuke@endeworks.jp>
LICENSE
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
See http://www.perl.com/perl/misc/Artistic.html