NAME

Web::Scraper::Config - Run Web::Scraper From Config Files

SYNOPSIS

---
scraper:
  - process:
    - td>ul>li
    - trailers[]
    - scraper:
      - process_first:
        - li>b
        - title
        - TEXT
      - process_first:
        - ul>li>a[href]
        - url
        - '@href'
      - process:
        - ul>li>ul>li>a
        - movies[]
        - __callback(process_movie)__


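use Web::Scraper::Config;

# $config is a config hashref or the name of a config file such as the YAML above.
my $scraper = Web::Scraper::Config->new(
  $config,
  {
    callbacks => {
      # Called for each element matched by the rule that names process_movie.
      process_movie => sub {
        my $elem = shift;
        return {
          text => $elem->as_text,
          href => $elem->attr('href'),
        };
      },
    },
  }
);
my $result = $scraper->scrape($uri);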

DESCRIPTION

Web::Scraper::Config allows you to harness the power of Web::Scraper from a config file.

Config files can be written in any format that Config::Any understands, as long as the loaded data conforms to this module's rules.

METHODS

new

Creates a new Web::Scraper::Config instance.

The first argument is either a hashref that represents a config, or the name of a config file. The config file can be in any format that Config::Any understands, as long as it loads into a hash that conforms to the Web::Scraper::Config rules.
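As an illustration of the hashref form, the YAML config from the SYNOPSIS can be written out as a plain Perl data structure. This is a direct translation of that YAML (sequences become arrayrefs, mappings become hashrefs), offered as a sketch and not as a statement of everything the module accepts. The variable names are arbitrary.

```perl
use strict;
use warnings;

# Hashref equivalent of the YAML config in the SYNOPSIS.
# Each entry under "scraper" is a single-key hash: the key names the
# directive and the value lists its arguments.
my $config = {
    scraper => [
        {
            process => [
                'td>ul>li',
                'trailers[]',
                {
                    scraper => [
                        { process_first => [ 'li>b', 'title', 'TEXT' ] },
                        { process_first => [ 'ul>li>a[href]', 'url', '@href' ] },
                        { process => [ 'ul>li>ul>li>a', 'movies[]', '__callback(process_movie)__' ] },
                    ],
                },
            ],
        },
    ],
};

# Inspect the outermost rule.
my ($directive) = keys %{ $config->{scraper}[0] };
print "first directive: $directive\n";
```

A structure like this can be passed to new in place of a filename.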

The second argument (options) is optional, and is currently only used to provide callbacks to be called from the scraper. When Web::Scraper::Config encounters an element in the form of:

__callback(function_name)__

then that is replaced by the corresponding callback specified in the options hash.
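The substitution described above can be pictured with a small stand-alone sketch. The helper resolve_callback is hypothetical and is not part of this module's API; the module's own implementation may differ, and dying on an unknown callback name is an assumption made here for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Web::Scraper::Config's API:
# replace a "__callback(name)__" string with the matching coderef.
sub resolve_callback {
    my ($value, $callbacks) = @_;

    # Leave references and ordinary strings such as "TEXT" or "@href" untouched.
    return $value if ref $value;
    return $value unless $value =~ /\A__callback\((\w+)\)__\z/;

    # Assumption: an unknown name is treated as a fatal error.
    my $code = $callbacks->{$1}
        or die "No callback named '$1' was supplied\n";
    return $code;
}

my %callbacks = ( process_movie => sub { return 'called' } );

my $plain    = resolve_callback('TEXT', \%callbacks);
my $resolved = resolve_callback('__callback(process_movie)__', \%callbacks);

print "plain: $plain\n";
print "resolved: ", $resolved->(), "\n";
```

Keeping the placeholder as an ordinary string is what lets the config stay in a data-only format such as YAML, while the code itself lives in the calling program.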

scrape

Starts scraping. The semantics are exactly the same as Web::Scraper::scrape.
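For the config in the SYNOPSIS, the result would be expected to be a hashref with a trailers list, each entry holding title, url and a movies list built by the process_movie callback. The sketch below walks such a structure. The data is made up for illustration and uses plain strings; a real run would get the structure from $scraper->scrape($uri).

```perl
use strict;
use warnings;

# Made-up sample data in the shape the SYNOPSIS config would be expected
# to produce; not real scraping output.
my $result = {
    trailers => [
        {
            title  => 'Example Studio',
            url    => 'http://example.com/studio',
            movies => [
                { text => 'Example Movie', href => 'http://example.com/movie' },
            ],
        },
    ],
};

# Collect one line per trailer entry and one indented line per movie.
my @lines;
for my $trailer (@{ $result->{trailers} || [] }) {
    push @lines, "$trailer->{title} ($trailer->{url})";
    for my $movie (@{ $trailer->{movies} || [] }) {
        push @lines, "  $movie->{text} -> $movie->{href}";
    }
}
print "$_\n" for @lines;
```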

AUTHOR

Daisuke Maki <daisuke@endeworks.jp>

LICENSE

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html