NAME

Scrappy::Manual - How Do I Command The All Powerful Web Scraper Scrappy?

VERSION

version 0.6

DISCLAIMER

This documentation is incomplete, obviously. For help and support find alnewkirk or alnewkirk|com on IRC, or find this project on GitHub. If all else fails, write your local congressman.

WHAT IS SCRAPPY

Scrappy is an easy (and hopefully fun) way of scraping, spidering, and/or harvesting information from web pages. Internally Scrappy uses the awesome Web::Scraper and WWW::Mechanize modules so as such Scrappy imports its awesomeness. Scrappy is inspired by the fun and easy-to-use Dancer API. Beyond being a pretty API for WWW::Mechanize::Plugin::Web::Scraper, Scrappy also has its own featuer-set which makes web scraping easier and more enjoyable.

Scrappy (pronounced Scrap+Pee) == 'Scraper Happy' or 'Happy Scraper'; If you like you may call it Scrapy (pronounced Scrape+Pee) although Python has a web scraping framework by that name and this module is not a port of that one.

BASIC USAGE

#!/usr/bin/perl
use Scrappy qw/:syntax/;
    
# get page from URL
get 'http://search.cpan.org/recent';

if (loaded) {
    var modules => grab '#cpansearch li a', { name => 'TEXT', link => '@href' };
}

# the list function deferences, list == @{...}
print $_->{name}, "\n" for list var->{modules};

ADVANCED USAGE

Scrape From A Website

get $website_url;
var foo => grab 'a.more_info', 'ALL';

Scrape From A File

use URI;
get URI->new($filename);
var foo => grab 'div', 'ALL';

Scrape An Entire Website

crawl $starting_url, {
    'a' => sub { queue shift->href },
    '/*' => sub {
        # /* matches the root node, you can also use body, div.container, etc
        # do something
    }
};

AUTHOR

Al Newkirk <awncorp@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2010 by awncorp.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.