NAME
Scrappy::Examples - How Do I Command The All Powerful Web Scraper Scrappy?
VERSION
version 0.592
WHAT IS SCRAPPY
Scrappy is an easy (and hopefully fun) way of scraping, spidering, and/or harvesting information from web pages. Internally, Scrappy uses the awesome Web::Scraper and WWW::Mechanize modules, so it inherits their awesomeness. Its API is inspired by the fun and easy-to-use Dancer API. Beyond being a pretty API for WWW::Mechanize::Plugin::Web::Scraper, Scrappy also has its own feature set, which makes web scraping easier and more enjoyable.
Scrappy (pronounced Scrap+Pee) == 'Scraper Happy' or 'Happy Scraper'. If you like, you may call it Scrapy (pronounced Scrape+Pee), although Python has a web scraping framework by that name and this module is not a port of it.
BASIC USAGE
    #!/usr/bin/perl
    use Scrappy qw/:syntax/;

    # get page from URL
    get 'http://search.cpan.org/recent';

    if (loaded) {
        var modules => grab '#cpansearch li a', { name => 'TEXT', link => '@href' };
    }

    # the list function dereferences, list == @{...}
    print $_->{name}, "\n" for list var->{modules};
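The comment above says that list is equivalent to @{...}. The following is a minimal plain-Perl sketch of that dereference step, assuming grab stores an array reference of hashes keyed by the names in the selector mapping (name and link). The sample records and the names() helper are hypothetical and are not part of Scrappy.

```perl
use strict;
use warnings;

# Hypothetical sample data, shaped like the { name => ..., link => ... }
# mapping used in the grab call above. Not real scraper output.
my $modules = [
    { name => 'Foo-Bar-0.01', link => '/dist/Foo-Bar' },
    { name => 'Baz-Qux-1.20', link => '/dist/Baz-Qux' },
];

# Hypothetical helper: dereference the array ref and pull out each name.
sub names {
    my ($rows) = @_;
    return map { $_->{name} } @{$rows};
}

print "$_\n" for names($modules);
```

Here @{$rows} plays the role that list plays in the Scrappy example: it turns the stored reference back into a plain list that for or map can walk.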
ADVANCED USAGE
Scrape From A Website
    get $website_url;
    var foo => grab 'a.more_info', 'ALL';
Scrape From A File
    use URI;
    get URI->new($filename);
    var foo => grab 'div', 'ALL';
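A bare relative path handed to URI->new carries no scheme, so it may be worth turning $filename into an absolute file:// URL before passing it to get. The sketch below does this with core modules only. The file_url helper is hypothetical (it is not part of Scrappy), and it assumes a Unix-like path with no characters that need percent-encoding. URI::file->new_abs from the URI distribution is the more portable way to do the same thing.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper, not part of Scrappy: build a file:// URL from a
# local path. Assumes Unix-like paths and no characters needing escaping.
sub file_url {
    my ($path) = @_;
    my $abs = File::Spec->rel2abs($path);   # resolve against the current directory
    return "file://$abs";
}

# 'pages/index.html' is a made-up example path.
print file_url('pages/index.html'), "\n";
```

The resulting string could then be passed along as get URI->new(file_url($filename)).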
Scrape An Entire Website
    crawl $starting_url, {
        'a' => sub { queue shift->href },
        '/*' => sub {
            # /* matches the root node, you can also use body, div.container, etc
            # do something
        }
    };
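The crawl example pairs a link handler that queues URLs with a page handler that processes each page. As a rough illustration of that queue-driven pattern, and not a description of how Scrappy implements crawl internally, here is a plain-Perl sketch that walks a hypothetical in-memory link graph breadth-first and visits each page once. The %links data and the crawl_order function are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical in-memory "site": each page maps to the links found on it.
my %links = (
    '/'  => ['/a', '/b'],
    '/a' => ['/b', '/'],
    '/b' => ['/c'],
    '/c' => [],
);

# Visit pages breadth-first from $start, queueing each unseen link once.
sub crawl_order {
    my ($start, $site) = @_;
    my @queue = ($start);
    my %seen  = ($start => 1);
    my @visited;
    while (defined(my $page = shift @queue)) {
        push @visited, $page;                 # "do something" with the page
        for my $link (@{ $site->{$page} || [] }) {
            next if $seen{$link}++;           # skip links already queued
            push @queue, $link;               # comparable to: queue $link
        }
    }
    return @visited;
}

print join(' ', crawl_order('/', \%links)), "\n";   # prints "/ /a /b /c"
```

The %seen hash is what keeps the walk from looping when pages link back to each other, as '/a' does to '/' here.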
DISCLAIMER
This documentation is incomplete, obviously. For help and support find alnewkirk or alnewkirk|com on IRC, or find this project on GitHub. If all else fails, write your local congressman.
AUTHOR
Al Newkirk <awncorp@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2010 by awncorp.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.