NAME
WWW::ShopBot::Driver - Base class for shopbot drivers
DESCRIPTION
WWW::ShopBot::Driver, which comes with multiple drivers for various merchants' sites, is a companion module released alongside WWW::ShopBot. To grab information from particular sites, invoke WWW::ShopBot with the corresponding drivers and the bot will retrieve the data automatically.
Keep the following points in mind when developing drivers.
Since there are innumerable shops online, it is important to keep a clear hierarchy of drivers. Each driver must live under the WWW::ShopBot namespace and be organized by the country code in the site's domain name or by the country where the company is based.
For example:
WWW::ShopBot::TW::yahoo
WWW::ShopBot::UK::ebay
If the domain name contains no country information, use its last component (the top-level domain) instead. For example:
WWW::ShopBot::COM::ebay
WWW::ShopBot::COM::froogle
Because there are so many e-commerce sites, WWW::ShopBot::Driver is distributed separately from WWW::ShopBot, so that frequent driver updates do not inflate WWW::ShopBot's version number.
Every driver inherits from WWW::ShopBot::Driver, and every query request is passed to the query method of each driver, so make sure that query is the subroutine that does the retrieving. More simply, you can use the accompanying shopbot.pl to generate a driver template.

You can use whatever modules you like to fetch and extract data; WWW::ShopBot does not confine you to any fixed approach. Many modules deal with this task, and you are free to try them all.
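The skeleton below is a minimal sketch of a driver's shape, not the template that shopbot.pl generates. It assumes that query receives the search keyword and returns a list of item hashrefs; that calling convention is an assumption, so check WWW::ShopBot for the arguments it actually passes. The empty WWW::ShopBot::Driver package is only a stand-in so the example runs without the distribution installed, and TW::Buzz is a made-up shop.

```perl
use strict;
use warnings;

# Stand-in for the real WWW::ShopBot::Driver, defined here only so the
# sketch runs on its own. A real driver file would instead say
#   use base 'WWW::ShopBot::Driver';
{
    package WWW::ShopBot::Driver;
}

# Hypothetical driver for an imaginary Taiwanese shop, "Buzz".
{
    package WWW::ShopBot::TW::Buzz;
    our @ISA = ('WWW::ShopBot::Driver');    # inherit from the base class

    # Every query request is handed to this method.
    # ASSUMPTION: it gets the keyword and returns a list of item hashrefs.
    sub query {
        my ($pkg, $keyword) = @_;
        # A real driver would fetch and parse the merchant's pages here,
        # e.g. with the inherited nextextor/linkextor/specextor methods.
        return ({ product => "$keyword (sample)", price => 100 });
    }
}

my @items = WWW::ShopBot::TW::Buzz->query('camera');
print "$_->{product}: $_->{price}\n" for @items;
```

Keeping query as the single entry point is what lets the bot drive every shop the same way, however differently each driver fetches its data.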
Driver.pm also defines three common methods for drivers to inherit, nextextor, linkextor, and specextor, which you can call in query.

nextextor is defined for grabbing links to the next pages of results:
$pkg->nextextor(\$content, \%next, qr/pattern here/);
nextextor uses HTML::LinkExtractor to extract the links in a page. Every link that matches the given pattern is stored in %next.

linkextor is defined for grabbing links to product detail pages:
$pkg->linkextor(\$content, \%links, qr/pattern here/);
As with nextextor, every link that matches the given pattern is stored in %links.

specextor is defined for analyzing product detail pages:
$pkg->specextor(\$content, $item, {
    product => qr'<a href="blah">(.+?)</a>',
    price   => qr'<b>(.+?)</b>',
    photo   => qr'<img src="(.+?)">',
});
It extracts the data matching the given patterns and stores it in $item.
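The three methods are meant to be combined inside query: follow result pages, collect product links, then analyze each product page. The sketch below shows one way that flow might look. It does not call the real methods; grab_links and grab_spec are hypothetical core-only stand-ins (grab_links uses a simple regex rather than HTML::LinkExtractor, which is fragile on real HTML), the link hashes are assumed to store URLs as keys, and %site is a hypothetical in-memory site that replaces HTTP fetches. The real specextor's matching rules may differ from grab_spec's.

```perl
use strict;
use warnings;

# Hypothetical in-memory site standing in for real HTTP fetches.
my %site = (
    '/s?p=1'  => '<a href="/s?p=2">next</a> <a href="/item/1">x</a>',
    '/s?p=2'  => '<a href="/s?p=1">prev</a> <a href="/item/2">x</a>',
    '/item/1' => '<a href="blah">Camera A</a> <b>100</b>',
    '/item/2' => '<a href="blah">Camera B</a> <b>200</b>',
);

# Stand-in for nextextor/linkextor: store each href matching $pattern
# as a key of %$store (ASSUMPTION about how links are stored).
sub grab_links {
    my ($content_ref, $store, $pattern) = @_;
    while ($$content_ref =~ /href="([^"]+)"/g) {
        my $link = $1;    # copy it: the next match would reset $1
        $store->{$link} = 1 if $link =~ $pattern;
    }
    return;
}

# Stand-in for specextor: apply one regex per field and keep its
# first capture group in $item.
sub grab_spec {
    my ($content_ref, $item, $patterns) = @_;
    for my $field (keys %$patterns) {
        $item->{$field} = $1 if $$content_ref =~ $patterns->{$field};
    }
    return $item;
}

# 1. Follow result pages (%next) and collect product links (%links).
my (%next, %links, %seen);
my @queue = ('/s?p=1');
while (defined(my $url = shift @queue)) {
    next if $seen{$url}++;    # visit each result page only once
    my $content = $site{$url} // '';
    grab_links(\$content, \%next,  qr{^/s\?p=\d+$});
    grab_links(\$content, \%links, qr{^/item/});
    push @queue, grep { !$seen{$_} } keys %next;
}

# 2. Analyze each product page into an item hashref.
my @items;
for my $url (sort keys %links) {
    my $content = $site{$url} // '';
    push @items, grab_spec(\$content, {}, {
        product => qr'<a href="blah">(.+?)</a>',
        price   => qr'<b>(.+?)</b>',
    });
}
print "$_->{product}: $_->{price}\n" for @items;
```

Storing links as hash keys deduplicates them for free, and the %seen set keeps "prev" links from sending the loop back to pages it has already visited.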
If you want to use a driver that is not distributed with the module, make sure that your driver, say TW::Buzz, resides in ${one of your @INC paths}/WWW/ShopBot/TW/Buzz.pm.
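A quick way to check where Perl will look for such a driver is to build the relative path that require uses and search @INC for it. In this sketch, driver_relpath and find_driver are hypothetical helpers; the usage part writes a throwaway TW::Buzz driver into a temporary directory and prepends that directory to @INC, which has the same effect as use lib.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Relative path that require expects for a driver name, e.g.
# "TW::Buzz" -> "WWW/ShopBot/TW/Buzz.pm" (require always uses '/').
sub driver_relpath {
    my ($driver) = @_;
    return join('/', 'WWW', 'ShopBot', split(/::/, $driver)) . '.pm';
}

# First matching file under an @INC directory, or undef if the driver
# is not installed anywhere on the search path.
sub find_driver {
    my ($driver) = @_;
    my $rel = driver_relpath($driver);
    for my $dir (grep { !ref $_ } @INC) {    # skip code-ref hooks in @INC
        my $file = "$dir/$rel";
        return $file if -f $file;
    }
    return;
}

# Usage: put a private driver in a directory and add it to @INC.
my $lib = tempdir(CLEANUP => 1);
make_path(File::Spec->catdir($lib, qw(WWW ShopBot TW)));
open my $fh, '>', File::Spec->catfile($lib, qw(WWW ShopBot TW Buzz.pm))
    or die "cannot write driver: $!";
print {$fh} "package WWW::ShopBot::TW::Buzz;\n1;\n";
close $fh;

unshift @INC, $lib;                # same effect as "use lib $lib"
my $found = find_driver('TW::Buzz');
print "found: $found\n";
require WWW::ShopBot::TW::Buzz;    # now loads from $lib
```

In a real script, use lib '/path/to/your/lib'; or the PERL5LIB environment variable is the usual way to add such a directory.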
CAVEAT
The drivers are far from perfect; you may need to edit their code for specific or advanced uses.
Contributions are always welcome.
SEE ALSO
HTML::TableExtract, HTML::TableExtractor, HTML::TableParser, HTML::TableContentParser
HTML::LinkExtractor, HTML::LinkExtor, HTML::SimpleLinkExtor
HTML::Parser, HTML::TokeParser, HTML::SimpleParse
COPYRIGHT
xern <xern@cpan.org>
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.