NAME
WWW::ShopBot::Driver - Base class for shopbot drivers
DESCRIPTION
WWW::ShopBot::Driver, which comes with multiple drivers for various merchants' sites, is a companion module released for WWW::ShopBot. When you need to grab information from certain sites, invoke WWW::ShopBot with the appropriate drivers and the bot will retrieve the data automatically.
A few things should be noted for driver development.
Since there are innumerable shops online, it is important to have a clear hierarchy for drivers. Each driver must live under the WWW::ShopBot namespace and be organized according to the country code in the domain name or the company's country of residence.
For example,
WWW::ShopBot::TW::yahoo
WWW::ShopBot::UK::ebay
If the domain name doesn't contain country information, use its last component (the top-level domain) instead. For example,
WWW::ShopBot::COM::ebay
WWW::ShopBot::COM::froogle
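The naming rule above can be sketched as a small helper. This is only an illustration: driver_package_for is a hypothetical function, not part of WWW::ShopBot, the host names are made up, and it assumes that any country code appears as the last component of the host name. The shop name is passed in by hand, because nothing here defines how it should be derived.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::ShopBot): build a driver package
# name from a host name and a shop name chosen by the driver author.
# Assumption: the country code, if any, is the last label of the host,
# so the same rule covers both "ebay.co.uk" and "ebay.com".
sub driver_package_for {
    my ($host, $shop) = @_;
    my ($last_label) = $host =~ /\.([A-Za-z]+)\.?$/
        or die "cannot find a top-level domain in '$host'";
    return join '::', 'WWW::ShopBot', uc($last_label), $shop;
}

print driver_package_for('www.ebay.co.uk', 'ebay'), "\n";
print driver_package_for('www.ebay.com',   'ebay'), "\n";
```

Sites whose country code sits elsewhere in the host name, or that should be filed under the company's country of residence, still need a manual choice.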
Also, because there are so many e-commerce (EC) sites, WWW::ShopBot::Driver is kept separate from WWW::ShopBot so that the latter's version number does not grow rapidly.
Every driver inherits from WWW::ShopBot::Driver, and every query request is passed to the query method of each driver, so remember to make query the retrieving subroutine. More simply, you can use the accompanying shopbot.pl to generate a driver template.
You can use any modules you like to fetch data and extract information from it. WWW::ShopBot does not confine you to any fixed approach. Many modules exist for this purpose; feel free to try them.
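A minimal sketch of what such a driver might look like follows. The package WWW::ShopBot::COM::example is hypothetical, the empty WWW::ShopBot::Driver package is only a stand-in so the sketch runs without the distribution installed, and the argument list and return value of query are assumptions, since they are not specified here. The template generated by shopbot.pl is the authoritative starting point.

```perl
use strict;
use warnings;

# Stand-in for the real base class, so this sketch runs on its own.
# A real driver would load the installed module instead, for example:
#   use base 'WWW::ShopBot::Driver';
package WWW::ShopBot::Driver;

# Hypothetical driver for a made-up shop.
package WWW::ShopBot::COM::example;
our @ISA = ('WWW::ShopBot::Driver');

# query is the retrieving subroutine. Its parameters and its return
# value are assumptions made for this illustration.
sub query {
    my ($pkg, $keyword) = @_;
    my @items;
    # A real driver would fetch and parse result pages here; this
    # placeholder record only shows the shape of one item.
    push @items, { product => "sample $keyword", price => 'n/a' };
    return @items;
}

package main;

my @found = WWW::ShopBot::COM::example->query('camera');
print "$_->{product}\n" for @found;
```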
However, WWW::ShopBot::Driver does presume a common convention that, I think, works with many sites. In general, following this convention should speed up driver development.
See shopbot.pl
WWW::ShopBot::Driver defines three common methods ready for inheritance: nextextor, linkextor, and specextor, which you can use in query.
nextextor is defined for grabbing links to next pages:
$pkg->nextextor(\$content, \%next, $next_accept, $next_discard);
nextextor uses HTML::LinkExtractor to extract the links in a page. Any link that matches $next_accept is stored in %next, unless it also matches $next_discard, in which case it is dropped.
linkextor is defined for grabbing links to product detail pages:
$pkg->linkextor(\$content, \%links, $link_accept, $link_discard);
As with nextextor, any link that matches $link_accept is stored in %links, unless it also matches $link_discard.
specextor is defined for analyzing product detail pages:
$pkg->specextor(\$content, $item, {
    product => qr'<a href="blah">(.+?)</a>',
    price   => qr'<b>(.+?)</b>',
    photo   => qr'<img src="(.+?)">',
});
It extracts the data matching each of the given patterns and stores it in $item.
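The behaviour described for these methods can be imitated in plain Perl, which may help when deciding what patterns to pass. The functions filter_links and extract_spec below are hypothetical stand-ins, not the module's own code. They assume that the accept and discard arguments are compiled regular expressions, that links are stored as hash keys, and that only the first match of each pattern is kept; the real methods may differ on all three points.

```perl
use strict;
use warnings;

# Stand-in for the accept/discard rule of nextextor and linkextor:
# keep a link if it matches $accept and does not match $discard.
sub filter_links {
    my ($urls, $store, $accept, $discard) = @_;
    for my $url (@$urls) {
        next unless $url =~ $accept;
        next if defined $discard && $url =~ $discard;
        $store->{$url} = 1;    # storing links as keys is an assumption
    }
    return $store;
}

# Stand-in for specextor: apply each named pattern to the page and
# keep the first captured group under the same name.
sub extract_spec {
    my ($content_ref, $item, $patterns) = @_;
    for my $field (keys %$patterns) {
        if ($$content_ref =~ $patterns->{$field}) {
            $item->{$field} = $1;
        }
    }
    return $item;
}

# Made-up links and page content, for illustration only.
my @urls = ('/search?page=2', '/search?page=3&print=1', '/item/42');
my (%next, %links);
filter_links(\@urls, \%next,  qr/page=\d+/, qr/print=1/);
filter_links(\@urls, \%links, qr{^/item/},  undef);

my $page = '<a href="blah">Widget</a> <b>9.99</b> <img src="w.png">';
my %item;
extract_spec(\$page, \%item, {
    product => qr'<a href="blah">(.+?)</a>',
    price   => qr'<b>(.+?)</b>',
    photo   => qr'<img src="(.+?)">',
});

print join(', ', sort keys %next), "\n";
print "$item{product} costs $item{price}\n";
```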
If you want to use a driver that is not distributed with this module, make sure that your driver, say TW::Buzz, resides at ${one of your @INC paths}/WWW/ShopBot/TW/Buzz.pm.
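To check whether a driver file sits where Perl will look for it, something like the following sketch can be used. package_to_path and find_in_inc are hypothetical helpers written for this example; they rely only on the core File::Spec module and the standard rule that Foo::Bar is loaded from Foo/Bar.pm under a directory in @INC.

```perl
use strict;
use warnings;
use File::Spec;

# Turn a package name into the relative file name Perl looks for.
sub package_to_path {
    my ($package) = @_;
    (my $path = $package) =~ s{::}{/}g;
    return "$path.pm";
}

# Return the first file on @INC that would be loaded for the package,
# or undef if none exists.
sub find_in_inc {
    my ($package) = @_;
    my $relative = package_to_path($package);
    for my $dir (@INC) {
        next if ref $dir;    # skip code hooks placed in @INC
        my $file = File::Spec->catfile($dir, $relative);
        return $file if -f $file;
    }
    return;
}

my $driver = 'WWW::ShopBot::TW::Buzz';
my $where  = find_in_inc($driver);
print defined $where
    ? "$driver found at $where\n"
    : "$driver not found; expected " . package_to_path($driver) . " under an \@INC directory\n";
```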
CAVEAT
The drivers are far from perfect; you should edit the code for any specific or advanced use.
Of course, contributions are always welcome.
SEE ALSO
HTML::TableExtract, HTML::TableExtractor, HTML::TableParser, HTML::TableContentParser
HTML::LinkExtractor, HTML::LinkExtor, HTML::SimpleLinkExtor
HTML::Parser, HTML::TokeParser, HTML::SimpleParse
COPYRIGHT
xern <xern@cpan.org>
This module is free software; you can redistribute it or modify it under the same terms as Perl itself.