NAME

WWW::ShopBot::Driver - Base class for shopbot drivers

DESCRIPTION

WWW::ShopBot::Driver is a companion module to WWW::ShopBot and ships with drivers for various merchants' sites. To grab information from particular sites, invoke WWW::ShopBot with the corresponding drivers and the bot retrieves the data automatically.

There are some things to be noted for driver development.

  • Since there are innumerable shops online, drivers need a clear hierarchy. Each driver must live under the WWW::ShopBot namespace and be organized by the country code in the site's domain name or by the country where the company resides.

    For example,

    WWW::ShopBot::TW::yahoo
    
    WWW::ShopBot::UK::ebay

    If the domain name carries no country information, use its last component instead. For example,

    WWW::ShopBot::COM::ebay
    
    WWW::ShopBot::COM::froogle

  • Also, because there are so many e-commerce sites, WWW::ShopBot::Driver is kept separate from WWW::ShopBot so that frequent driver updates do not inflate the version number of WWW::ShopBot.

  • Every driver inherits from WWW::ShopBot::Driver, and every query request is passed to the query method of each driver, so make sure query is the subroutine that retrieves the data. More simply, you can use the accompanying shopbot.pl to generate a driver template.

    You can use whatever modules you like to fetch and extract data; WWW::ShopBot does not confine you to any fixed approach. Many modules exist for this purpose, and you are free to try them.

    However, WWW::ShopBot::Driver does presume a common convention that, I think, works with many sites. Following it will generally speed up your driver development.

    See shopbot.pl

  • WWW::ShopBot::Driver defines three common methods ready for inheritance: nextextor, linkextor, and specextor, which you can use in query.

    • nextextor is defined for grabbing links to next pages

      $pkg->nextextor(\$content, \%next, $next_accept, $next_discard);

      nextextor uses HTML::LinkExtractor to extract the links in a page. Any link that matches the given $next_accept is stored in %next, unless it also matches $next_discard, in which case it is discarded.

    • linkextor is defined for grabbing links to pages of products' details

      $pkg->linkextor(\$content, \%links, $link_accept, $link_discard);

      As with nextextor, any link that matches $link_accept is stored in %links, unless it also matches $link_discard, in which case it is discarded.

    • specextor is defined for analyzing pages of products' details

      $pkg->specextor(\$content, $item,
                      {
                          product => qr'<a href="blah">(.+?)</a>',
                          price   => qr'<b>(.+?)</b>',
                          photo   => qr'<img src="(.+?)">',
                      });

      It extracts the data that match the given patterns and stores them in $item.

  • If you want to use a driver that is not distributed with the module, make sure that your driver, say TW::Buzz, resides at WWW/ShopBot/TW/Buzz.pm under one of your @INC paths.
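
The conventions above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the module's actual code: filter_links, extract_spec, and driver_path are hypothetical helpers, the sample page is made up, and links are pulled with a naive regex where the real nextextor and linkextor use HTML::LinkExtractor.

```perl
use strict;
use warnings;

# Hypothetical helper: keep links that match $accept unless they also match
# $discard. A naive regex stands in for HTML::LinkExtractor here.
sub filter_links {
    my ($content_ref, $store, $accept, $discard) = @_;
    while ($$content_ref =~ /<a\s[^>]*?href="([^"]+)"/gi) {
        my $url = $1;
        next unless $url =~ $accept;
        next if defined $discard && $url =~ $discard;
        $store->{$url} = 1;
    }
    return $store;
}

# Hypothetical helper: apply each pattern to the page and store its first
# capture in $item under the same key.
sub extract_spec {
    my ($content_ref, $item, $patterns) = @_;
    for my $field (keys %$patterns) {
        if ($$content_ref =~ $patterns->{$field}) {
            $item->{$field} = $1;
        }
    }
    return $item;
}

# Hypothetical helper: map a driver name such as "TW::Buzz" to the path it
# is expected to have relative to an @INC directory.
sub driver_path {
    my ($driver) = @_;
    (my $path = "WWW::ShopBot::$driver") =~ s{::}{/}g;
    return "$path.pm";
}

# Made-up sample page, for illustration only.
my $page = <<'HTML';
<a href="list?page=2">Next</a>
<a href="list?page=2&print=1">Print</a>
<a href="item?id=7">Widget</a>
<h1>Widget</h1> <b>120</b>
HTML

my (%next, %item);
filter_links(\$page, \%next, qr/page=\d+/, qr/print=/);
extract_spec(\$page, \%item,
             {
                 product => qr'<h1>(.+?)</h1>',
                 price   => qr'<b>(.+?)</b>',
             });

print join(',', sort keys %next), "\n";
print "$item{product} costs $item{price}\n";
print driver_path('TW::Buzz'), "\n";
```

In a real driver you would call the inherited nextextor, linkextor, and specextor from query instead of these stand-ins; the sketch only shows the accept/discard and pattern-to-field ideas in isolation.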

CAVEAT

The drivers are far from perfect; you should edit the code for any specific or advanced use.

Of course, contributions are always welcome.

SEE ALSO

WWW::ShopBot

HTML::TableExtract, HTML::TableExtractor, HTML::TableParser, HTML::TableContentParser

HTML::LinkExtractor, HTML::LinkExtor, HTML::SimpleLinkExtor

HTML::Parser, HTML::TokeParser, HTML::SimpleParse

COPYRIGHT

xern <xern@cpan.org>

This module is free software; you can redistribute it or modify it under the same terms as Perl itself.