NAME

WWW::GoKGS - KGS Go Server (http://www.gokgs.com/) Scraper

SYNOPSIS

  use WWW::GoKGS;

  my $gokgs = WWW::GoKGS->new(
      from => 'user@example.com'
  );

  # Game archives
  my $game_archives_1 = $gokgs->scrape( '/gameArchives.jsp?user=foo' );
  my $game_archives_2 = $gokgs->game_archives->query( user => 'foo' );

  # Top 100 players
  my $top_100_1 = $gokgs->scrape( '/top100.jsp' );
  my $top_100_2 = $gokgs->top_100->query;

  # List of tournaments 
  my $tourn_list_1 = $gokgs->scrape( '/tournList.jsp?year=2014' );
  my $tourn_list_2 = $gokgs->tourn_list->query( year => 2014 );

  # Information for the tournament
  my $tourn_info_1 = $gokgs->scrape( '/tournInfo.jsp?id=123' );
  my $tourn_info_2 = $gokgs->tourn_info->query( id => 123 );

  # The tournament entrants
  my $tourn_entrants_1 = $gokgs->scrape( '/tournEntrants.jsp?id=123&sort=n' );
  my $tourn_entrants_2 = $gokgs->tourn_entrants->query( id => 123, sort => 'n' );

  # The tournament games
  my $tourn_games_1 = $gokgs->scrape( '/tournGames.jsp?id=123&round=1' );
  my $tourn_games_2 = $gokgs->tourn_games->query( id => 123, round => 1 );

DESCRIPTION

This module is a scraper for the KGS Go Server (http://www.gokgs.com/). KGS lets users play the board game go, also known as baduk (Korean) or weiqi (Chinese). The web server generates resources such as the Game Archives dynamically, but serves them only as HTML. This module provides another representation of those resources: Perl data structures.

This class maps a URI that starts with http://www.gokgs.com/ to the appropriate scraper. The supported resources on KGS are as follows:

KGS Game Archives (http://www.gokgs.com/archives.jsp)

Handled by WWW::GoKGS::Scraper::GameArchives.

Top 100 KGS Players (http://www.gokgs.com/top100.jsp)

Handled by WWW::GoKGS::Scraper::Top100.

KGS Tournaments (http://www.gokgs.com/tournList.jsp)

Handled by WWW::GoKGS::Scraper::TournList, WWW::GoKGS::Scraper::TournInfo, WWW::GoKGS::Scraper::TournEntrants and WWW::GoKGS::Scraper::TournGames.
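
The mapping described above can be pictured as a lookup table keyed by request path. The following is only a sketch in core Perl, not the module's implementation: %SCRAPER_FOR and scraper_class_for() are hypothetical names, and the table simply restates the paths and classes used throughout this document.

```perl
use strict;
use warnings;

# Hypothetical lookup table: request path => scraper class.
my %SCRAPER_FOR = (
    '/gameArchives.jsp'  => 'WWW::GoKGS::Scraper::GameArchives',
    '/top100.jsp'        => 'WWW::GoKGS::Scraper::Top100',
    '/tournList.jsp'     => 'WWW::GoKGS::Scraper::TournList',
    '/tournInfo.jsp'     => 'WWW::GoKGS::Scraper::TournInfo',
    '/tournEntrants.jsp' => 'WWW::GoKGS::Scraper::TournEntrants',
    '/tournGames.jsp'    => 'WWW::GoKGS::Scraper::TournGames',
);

# Accepts a relative or absolute KGS URL and returns the class name
# registered for its path, or undef when the path is unknown.
sub scraper_class_for {
    my ($url) = @_;
    $url =~ s{\Ahttp://www\.gokgs\.com}{};    # drop the scheme and host, if any
    my ($path) = $url =~ m{\A(/[^?#]*)}
        or return;                             # not a KGS-style path
    return $SCRAPER_FOR{$path};
}

print scraper_class_for('/top100.jsp'), "\n";
print scraper_class_for('http://www.gokgs.com/tournInfo.jsp?id=123'), "\n";
```

An unknown path yields undef, which mirrors the behaviour documented for get_scraper().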

ATTRIBUTES

$UserAgent = $gokgs->user_agent
$gokgs->user_agent( LWP::RobotUA->new(...) )

Can be used to get or set the user agent object that is used to GET the requested resource. Defaults to an LWP::RobotUA object, which consults http://www.gokgs.com/robots.txt before sending HTTP requests and also enforces a proper delay between requests.

NOTE: As of June 23rd, 2014, LWP::RobotUA fails to read /robots.txt because the KGS web server does not return the Content-Type response header. This module cannot work around this problem.

You can also set your own user agent object as follows:

  use LWP::UserAgent;

  my $gokgs = WWW::GoKGS->new(
      user_agent => LWP::UserAgent->new(
          agent => 'MyAgent/1.00'
      )
  );

NOTE: You should set a delay between requests to avoid overloading the KGS server.
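
One way to add such a delay around a plain LWP::UserAgent is to wrap each request in a small throttle. The sketch below uses only core modules; make_throttle() and the one-second interval are illustrative assumptions, not part of WWW::GoKGS, and the callback stands in for the actual request.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper: returns a closure that runs the given callback,
# sleeping first if less than $min_interval seconds have passed since
# the previous call finished.
sub make_throttle {
    my ($min_interval) = @_;
    my $last_finished;    # undef until the first call

    return sub {
        my ($callback) = @_;
        if ( defined $last_finished ) {
            my $wait = $min_interval - ( time() - $last_finished );
            sleep($wait) if $wait > 0;
        }
        my @result = $callback->();
        $last_finished = time();
        return wantarray ? @result : $result[0];
    };
}

my $throttled = make_throttle(1);    # at least one second between requests

for my $path ( '/top100.jsp', '/tournList.jsp?year=2014' ) {
    # Replace this callback with a real request, e.g. $gokgs->scrape($path).
    my $requested = $throttled->( sub { $path } );
    print "requested $requested\n";
}
```

Keeping the delay in a closure leaves the user agent untouched, so the same throttle can guard any code that contacts the server.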

$email_address = $gokgs->from
$gokgs->from( 'user@example.com' )

Can be used to get or set your email address which is used by $gokgs->user_agent to send the From request header that indicates who is making the request. This attribute must be defined when you use LWP::RobotUA.

  my $gokgs = WWW::GoKGS->new(
      from => 'user@example.com'
  );

$product_id = $gokgs->agent
$gokgs->agent( 'MyAgent/0.01' )

Can be used to get or set the product token that is used by $gokgs->user_agent to send the User-Agent request header. Defaults to WWW::GoKGS/#.##, where #.## is substituted with the version number of this module.

$CodeRef = $gokgs->html_filter
$gokgs->html_filter( sub { my $html = shift; ... } )

Can be used to get or set an HTML filter. Defaults to an anonymous subref which just returns the given argument (sub { $_[0] }). The callback is called with an HTML string. The return value is used as the filtered value.

  $gokgs->html_filter(sub {
      my $html = shift;
      $html =~ s/<.*?>//g; # strip HTML tags
      $html;
  });

$CodeRef = $gokgs->date_filter
$gokgs->date_filter( sub { my $date = shift; ... } )

Can be used to get or set a date filter. Defaults to an anonymous subref which just returns the given argument (sub { $_[0] }). The callback is called with a date string such as 2014-05-17T19:05Z. The return value is used as the filtered value.

  use Time::Piece qw/gmtime/;

  $gokgs->date_filter(sub {
      my $date = shift; # => "2014-05-17T19:05Z"
      gmtime->strptime( $date, '%Y-%m-%dT%H:%MZ' );
  });

$GameArchives = $gokgs->game_archives
$gokgs->game_archives( WWW::GoKGS::Scraper::GameArchives->new(...) )

Can be used to get or set a scraper object which can scrape /gameArchives.jsp. Defaults to a WWW::GoKGS::Scraper::GameArchives object.

$Top100 = $gokgs->top_100
$gokgs->top_100( WWW::GoKGS::Scraper::Top100->new(...) )

Can be used to get or set a scraper object which can scrape /top100.jsp. Defaults to a WWW::GoKGS::Scraper::Top100 object.

$TournList = $gokgs->tourn_list
$gokgs->tourn_list( WWW::GoKGS::Scraper::TournList->new(...) )

Can be used to get or set a scraper object which can scrape /tournList.jsp. Defaults to a WWW::GoKGS::Scraper::TournList object.

$TournInfo = $gokgs->tourn_info
$gokgs->tourn_info( WWW::GoKGS::Scraper::TournInfo->new(...) )

Can be used to get or set a scraper object which can scrape /tournInfo.jsp. Defaults to a WWW::GoKGS::Scraper::TournInfo object.

$TournEntrants = $gokgs->tourn_entrants
$gokgs->tourn_entrants( WWW::GoKGS::Scraper::TournEntrants->new(...) )

Can be used to get or set a scraper object which can scrape /tournEntrants.jsp. Defaults to a WWW::GoKGS::Scraper::TournEntrants object.

$TournGames = $gokgs->tourn_games
$gokgs->tourn_games( WWW::GoKGS::Scraper::TournGames->new(...) )

Can be used to get or set a scraper object which can scrape /tournGames.jsp. Defaults to a WWW::GoKGS::Scraper::TournGames object.

INSTANCE METHODS

$HashRef = $gokgs->scrape( '/gameArchives.jsp?user=foo' )
$HashRef = $gokgs->scrape( 'http://www.gokgs.com/gameArchives.jsp?user=foo' )

A shortcut for:

  my $uri = URI->new( 'http://www.gokgs.com/gameArchives.jsp?user=foo' );
  my $game_archives = $gokgs->game_archives->scrape( $uri );

See WWW::GoKGS::Scraper::GameArchives for details.

$HashRef = $gokgs->scrape( '/top100.jsp' )
$HashRef = $gokgs->scrape( 'http://www.gokgs.com/top100.jsp' )

A shortcut for:

  my $uri = URI->new( 'http://www.gokgs.com/top100.jsp' );
  my $top_100 = $gokgs->top_100->scrape( $uri );

See WWW::GoKGS::Scraper::Top100 for details.

$HashRef = $gokgs->scrape( '/tournList.jsp?year=2014' )
$HashRef = $gokgs->scrape( 'http://www.gokgs.com/tournList.jsp?year=2014' )

A shortcut for:

  my $uri = URI->new( 'http://www.gokgs.com/tournList.jsp?year=2014' );
  my $tourn_list = $gokgs->tourn_list->scrape( $uri );

See WWW::GoKGS::Scraper::TournList for details.

$HashRef = $gokgs->scrape( '/tournInfo.jsp?id=123' )
$HashRef = $gokgs->scrape( 'http://www.gokgs.com/tournInfo.jsp?id=123' )

A shortcut for:

  my $uri = URI->new( 'http://www.gokgs.com/tournInfo.jsp?id=123' );
  my $tourn_info = $gokgs->tourn_info->scrape( $uri );

See WWW::GoKGS::Scraper::TournInfo for details.

$HashRef = $gokgs->scrape( '/tournEntrants.jsp?id=123&sort=n' )
$HashRef = $gokgs->scrape( 'http://www.gokgs.com/tournEntrants.jsp?id=123&sort=n' )

A shortcut for:

  my $uri = URI->new( 'http://www.gokgs.com/tournEntrants.jsp?id=123&sort=n' );
  my $tourn_entrants = $gokgs->tourn_entrants->scrape( $uri );

See WWW::GoKGS::Scraper::TournEntrants for details.

$HashRef = $gokgs->scrape( '/tournGames.jsp?id=123&round=1' )
$HashRef = $gokgs->scrape( 'http://www.gokgs.com/tournGames.jsp?id=123&round=1' )

A shortcut for:

  my $uri = URI->new( 'http://www.gokgs.com/tournGames.jsp?id=123&round=1' );
  my $tourn_games = $gokgs->tourn_games->scrape( $uri );

See WWW::GoKGS::Scraper::TournGames for details.

$scraper = $gokgs->get_scraper( $path )

Returns a scraper object which can scrape a resource located at $path on KGS. If the scraper object does not exist, then undef is returned.

  my $game_archives = $gokgs->get_scraper( '/gameArchives.jsp' );
  # => WWW::GoKGS::Scraper::GameArchives object

$gokgs->set_scraper( $path => $scraper )
$gokgs->set_scraper( $p1 => $s1, $p2 => $s2, ... )

Can be used to set a scraper object which can scrape a resource located at $path on KGS. You can also set multiple scrapers in one set_scraper call.

  use Web::Scraper;
  use WWW::GoKGS::Scraper::FooBar; # isa WWW::GoKGS::Scraper

  $gokgs->set_scraper(
      '/fooBar.jsp' => WWW::GoKGS::Scraper::FooBar->new,
      '/barBaz.jsp' => scraper {
           process '.bar', baz => 'TEXT';
           ...
      }
  );

CLASS METHODS

$class->mk_accessors( $path )
$class->mk_accessors( @paths )

Creates the accessor method for a scraper which can scrape $path. You can also create multiple accessors in one mk_accessors call.

  use parent 'WWW::GoKGS';

  # Generates foo_bar() whose builder is _build_foo_bar()
  __PACKAGE__->mk_accessors( '/fooBar.jsp' );

  # Build a scraper object which can scrape /fooBar.jsp
  sub _build_foo_bar {
      my $self = shift;
      ...
  }

$CodeRef = $class->make_accessor( $path )

Returns a subroutine reference which acts as an accessor for the scraper which can scrape $path.
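
As a rough illustration of what such an accessor might do, the sketch below builds a closure that returns a stored scraper, calls a builder lazily on first use, and replaces the stored value when given an argument. This is an assumption about the general pattern, not the module's code: the My::Client package, the hash-based object, the placeholder builder, and the (name, builder) signature used here are all hypothetical and differ from the documented make_accessor( $path ).

```perl
use strict;
use warnings;

package My::Client;    # hypothetical class, stands in for a WWW::GoKGS subclass

sub new { return bless {}, shift }

# Returns a code reference that acts as a get/set accessor for the slot
# $name, calling the method $builder on first read if nothing is stored.
sub make_accessor {
    my ( $class, $name, $builder ) = @_;
    return sub {
        my $self = shift;
        $self->{$name} = shift if @_;            # setter
        $self->{$name} = $self->$builder()
            unless exists $self->{$name};        # lazy default
        return $self->{$name};
    };
}

# Placeholder builder; a real one would return a scraper object.
sub _build_foo_bar { return 'default FooBar scraper' }

{
    no strict 'refs';
    # Install the generated accessor as My::Client::foo_bar.
    *{'My::Client::foo_bar'}
        = __PACKAGE__->make_accessor( 'foo_bar', '_build_foo_bar' );
}

package main;

my $client = My::Client->new;
print $client->foo_bar, "\n";          # built on first access
$client->foo_bar('custom scraper');
print $client->foo_bar, "\n";          # returns the value just set
```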

$accessor_name = $class->accessor_name_for( $path )

Returns the accessor name of a scraper which can scrape $path.

  my $accessor_name = $class->accessor_name_for( '/fooBar.jsp' );
  # => "foo_bar"

$builder_name = $class->builder_name_for( $path )

Returns the builder name of a scraper which can scrape $path.

  my $builder_name = $class->builder_name_for( '/fooBar.jsp' );
  # => "_build_foo_bar"
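
The naming convention implied by these examples can be sketched as follows. This is a guess at the rule, written to match the names that appear in this document (/fooBar.jsp gives foo_bar, /top100.jsp gives top_100); the functions below are standalone stand-ins for illustration, not the module's methods.

```perl
use strict;
use warnings;

# Hypothetical stand-in: derive an accessor name from a path by dropping
# the directory and the .jsp suffix, then converting camelCase to snake_case.
sub accessor_name_for {
    my ($path) = @_;
    my ($name) = $path =~ m{([^/]+)\.jsp\z}
        or return;
    $name =~ s/(?<=[a-z])(?=[A-Z0-9])/_/g;    # fooBar => foo_Bar, top100 => top_100
    return lc $name;
}

# Hypothetical stand-in: the builder name is the accessor name with a prefix.
sub builder_name_for {
    my ($path) = @_;
    my $accessor = accessor_name_for($path);
    return defined $accessor ? "_build_$accessor" : undef;
}

for my $path (qw( /fooBar.jsp /gameArchives.jsp /top100.jsp )) {
    printf "%-18s %-14s %s\n",
        $path, accessor_name_for($path), builder_name_for($path);
}
```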

WRITING SCRAPERS

KGS scrapers should use a namespace that starts with WWW::GoKGS::Scraper:: and should subclass WWW::GoKGS::Scraper. That way, users can use the scraper on its own or add the scraper object to a WWW::GoKGS object as follows:

  use WWW::GoKGS::Scraper::FooBar; # your scraper

  # using set_scraper()
  $gokgs->set_scraper(
      '/fooBar.jsp' => WWW::GoKGS::Scraper::FooBar->new
  );

  # by subclassing
  use parent 'WWW::GoKGS';
  __PACKAGE__->mk_accessors( '/fooBar.jsp' );
  sub _build_foo_bar { WWW::GoKGS::Scraper::FooBar->new }

ENVIRONMENT VARIABLES

AUTHOR_TESTING

Some tests for scrapers send HTTP requests to GET resources on KGS. When you run ./Build test, they are skipped by default to avoid overloading the KGS server. To run those tests, you have to set AUTHOR_TESTING to true explicitly:

  $ perl Build.PL
  $ env AUTHOR_TESTING=1 ./Build test

Author tests are run by Travis CI once a day. You can visit the website to check whether the tests passed or not.
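
A test file can honor this variable with the usual Test::More skip idiom. The snippet below is a generic sketch rather than a copy of this distribution's tests; network_tests_enabled(), the skip message, and the placeholder assertion are assumptions.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: true only when the author opted in to network tests.
sub network_tests_enabled { return $ENV{AUTHOR_TESTING} ? 1 : 0 }

SKIP: {
    skip 'set AUTHOR_TESTING=1 to run tests that contact KGS', 1
        unless network_tests_enabled();

    # A real test would scrape a KGS page here and inspect the result.
    pass('network test placeholder');
}

done_testing;
```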

ACKNOWLEDGEMENT

Thanks to wms, the author of KGS Go Server, we can enjoy playing go online for free.

SEE ALSO

KGS Go Server, Web::Scraper

AUTHOR

Ryo Anazawa (anazawa@cpan.org)

LICENSE

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.