NAME

Sport::Analytics::NHL::Scraper - Scrape and crawl the NHL website for data

SYNOPSIS

Scrape and crawl the NHL website for data

use Sport::Analytics::NHL::Scraper;
my $schedules = crawl_schedule({
  start_season => 2016,
  stop_season  => 2017
});
...
my $contents = crawl_game(
  { season => 2011, stage => 2, season_id => 0001 }, # game 2011020001 in NHL accounting
  { game_files => [qw(BS PL)], retries => 2 },
);

IMPORTANT VARIABLE

The @GAME_FILES variable holds the definitions specific to each report type. Right now only the boxscore JavaScript has any meaningful non-default definitions; the PB feed seems to have become unavailable.

FUNCTIONS

scrape

A wrapper around the LWP::Simple::get() call that adds retries and validation of the download.

Arguments: hash reference containing
  * url => URL to access
  * retries => number of retries
  * validate => sub reference to validate the download
Returns: the content if both the download and the validation are successful,
         undef otherwise.
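
The retry-and-validate behaviour described for scrape can be sketched roughly as follows. This is an illustrative re-implementation, not the module's actual code: the name scrape_sketch and the fetch key are hypothetical (fetch exists only so the example runs without network access), and treating retries as the total number of attempts is an assumption.

```perl
use strict;
use warnings;

# Illustrative re-implementation, not the module's actual code.
# The 'fetch' key is a hypothetical addition so the sketch can run offline.
sub scrape_sketch {
    my ($opts) = @_;
    my $url      = $opts->{url} or die "url is required\n";
    my $retries  = $opts->{retries} // 1;             # assumed: total attempts
    my $validate = $opts->{validate} // sub { 1 };    # accept anything by default
    my $fetch    = $opts->{fetch} // do {
        require LWP::Simple;                          # loaded only when needed
        \&LWP::Simple::get;
    };

    for my $attempt (1 .. $retries) {
        my $content = $fetch->($url);
        return $content if defined $content && $validate->($content);
    }
    return;    # undef in scalar context once every attempt has failed
}

# A stub fetcher that fails twice, then returns a JSON-looking string.
my $calls = 0;
my $flaky = sub { return ++$calls < 3 ? undef : '{"ok":1}' };

my $content = scrape_sketch({
    url      => 'https://example.invalid/feed.json',
    retries  => 3,
    validate => sub { $_[0] =~ /^\{/ },
    fetch    => $flaky,
});
```

With the stub above, the third attempt succeeds and $content holds the returned string; with a fetcher that never succeeds, the call yields undef.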

crawl_schedule

Crawls the NHL schedule. The schedule is accessed through a minimalistic live API first (which only works for post-2010 seasons), then through the general /api/

Arguments: hash reference containing
 * start_season => the first season to crawl
 * stop_season  => the last season to crawl
Returns: hash reference of seasonal schedules where seasons are the keys, and decoded JSONs are the values.
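
The shape of the returned structure (seasons as keys, decoded JSON as values) can be illustrated with a small offline sketch. Everything here is hypothetical: fetch_schedule_stub stands in for the real download, crawl_schedule_sketch is not the module's function, and the "season" and "games" fields are invented placeholders rather than the actual NHL schedule format.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical stand-in for the download step; the real module queries the NHL site.
sub fetch_schedule_stub {
    my ($season) = @_;
    return qq({"season":$season,"games":[]});
}

# Sketch of the documented return shape: { season => decoded JSON, ... }.
sub crawl_schedule_sketch {
    my ($opts) = @_;
    my %schedules;
    for my $season ($opts->{start_season} .. $opts->{stop_season}) {
        my $json = fetch_schedule_stub($season);
        next unless defined $json;    # skip seasons that failed to download
        $schedules{$season} = decode_json($json);
    }
    return \%schedules;
}

my $schedules = crawl_schedule_sketch({ start_season => 2016, stop_season => 2017 });
for my $season (sort keys %$schedules) {
    printf "%d: %d games\n", $season, scalar @{ $schedules->{$season}{games} };
}
```

Callers would walk the real result the same way, looking up each season key and then descending into whatever structure the decoded schedule actually has.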

get_game_url_args

Sets the arguments to populate the game URL for a given report type and game.

Arguments: document name, currently one of qw(BS PB RO ES GS PL)
           game hashref containing
           * season    => YYYY
           * stage     => 2|3
           * season_id => NNNN
Returns: a configured list of arguments for the URL.

crawl_game

Crawls the data for the given game.

Arguments: game data as hashref:
           * season    => YYYY
           * stage     => 2|3
           * season_id => NNNN
           options hashref:
           * game_files => types of reports that are requested (passed as an array reference in the SYNOPSIS)
           * force      => 0|1 force overwrite of files already present in the system
           * retries    => N, the number of retries for every get call
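
The SYNOPSIS comment equates season 2011, stage 2, season ID 0001 with game 2011020001. A minimal sketch of that composition follows, assuming the layout is a four-digit season, a two-digit stage and a four-digit season ID (inferred from that single example); game_id_sketch is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the 10-digit game number from the game hashref.
sub game_id_sketch {
    my ($game) = @_;
    return sprintf '%04d%02d%04d', @{$game}{qw(season stage season_id)};
}

my $id = game_id_sketch({ season => 2011, stage => 2, season_id => 1 });
print "$id\n";    # prints 2011020001
```

Note that Perl reads a literal with a leading zero, such as 0001, as octal. The value 0001 is still 1, but a literal like 0008 does not compile ("Illegal octal digit"), so passing the season ID as a plain decimal number is the safer habit.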

AUTHOR

More Hockey Stats, <contact at morehockeystats.com>

BUGS

Please report any bugs or feature requests to contact at morehockeystats.com, or through the web interface at https://rt.cpan.org/NoAuth/ReportBug.html?Queue=Sport::Analytics::NHL::Scraper. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc Sport::Analytics::NHL::Scraper

You can also look for information at: