NAME
Sport::Analytics::NHL - Crawl data from NHL.com and put it into a database
VERSION
Version 1.00
SYNOPSIS
Crawls the NHL.com website, processes the game reports, and stores them in a Mongo database or on the filesystem.
use Sport::Analytics::NHL;
my $nhl = Sport::Analytics::NHL->new();
$nhl->scrape_games();
...
# more functionality to be added in later releases.
EXPORT
hdb_version() - reports the version. The rest of the interface is object-oriented and is accessed through the new() constructor.
METHODS
hdb_version
-
Returns the current version of the package.
new
-
Returns a new Sport::Analytics::NHL object. If a Mongo DB is configured, the connection to the database is established, and the handle is stored in the object.
parse_game_args
-
Parses the game arguments passed to the scrape_games() method. Three forms are accepted:
* NHL IDs of format SSSS0TIIII (2016020201)
* Our IDs of format SSSSTIIII (201620201)
* Dates in format YYYYMMDD (20160202)
where S stands for the starting year of the season, T for the stage (2 - regular, 3 - playoffs), and I for the ID of the game within the season.
Arguments: the games array reference, the dates array reference, and then the list of number strings to parse. The method modifies both array references in place.
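The three argument formats above can be told apart by length and digit layout. The following is a minimal sketch of that idea, not the module's actual implementation: the helper name classify_game_arg and the hash keys (type, season, stage, season_id, game_id, date) are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; it is not part of Sport::Analytics::NHL.
# Returns a hash reference describing the argument, or undef if the argument
# matches none of the three documented formats.
sub classify_game_arg {
    my ($arg) = @_;

    # NHL ID: SSSS0TIIII (10 digits, with a literal 0 after the season).
    if ($arg =~ /^(\d{4})0([23])(\d{4})$/) {
        return {
            type      => 'nhl_id',
            season    => $1 + 0,
            stage     => $2 + 0,
            season_id => $3 + 0,
            game_id   => "$1$2$3",    # our ID is the NHL ID without the extra 0
        };
    }

    # Our ID: SSSSTIIII (9 digits).
    if ($arg =~ /^(\d{4})([23])(\d{4})$/) {
        return {
            type      => 'our_id',
            season    => $1 + 0,
            stage     => $2 + 0,
            season_id => $3 + 0,
            game_id   => $arg,
        };
    }

    # Date: YYYYMMDD (8 digits); no calendar validation is attempted here.
    if ($arg =~ /^\d{8}$/) {
        return { type => 'date', date => $arg };
    }

    return;
}

# Usage: classify the three examples from the description above.
for my $arg (qw(2016020201 201620201 20160202)) {
    my $info = classify_game_arg($arg);
    print "$arg: $info->{type}\n";
}
```

Because the three formats have different lengths (10, 9 and 8 digits), the order of the checks does not matter in this sketch.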
get_crawled_games_for_dates
-
Gets the list of already crawled games for a given list of dates. Crawls the season schedule on the NHL website if necessary.
Arguments: the options to pass to the scraper that crawls, and the list of dates.
Returns: the list of game structures, which are hash references with the following fields:
* season
* stage
* season id
* our game ID (see the previous section)
get_nodb_scheduled_games
-
Gets the list of scheduled, uncrawled games in the filesystem, based on the schedules already stored in, or crawled into, the system.
Argument: an options hashref that specifies whether new schedules should be crawled and whether only a specific stage should be selected.
Returns: the list of game structures, which are hash references with the following fields:
* season
* stage
* season id
* our game ID (see the previous section)
get_db_scheduled_games
-
Same as the previous method, but the information is extracted from the Mongo database rather than the filesystem.
get_scheduled_games
-
The generic wrapper for the two previous methods.
scrape_games
-
Scrapes the game reports from the NHL website and stores them in files on disk.
Arguments: the hashref of options for the scrape:
* no_schedule_crawl - whether crawling a fresh schedule should be skipped
* start_season - the first season to start scraping from (default 1917)
* stop_season - the last season to scrape (default - ongoing)
* stage - 2 for Regular, 3 for Playoffs, none for both (default - none)
* force - overwrite files and data that are already present
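As an illustration of the options above, the sketch below builds an options hashref for the 2016 regular season and passes it to scrape_games(). The option names come from the list above; the chosen values are only an example. The require guard is an addition made here so that the snippet still runs where the module is not installed. Note that a real call crawls NHL.com and writes to disk.

```perl
use strict;
use warnings;

# Example values only: scrape the 2016 regular season, reusing any schedule
# that has already been crawled and keeping files that already exist.
my $opts = {
    no_schedule_crawl => 1,
    start_season      => 2016,
    stop_season       => 2016,
    stage             => 2,      # 2 - regular season, 3 - playoffs
    force             => 0,
};

# Only attempt the scrape when the module can be loaded (guard added for
# this example; a normal script would simply "use Sport::Analytics::NHL").
if (eval { require Sport::Analytics::NHL; 1 }) {
    my $nhl = Sport::Analytics::NHL->new();
    $nhl->scrape_games($opts);
}
else {
    print "Sport::Analytics::NHL is not installed; skipping the scrape.\n";
}
```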
AUTHOR
More Hockey Stats, <contact at morehockeystats.com>
BUGS
Please report any bugs or feature requests to contact at morehockeystats.com, or through the web interface at https://rt.cpan.org/NoAuth/ReportBug.html?Queue=Sport::Analytics::NHL. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc Sport::Analytics::NHL
You can also look for information at:
RT: CPAN's request tracker (report bugs here)
https://rt.cpan.org/NoAuth/Bugs.html?Dist=Sport::Analytics::NHL
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
ACKNOWLEDGEMENTS
LICENSE AND COPYRIGHT
Copyright 2018 More Hockey Stats.
This program is released under the following license: GNU