NAME

WWW::RobotRules::Parser::MultiValue - Parse robots.txt

SYNOPSIS

use WWW::RobotRules::Parser::MultiValue;
use LWP::Simple qw(get);

my $url = 'http://example.com/robots.txt';
my $robots_txt = get $url;

my $rules = WWW::RobotRules::Parser::MultiValue->new(
    agent => 'TestBot/1.0',
);
$rules->parse($url, $robots_txt);

if ($rules->allows('http://example.com/some/path')) {
    my $delay = $rules->delay_for('http://example.com/');
    sleep $delay;
    ...
}

my $hash = $rules->rules_for('http://example.com/');
my @list_of_allowed_paths = $hash->get_all('allow');
my @list_of_custom_rule_value = $hash->get_all('some-rule');

DESCRIPTION

WWW::RobotRules::Parser::MultiValue is a parser for robots.txt.

Parsed rules for the specified user agent are stored in a Hash::MultiValue object, whose keys are lower-cased rule names.
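The multi-value storage can be illustrated with a small sketch. This is not the module's implementation, only a plain-Perl approximation of the idea: each lower-cased rule name keeps every value it was given, the way Hash::MultiValue's get_all exposes them.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sketch only: collect robots.txt rules into a hash of arrays,
# mimicking how repeated keys keep all their values.
my $robots_txt = <<'TXT';
User-agent: *
Allow: /public
Allow: /images
Disallow: /private
TXT

my %rules;
for my $line (split /\n/, $robots_txt) {
    next unless $line =~ /^\s*([^:#]+?)\s*:\s*(.*?)\s*$/;
    my ($key, $value) = (lc $1, $2);
    next if $key eq 'user-agent';    # section header, not a rule
    push @{ $rules{$key} }, $value;  # keep ALL values, like get_all
}

print join(',', @{ $rules{allow} }), "\n";     # /public,/images
print join(',', @{ $rules{disallow} }), "\n";  # /private
```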

The Request-rate rule is handled specially: it is normalized to an equivalent Crawl-delay rule.
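The normalization idea can be sketched as follows. This is a hedged illustration, not the module's actual code: a Request-rate of "m/n" means m requests per n seconds, so the equivalent Crawl-delay is n/m seconds per request.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Illustrative helper (hypothetical name, not part of the module's API):
# convert a Request-rate value like "1/5" into a Crawl-delay in seconds.
sub request_rate_to_crawl_delay {
    my ($value) = @_;
    my ($requests, $seconds) = $value =~ m{^\s*(\d+)\s*/\s*(\d+)};
    return unless $requests && $seconds;
    return $seconds / $requests;
}

print request_rate_to_crawl_delay('1/5'), "\n";   # 5
print request_rate_to_crawl_delay('2/10'), "\n";  # 5
```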

METHODS

The following methods appear in the SYNOPSIS above; descriptions are kept to what that usage shows.

new(%args)

    Creates a parser object. The agent option specifies the user agent name whose rules are selected.

parse($url, $robots_txt)

    Parses the contents of robots.txt retrieved from $url.

allows($url)

    Returns true if the user agent is allowed to visit $url.

delay_for($url)

    Returns the crawl delay, in seconds, for $url.

rules_for($url)

    Returns the rules applying to $url as a Hash::MultiValue.

SEE ALSO

Hash::MultiValue

LICENSE

Copyright (C) INA Lintaro

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

INA Lintaro tarao.gnn@gmail.com