NAME
WWW::RobotRules::Parser::MultiValue - Parse robots.txt
SYNOPSIS
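    use WWW::RobotRules::Parser::MultiValue;
    use LWP::Simple qw(get);

    my $robots_txt = get $url;

    my $rules = WWW::RobotRules::Parser::MultiValue->new(
        agent => 'TestBot/1.0',
    );
    $rules->parse($url, $robots_txt);

    # $url is reused below for illustration; any URI on the same site works
    if ($rules->allows($url)) {
        my $delay = $rules->delay_for($url);
        sleep $delay;
        ...
    }

    my $hash = $rules->rules_for($url);
    my @list_of_allowed_paths = $hash->get_all('allow');
    my @list_of_custom_rule_value = $hash->get_all('some-rule');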
DESCRIPTION
WWW::RobotRules::Parser::MultiValue is a parser for robots.txt.
Parsed rules for the specified user agent are stored as a Hash::MultiValue, where each key is a lower-case rule name.
The Request-rate rule is handled specially: it is normalized to a Crawl-delay rule.
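The storage and normalization described above can be sketched in core Perl. This is only an illustration under stated assumptions, not the module's implementation: `collect_rules` and `request_rate_to_delay` are hypothetical helpers, a plain hash of array references stands in for Hash::MultiValue, and a Request-rate value is assumed to have the plain form `requests/seconds`.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a Request-rate value such as "1/5"
# (1 request per 5 seconds) into a delay in seconds between requests.
# Assumption: only the plain "requests/seconds" form is handled.
sub request_rate_to_delay {
    my ($value) = @_;
    my ($requests, $seconds) = $value =~ m{\A\s*(\d+)\s*/\s*(\d+)\s*\z}
        or return;
    return if $requests == 0;
    return $seconds / $requests;
}

# Hypothetical helper: collect "Name: value" lines into a hash of array
# references keyed by lower-case rule name, so that a rule name may
# carry several values.
sub collect_rules {
    my ($text) = @_;
    my %rules;
    for my $line (split /\r?\n/, $text) {
        $line =~ s/#.*//;    # strip trailing comments
        my ($name, $value) = $line =~ /\A\s*([^:\s]+)\s*:\s*(.*?)\s*\z/
            or next;
        $name = lc $name;
        if ($name eq 'request-rate') {
            # Normalize Request-rate to a crawl-delay value
            my $delay = request_rate_to_delay($value);
            next unless defined $delay;
            ($name, $value) = ('crawl-delay', $delay);
        }
        push @{ $rules{$name} }, $value;
    }
    return \%rules;
}

my $rules = collect_rules(<<'EOT');
Allow: /public
Allow: /docs
Disallow: /private
Request-rate: 1/5
EOT

print join(', ', @{ $rules->{allow} }), "\n";    # /public, /docs
print $rules->{'crawl-delay'}[0], "\n";          # 5
```

With the real module, the equivalent lookup would go through `get_all` on the Hash::MultiValue returned by `rules_for`.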
METHODS
- new
    $rules = WWW::RobotRules::Parser::MultiValue->new(
        agent => $user_agent,
    );
    $rules = WWW::RobotRules::Parser::MultiValue->new(
        agent          => $user_agent,
        ignore_default => 1,
    );
Creates a new object to handle the rules in robots.txt. The object parses the rules that match $user_agent. Rules under User-agent: * always match, but have lower precedence than rules that explicitly match $user_agent. If the ignore_default option is specified, rules under User-agent: * are simply ignored.
- parse
    $rules->parse($uri, $text);
Parses the text content $text, whose URI is $uri.
- match_ua
    $rules->match_ua($pattern);
Tests whether the user agent matches $pattern.
- rules_for
    $hash = $rules->rules_for($uri);
Returns a Hash::MultiValue that describes the rules for the domain of $uri.
- allows
    $test = $rules->allows($uri);
Tests whether the user agent is allowed to visit $uri. If there is an 'Allow' rule for the path of $uri, then $uri is allowed. If there is a 'Disallow' rule for the path of $uri, then $uri is not allowed. Otherwise, $uri is allowed.
- delay_for
    $delay = $rules->delay_for($uri);
    $delay_in_milliseconds = $rules->delay_for($uri, 1000);
Calculates the crawl delay for the specified $uri. The value is determined by the 'Crawl-delay' rule or the 'Request-rate' rule. The second argument specifies the base of the return value; for example, a base of 1000 returns the delay in milliseconds.
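The base argument of delay_for can be read as a multiplier applied to a delay expressed in seconds. The sketch below assumes that reading; `scale_delay` is a hypothetical helper written for illustration and is not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: scale a delay given in seconds by $base.
# Assumption: an omitted base means 1, so the result stays in seconds.
sub scale_delay {
    my ($seconds, $base) = @_;
    $base = 1 unless defined $base;
    return $seconds * $base;
}

print scale_delay(2.5), "\n";          # 2.5  (seconds)
print scale_delay(2.5, 1000), "\n";    # 2500 (milliseconds)
```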
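The decision order described for allows (an 'Allow' match first, then a 'Disallow' match, otherwise allowed) can be sketched as follows. This is a minimal illustration, not the module's code: `is_allowed` is a hypothetical helper, and a rule is assumed to match when it is a literal prefix of the path, so wildcards are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether $path may be visited, given array
# references of Allow and Disallow path prefixes.
# Assumption: literal prefix matching only; empty rule values are skipped.
sub is_allowed {
    my ($path, $allow, $disallow) = @_;
    for my $prefix (@$allow) {
        next if $prefix eq '';
        return 1 if index($path, $prefix) == 0;    # an Allow match wins
    }
    for my $prefix (@$disallow) {
        next if $prefix eq '';
        return 0 if index($path, $prefix) == 0;    # then a Disallow match
    }
    return 1;                                      # no rule matched
}

my @allow    = ('/private/public');
my @disallow = ('/private');

for my $path ('/private/public/page', '/private/secret', '/index.html') {
    my $verdict = is_allowed($path, \@allow, \@disallow) ? 'allowed' : 'denied';
    print "$path: $verdict\n";
}
```

Under these assumptions the first and third paths are allowed and the second is denied, because the Allow list is consulted before the Disallow list.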
SEE ALSO
LICENSE
Copyright (C) INA Lintaro
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
AUTHOR
INA Lintaro <tarao.gnn@gmail.com>