NAME

Dezi::Bot - web crawler

SYNOPSIS

    use Dezi::Bot;

    my $bot = Dezi::Bot->new(

        # give your bot a name
        name => 'dezibot',

        # explicit object, instead of class+config
        spider => $spider_object,

        # every crawled URI
        # passed to the $handler->handle() method
        handler_class => 'Dezi::Bot::Handler',

        # default
        spider_class => 'Dezi::Bot::Spider',

        # passed to spider_class->new()
        spider_config => {
            agent     => 'dezibot ' . $Dezi::Bot::VERSION,
            email     => 'bot@dezi.org',
            max_depth => 4,
        },

        # default
        cache_class => 'Dezi::Bot::Cache',

        # passed to cache_class->new()
        cache_config => {
            driver   => 'File',
            root_dir => '/tmp/dezibot',
        },

        # default
        queue_class => 'Dezi::Bot::Queue',

        # passed to queue_class->new()
        queue_config => {
            type     => 'DBI',
            dsn      => "DBI:mysql:database=dezibot;host=localhost;port=3306",
            username => 'myuser',
            password => 'mysecret',
        },
    );

    $bot->crawl('http://dezi.org');

DESCRIPTION

The Dezi::Bot module is a web crawler optimized for parallel use across multiple hosts.

METHODS

init( args )

Overrides the base method to set default options based on args. See the SYNOPSIS.

Options:

name
spider
handler_class
handler_config
spider_class
spider_config
cache_class
cache_config
queue_class
queue_config
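
As the SYNOPSIS notes, every crawled URI is passed to the handler's handle() method, so handler_class is the natural place to plug in custom behaviour. The following is a minimal sketch of such a class, not the module's own handler. The names My::CountingHandler and My::StubDoc are hypothetical, and the handle( $bot, $doc ) signature and the url() accessor on the document are assumptions made for illustration; check Dezi::Bot::Handler for the real interface.

```perl
use strict;
use warnings;

package My::CountingHandler;    # hypothetical handler class

# Constructor accepts arbitrary key/value config (assumed to mirror
# what handler_config would supply).
sub new {
    my ( $class, %args ) = @_;
    return bless { seen => [], %args }, $class;
}

# Assumed signature: handle( $bot, $doc ), where $doc responds to url().
# Records the URL and returns the running count of handled documents.
sub handle {
    my ( $self, $bot, $doc ) = @_;
    push @{ $self->{seen} }, $doc->url;
    return scalar @{ $self->{seen} };
}

# Returns the list of URLs handled so far.
sub seen { return @{ $_[0]->{seen} } }

package My::StubDoc;    # hypothetical stand-in for a crawled document

sub new { my ( $class, %args ) = @_; return bless {%args}, $class }
sub url { return $_[0]->{url} }

package main;

# Exercise the handler directly, without a running bot.
my $handler = My::CountingHandler->new( verbose => 0 );
$handler->handle( undef, My::StubDoc->new( url => 'http://dezi.org/' ) );
$handler->handle( undef, My::StubDoc->new( url => 'http://dezi.org/docs' ) );
print join( "\n", $handler->seen ), "\n";
```

If the assumed interface holds, a class like this could be named in handler_class in place of the default.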

crawl( urls )

Calls ->spider->crawl() for an array of urls.

Returns the total number of URIs crawled.
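
The delegation described above can be modelled with a toy example. This is a sketch under stated assumptions, not the module's source: My::StubSpider and crawl_all() are hypothetical, and it assumes the spider is called once per URL and that the per-URL counts are summed.

```perl
use strict;
use warnings;

package My::StubSpider;    # hypothetical stand-in, not Dezi::Bot::Spider

# $pages maps a start URL to the list of pages "reachable" from it.
sub new { return bless { pages => $_[1] }, $_[0] }

# Pretend to crawl: return how many pages are known for $url.
sub crawl {
    my ( $self, $url ) = @_;
    return scalar @{ $self->{pages}{$url} || [] };
}

package main;

# Hypothetical helper modelling the documented behaviour:
# delegate each URL to the spider and sum the counts.
sub crawl_all {
    my ( $spider, @urls ) = @_;
    my $total = 0;
    $total += $spider->crawl($_) for @urls;
    return $total;
}

my $spider = My::StubSpider->new(
    {   'http://dezi.org'    => [ '/', '/docs', '/download' ],
        'http://example.com' => ['/'],
    }
);

my $total = crawl_all( $spider, 'http://dezi.org', 'http://example.com' );
print "crawled $total URIs\n";    # prints "crawled 4 URIs"
```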

AUTHOR

Peter Karman, <karman at cpan.org>

BUGS

Please report any bugs or feature requests to bug-dezi-bot at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Dezi-Bot. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Dezi::Bot

COPYRIGHT & LICENSE

Copyright 2013 Peter Karman.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.