NAME

WWW::Crawl::Auto - Crawl pages and automatically switch between HTTP and Chromium

VERSION

This documentation refers to WWW::Crawl::Auto version 0.5.

SYNOPSIS

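use strict;
use warnings;
use WWW::Crawl::Auto;

my $crawler = WWW::Crawl::Auto->new(
    chromium_path  => '/usr/bin/chromium',
    auto_min_bytes => 512,    # the default, shown for illustration
);

# Crawl from the start URL, passing process_page as the callback.
my @visited = $crawler->crawl('https://example.com', \&process_page);

# Callback: receives the URL of the page.
sub process_page {
    my $url = shift;
    print "Visited: $url\n";
}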

DESCRIPTION

WWW::Crawl::Auto reuses the WWW::Crawl crawling logic but decides, per site, whether to fetch pages with HTTP::Tiny or with headless Chromium. Once a site is detected as dynamic, the crawler uses Chromium for that authority (host and optional port) for the rest of the crawl.
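
The per-authority switching described above can be pictured with a small standalone sketch. This is not the module's implementation: the names authority_of, choose_backend and %use_chromium are hypothetical, and the detection result is passed in as a plain boolean. It only illustrates how a "dynamic" verdict can stick to an authority for the rest of a crawl.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the authority (host plus optional port)
# from an absolute URL, lower-cased so lookups are case-insensitive.
sub authority_of {
    my ($url) = @_;
    return $url =~ m{^[a-z][a-z0-9+.-]*://([^/?#]+)}i ? lc($1) : undef;
}

# authority => 1 once a site has been judged dynamic (illustrative state).
my %use_chromium;

# Pick a backend for $url. $looks_dynamic stands in for whatever
# detection ran on the most recent response for this URL.
sub choose_backend {
    my ($url, $looks_dynamic) = @_;

    my $authority = authority_of($url);
    return 'http' unless defined $authority;

    # Once set, the flag is never cleared, so the choice is sticky.
    $use_chromium{$authority} = 1 if $looks_dynamic;

    return $use_chromium{$authority} ? 'chromium' : 'http';
}
```

In this sketch, a later static-looking page on the same authority still yields 'chromium', while other authorities remain on 'http' until they are flagged themselves.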

OPTIONS

force_chromium: A hostname (or arrayref of hostnames) to always fetch with Chromium.
force_http: A hostname (or arrayref of hostnames) to always fetch with HTTP::Tiny.
auto_min_bytes: Minimum response size, in bytes, for a page to be treated as static. Defaults to 512.
auto_decider: Coderef called as auto_decider->($url, $resp, $self) to decide whether Chromium should be used. Return a true value to use Chromium.
retry_count: Number of times to retry Chromium fetches before giving up. Defaults to 0.
debug: Enable debug logging to STDERR when set to a true value.
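
As an illustration of the auto_decider option, here is a minimal sketch of a decider callback. The sub name looks_dynamic and its heuristics are hypothetical, and it assumes $resp is an HTTP::Tiny-style hashref with a content key, which this documentation does not specify.

```perl
use strict;
use warnings;

# Hypothetical decider: returns true when a page looks script-driven.
# Assumption: $resp is a hashref carrying the body under 'content'.
sub looks_dynamic {
    my ($url, $resp, $crawler) = @_;

    my $html = ( ref($resp) eq 'HASH' ? $resp->{content} : undef ) // '';

    # An empty body gives nothing to parse, so ask for Chromium.
    return 1 if $html eq '';

    # An empty single-page-app mount point such as <div id="root"></div>.
    return 1 if $html =~ m{<div\s+id=["'](?:root|app)["']\s*>\s*</div>}i;

    # A <noscript> notice mentioning JavaScript is a strong hint.
    return 1 if $html =~ m{<noscript>[^<]*javascript}i;

    return 0;
}
```

Such a callback would be supplied as auto_decider => \&looks_dynamic when constructing the crawler.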

CONSTRUCTOR

new(%options)

Creates a new WWW::Crawl::Auto object. It accepts all WWW::Crawl options, plus those listed under OPTIONS.
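
A sketch of a constructor call combining several of the options listed under OPTIONS. The hostnames and values are placeholders chosen for illustration, and the require guard exists only so the snippet runs even where the module is not installed.

```perl
use strict;
use warnings;

# Placeholder values; adjust hostnames and limits for a real crawl.
my %options = (
    force_chromium => ['app.example.com'],    # arrayref of hostnames
    force_http     => 'static.example.com',   # a single hostname
    auto_min_bytes => 1024,
    retry_count    => 2,
    debug          => 1,
);

# Construct the crawler only when the module is actually available.
my $crawler;
if ( eval { require WWW::Crawl::Auto; 1 } ) {
    $crawler = WWW::Crawl::Auto->new(%options);
}
```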

METHODS

All public methods are inherited from WWW::Crawl.

AUTHOR

Ian Boddison, <bod at cpan.org>

LICENSE AND COPYRIGHT

This program is released under the following license:

Perl
