NAME
WWW::Crawl::Auto - Crawl pages and automatically switch between HTTP and Chromium
VERSION
This documentation refers to WWW::Crawl::Auto version 0.5.
SYNOPSIS
    use WWW::Crawl::Auto;

    my $crawler = WWW::Crawl::Auto->new(
        chromium_path  => '/usr/bin/chromium',
        auto_min_bytes => 512,
    );

    my @visited = $crawler->crawl('https://example.com', \&process_page);

    sub process_page {
        my $url = shift;
        print "Visited: $url\n";
    }
DESCRIPTION
WWW::Crawl::Auto reuses the WWW::Crawl crawling logic but decides, per site, whether to fetch pages with HTTP::Tiny or with headless Chromium. Once a site is detected as dynamic, the crawler switches to Chromium for that authority (the host and optional port of the URL) for the rest of the crawl.
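The "switch once, then stay switched" behaviour described above can be pictured with a small standalone sketch. This is an illustration only and is not taken from the module's source: the names %use_chromium, authority_of and backend_for are hypothetical, and the $looks_dynamic flag stands in for whatever detection the module actually performs.

```perl
use strict;
use warnings;

# Hypothetical illustration only; not the module's internals.
my %use_chromium;    # authority => 1 once a site has been judged dynamic

# Extract the authority component of an absolute URL: the part between
# "://" and the next "/", "?" or "#". Lower-cased for comparison.
sub authority_of {
    my ($url) = @_;
    my ($authority) = $url =~ m{^[a-z][a-z0-9+.-]*://([^/?#]*)}i;
    return lc($authority // '');
}

# Pick a backend for a URL. $looks_dynamic is the (assumed) result of
# inspecting the plain HTTP response for this URL.
sub backend_for {
    my ($url, $looks_dynamic) = @_;
    my $authority = authority_of($url);

    # Remember the decision so later URLs on the same authority reuse it.
    $use_chromium{$authority} = 1 if $looks_dynamic;

    return $use_chromium{$authority} ? 'chromium' : 'http';
}

print backend_for('https://example.com/',      0), "\n";   # http
print backend_for('https://example.com/app',   1), "\n";   # chromium
print backend_for('https://example.com/about', 0), "\n";   # chromium (sticky)
print backend_for('https://example.org/',      0), "\n";   # http
```

The third call returns chromium even though that page did not look dynamic, because the decision is remembered per authority rather than per page.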
OPTIONS
force_chromium: A hostname (or arrayref of hostnames) to always fetch with Chromium.
force_http: A hostname (or arrayref of hostnames) to always fetch with HTTP::Tiny.
auto_min_bytes: Minimum response size, in bytes, for a page to be considered static. Defaults to 512.
auto_decider: Coderef invoked as auto_decider->($url, $resp, $self) to decide whether Chromium should be used. Return true to use Chromium.
retry_count: Number of times to retry Chromium fetches before giving up. Defaults to 0.
debug: Enable debug logging to STDERR when set to a true value.
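As a rough sketch of what an auto_decider callback might look like, the following routine flags pages that resemble an empty single-page-application shell. It assumes that $resp is an HTTP::Tiny-style response hashref (with success, headers and content keys); this documentation states only the argument order, so treat that shape as an assumption. The name spa_decider and the two heuristics are hypothetical.

```perl
use strict;
use warnings;

# Assumption: $resp is an HTTP::Tiny-style response hashref with
# 'success', 'headers' and 'content' keys. Only the argument order
# ($url, $resp, $self) is documented.
sub spa_decider {
    my ($url, $resp, $self) = @_;

    # Nothing to inspect on a failed fetch; this sketch does not escalate.
    return 0 unless $resp && $resp->{success};

    # HTTP::Tiny lower-cases header names. Only HTML is worth rendering.
    my $type = $resp->{headers}{'content-type'} // '';
    return 0 unless $type =~ m{^text/html}i;

    my $html = $resp->{content} // '';

    # An empty application mount point suggests client-side rendering.
    return 1 if $html =~ m{<div\s+id=["'](?:root|app)["']\s*>\s*</div>}i;

    # A <noscript> notice asking for JavaScript is another hint.
    return 1 if $html =~ m{<noscript>[^<]*enable\s+javascript}i;

    return 0;
}

# Hypothetical wiring, using only the options listed above:
#
#   my $crawler = WWW::Crawl::Auto->new(
#       auto_decider   => \&spa_decider,
#       force_chromium => ['app.example.com'],
#       force_http     => 'static.example.com',
#       retry_count    => 2,
#   );
```

The constructor call is left commented out so that the sketch runs without the module installed. The hostnames shown are placeholders.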
CONSTRUCTOR
new(%options)
Creates a new WWW::Crawl::Auto object. Options are the same as WWW::Crawl, plus the options listed above.
METHODS
All public methods are inherited from WWW::Crawl.
AUTHOR
Ian Boddison, <bod at cpan.org>
LICENSE AND COPYRIGHT
This software is Copyright (c) 2023-2026 by Ian Boddison.
This program is released under the following license:
Perl