NAME

WWW::Crawl::Auto - Crawl pages and automatically switch between HTTP and Chromium

VERSION

This documentation refers to WWW::Crawl::Auto version 0.5.

SYNOPSIS

    use WWW::Crawl::Auto;

    my $crawler = WWW::Crawl::Auto->new(
        chromium_path  => '/usr/bin/chromium',
        auto_min_bytes => 512,
    );

    my @visited = $crawler->crawl('https://example.com', \&process_page);

    sub process_page {
        my $url = shift;
        print "Visited: $url\n";
    }

DESCRIPTION

WWW::Crawl::Auto reuses the WWW::Crawl crawling logic but decides, per site, whether to fetch pages with HTTP::Tiny or with headless Chromium. Once a site is detected as dynamic, the crawler uses Chromium for that authority (the host and optional port of the URL) for the rest of the crawl.
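The per-authority switch described above can be pictured with a small standalone sketch. This is not the module's implementation: the names %use_chromium, authority_of, fetcher_for and mark_dynamic are hypothetical, and the authority is extracted with a simple regex rather than a full URL parser.

```perl
use strict;
use warnings;

# Hypothetical state: authority => 1 once a site has been judged dynamic.
my %use_chromium;

# Return the lowercased authority component of an absolute http(s) URL,
# or an empty string if the URL does not match.
sub authority_of {
    my ($url) = @_;
    my ($auth) = $url =~ m{^https?://([^/?#]+)}i;
    return defined $auth ? lc $auth : '';
}

# Pick a fetcher for this URL based on what is known about its authority.
sub fetcher_for {
    my ($url) = @_;
    return $use_chromium{ authority_of($url) } ? 'chromium' : 'http';
}

# Remember that every later URL on this authority should use Chromium.
sub mark_dynamic {
    my ($url) = @_;
    $use_chromium{ authority_of($url) } = 1;
    return;
}
```

In this model the decision is sticky: after mark_dynamic() has been called for one URL, every other URL on the same authority is routed to Chromium, while other authorities keep using plain HTTP.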

OPTIONS

  • force_chromium: A hostname (or arrayref of hostnames) to always fetch with Chromium.

  • force_http: A hostname (or arrayref of hostnames) to always fetch with HTTP::Tiny.

  • auto_min_bytes: Minimum response size, in bytes, for a page to be considered static. Defaults to 512.

  • auto_decider: Coderef invoked as $auto_decider->($url, $resp, $self) to decide whether Chromium should be used. Return a true value to use Chromium.

  • retry_count: Number of times to retry a failed Chromium fetch before giving up. Defaults to 0.

  • debug: Enable debug logging to STDERR when set to a true value.
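
As an illustration of the options above, here is a minimal sketch of an auto_decider callback. It assumes that $resp is an HTTP::Tiny-style response hashref with a content key; that shape is an assumption, not something this documentation states. The name needs_chromium, the heuristics, and the hostnames in the commented constructor call are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical decider. Assumes $resp is an HTTP::Tiny-style hashref with a
# 'content' key; the response shape actually passed by the module may differ.
sub needs_chromium {
    my ($url, $resp, $crawler) = @_;
    my $body = (ref $resp eq 'HASH' && defined $resp->{content})
        ? $resp->{content}
        : '';

    # A very small body suggests the page is assembled client-side.
    # 512 mirrors the documented auto_min_bytes default.
    return 1 if length($body) < 512;

    # An empty application mount point is another common hint.
    return 1 if $body =~ m{<div\s+id=["'](?:root|app)["']\s*>\s*</div>}i;

    return 0;
}

# Passing it to the constructor might look like this (not run here):
#
#   my $crawler = WWW::Crawl::Auto->new(
#       auto_decider   => \&needs_chromium,
#       force_chromium => ['app.example.com'],
#       force_http     => 'static.example.com',
#       retry_count    => 2,
#       debug          => 1,
#   );
```

The callback returns a true value when Chromium should be used, matching the contract described for auto_decider. The third argument (the crawler object) is accepted but unused in this sketch.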

CONSTRUCTOR

new(%options)

Creates a new WWW::Crawl::Auto object. Accepts the same options as WWW::Crawl, plus those listed under OPTIONS.

METHODS

All public methods are inherited from WWW::Crawl.

AUTHOR

Ian Boddison, <bod at cpan.org>

LICENSE AND COPYRIGHT

This software is Copyright (c) 2023-2026 by Ian Boddison.

This program is released under the following license:

Perl