NAME

WWW::Crawl::Chromium - Crawl JavaScript-rendered pages with Chromium

VERSION

This documentation refers to WWW::Crawl::Chromium version 0.2.

SYNOPSIS

use WWW::Crawl::Chromium;

my $crawler = WWW::Crawl::Chromium->new(
    chromium_path    => '/usr/bin/chromium',
    chromium_timeout => 30,
);

my @visited = $crawler->crawl('https://example.com', \&process_page);

sub process_page {
    my $url = shift;
    print "Visited: $url\n";
}

DESCRIPTION

WWW::Crawl::Chromium reuses the crawling and link-parsing logic from WWW::Crawl but overrides page fetching to use a headless Chromium or Chrome executable. The rendered DOM is collected via --dump-dom after the document is fully loaded.
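The fetch step described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual implementation: the helper names build_chromium_command and fetch_rendered_html are hypothetical, and pairing --dump-dom with --headless and --virtual-time-budget is an assumption based on the options documented below. Only --dump-dom is named in this document.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: assemble the argument list for a headless DOM dump.
# The flag set is an assumption; this document only confirms --dump-dom.
sub build_chromium_command {
    my (%opt) = @_;
    my $binary = $opt{chromium_path}        // 'chromium';
    my $budget = $opt{chromium_time_budget} // 10_000;
    return (
        $binary,
        '--headless',
        "--virtual-time-budget=$budget",
        '--dump-dom',
        $opt{url},
    );
}

# Hypothetical helper: run the command and return the rendered HTML,
# or undef if the browser cannot be started or the timeout expires.
sub fetch_rendered_html {
    my (%opt) = @_;
    my $timeout = $opt{chromium_timeout} // 30;
    my @cmd     = build_chromium_command(%opt);

    my ($pid, $html);
    my $ok = eval {
        # Kill the child before unwinding so closing the pipe cannot block.
        local $SIG{ALRM} = sub { kill 'TERM', $pid if $pid; die "timeout\n" };
        alarm $timeout;
        # List-form piped open avoids passing the URL through a shell.
        $pid = open(my $out, '-|', @cmd) or die "cannot run $cmd[0]: $!\n";
        local $/;                 # slurp the whole DOM dump at once
        $html = <$out>;
        close $out;
        alarm 0;
        1;
    };
    alarm 0;
    return $ok ? $html : undef;
}
```

Calling fetch_rendered_html(url => 'https://example.com') would return the serialized DOM as a string when a chromium binary is available on the system.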

OPTIONS

  • chromium_path: Full path to the Chromium or Chrome executable. Defaults to chromium.

  • chrome_path: Alias for chromium_path.

  • chromium_timeout: Timeout in seconds for a single page fetch. Defaults to 30 seconds.

  • chromium_time_budget: Virtual time budget, in milliseconds, that Chromium allows for JavaScript to settle before the DOM is dumped. Defaults to 10000.
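As an illustration of how the defaults and the chrome_path alias listed above could be resolved, here is a small standalone sketch. The function resolve_chromium_options is hypothetical and is not part of the module. The precedence when both chromium_path and chrome_path are supplied is not documented; this sketch assumes chromium_path wins.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical normalizer: apply the documented defaults and alias.
# Assumption: chromium_path takes precedence over chrome_path.
sub resolve_chromium_options {
    my (%args) = @_;
    return (
        chromium_path        => $args{chromium_path} // $args{chrome_path} // 'chromium',
        chromium_timeout     => $args{chromium_timeout}     // 30,
        chromium_time_budget => $args{chromium_time_budget} // 10_000,
    );
}

my %opt = resolve_chromium_options(
    chrome_path          => '/usr/bin/google-chrome',
    chromium_time_budget => 5_000,
);

# Print the effective settings in a stable order.
printf "%s: %s\n", $_, $opt{$_} for sort keys %opt;
```

Note the differing units: chromium_timeout is given in seconds, while chromium_time_budget is given in milliseconds.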

METHODS

This module overrides the protected _fetch_page($url) method from WWW::Crawl so that it returns the rendered HTML produced by Chromium. All other crawling and link-parsing logic is inherited from the parent module.
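The override pattern can be shown with a self-contained toy example. The packages My::Crawl::Base and My::Crawl::Rendered and the visit method are hypothetical stand-ins, not the real WWW::Crawl classes; only the idea of replacing _fetch_page($url) in a subclass is taken from the description above.

```perl
use strict;
use warnings;

# Stand-in for the parent class; the real WWW::Crawl is not reproduced here.
package My::Crawl::Base;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Default fetcher with a placeholder body for illustration.
sub _fetch_page {
    my ($self, $url) = @_;
    return "<html><!-- static $url --></html>";
}

# Simplified driver: fetch one page and hand it to a callback.
sub visit {
    my ($self, $url, $callback) = @_;
    my $html = $self->_fetch_page($url);    # dispatches to the subclass
    $callback->($url, $html);
    return $url;
}

# Subclass that swaps only the fetch step, as this module is described to do.
package My::Crawl::Rendered;
our @ISA = ('My::Crawl::Base');

sub _fetch_page {
    my ($self, $url) = @_;
    # A real implementation would invoke the browser here.
    return "<html><!-- rendered $url --></html>";
}

package main;

my $crawler = My::Crawl::Rendered->new;
$crawler->visit(
    'https://example.com',
    sub { my ($url, $html) = @_; print "$url => $html\n" },
);
```

Because visit calls _fetch_page through the object, the subclass version is used without any change to the driver code.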

AUTHOR

Ian Boddison, <bod at cpan.org>

BUGS

Please report any bugs or feature requests to bug-www-crawl at rt.cpan.org, or through the web interface at https://rt.cpan.org/NoAuth/ReportBug.html?Queue=WWW-Crawl-Chromium. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc WWW::Crawl::Chromium

LICENSE AND COPYRIGHT

This software is Copyright (c) 2023-2026 by Ian Boddison.

This program is released under the following license:

Perl