NAME
WWW::Crawl::Chromium - Crawl JavaScript-rendered pages with Chromium
VERSION
This documentation refers to WWW::Crawl::Chromium version 0.2.
SYNOPSIS
    use WWW::Crawl::Chromium;

    my $crawler = WWW::Crawl::Chromium->new(
        chromium_path    => '/usr/bin/chromium',
        chromium_timeout => 30,    # seconds per page fetch
    );

    my @visited = $crawler->crawl('https://example.com', \&process_page);

    # Callback: receives the URL as its first argument.
    sub process_page {
        my $url = shift;
        print "Visited: $url\n";
    }
DESCRIPTION
WWW::Crawl::Chromium reuses the crawling and link-parsing logic from WWW::Crawl but overrides page fetching to use a headless Chromium or Chrome executable. The rendered DOM is collected via --dump-dom after the document is fully loaded.
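As a rough illustration of the approach described above, the following sketch builds a headless Chromium command line that prints the rendered DOM to standard output. It is not taken from the module's source: the helper names build_dump_dom_command and dump_dom are hypothetical, and the exact set of flags (--headless, --disable-gpu, --virtual-time-budget, --dump-dom) is an assumption about how such a call might be assembled. No timeout handling is shown.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the argument list for a headless
# Chromium run that writes the rendered DOM of the URL to STDOUT.
sub build_dump_dom_command {
    my (%opt) = @_;
    my $path   = $opt{chromium_path}        // 'chromium';
    my $budget = $opt{chromium_time_budget} // 10000;    # milliseconds
    return (
        $path,
        '--headless',
        '--disable-gpu',
        "--virtual-time-budget=$budget",
        '--dump-dom',
        $opt{url},
    );
}

# Hypothetical helper: runs the command and returns everything it
# printed. The list form of open avoids passing the URL through a shell.
sub dump_dom {
    my (%opt) = @_;
    my @cmd = build_dump_dom_command(%opt);
    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    local $/;    # slurp mode: read the whole output at once
    my $html = <$fh>;
    close $fh;
    return $html;
}

# Show the command that would be run for a sample URL.
my @cmd = build_dump_dom_command(url => 'https://example.com');
print join(' ', @cmd), "\n";
```

Calling dump_dom(url => 'https://example.com') would then return the rendered HTML as a single string, provided a chromium executable is on the PATH.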
OPTIONS
chromium_path: Full path to the Chromium or Chrome executable. Defaults to chromium.
chrome_path: Alias for chromium_path.
chromium_timeout: Timeout in seconds for a single page fetch. Defaults to 30 seconds.
chromium_time_budget: Virtual time budget in milliseconds for Chromium to allow JavaScript to settle. Defaults to 10000.
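To make the defaults and the alias concrete, here is a minimal sketch of how such options could be resolved. The helper resolve_options is hypothetical and not part of the module's API, and giving chromium_path precedence when both it and chrome_path are supplied is an assumption made for this example.

```perl
use strict;
use warnings;

# Defaults as stated in the option list above.
my %DEFAULTS = (
    chromium_path        => 'chromium',
    chromium_timeout     => 30,       # seconds
    chromium_time_budget => 10000,    # milliseconds
);

# Hypothetical helper: merges caller arguments over the defaults and
# treats chrome_path as an alias for chromium_path.
sub resolve_options {
    my (%args) = @_;
    my $alias = delete $args{chrome_path};
    # Assumption: an explicit chromium_path wins over the alias.
    $args{chromium_path} //= $alias if defined $alias;
    return ( %DEFAULTS, %args );
}

my %opt = resolve_options(
    chrome_path      => '/usr/bin/google-chrome',
    chromium_timeout => 60,
);
printf "%s => %s\n", $_, $opt{$_} for sort keys %opt;
```

Here the alias fills in chromium_path, the timeout is overridden, and chromium_time_budget keeps its default.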
METHODS
This module overrides the protected _fetch_page($url) method from WWW::Crawl to return rendered HTML from Chromium. All crawling and parsing is handled by the parent module.
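The same override pattern is available to further subclasses. The sketch below is an assumption-laden illustration rather than part of the module: Local::StubCrawler stands in for the real parent class so the example runs without any modules installed, and Local::CachingCrawler is a hypothetical subclass that caches the HTML returned by _fetch_page($url).

```perl
use strict;
use warnings;

# Stand-in for the parent class so the sketch is self-contained; a
# real subclass would inherit from WWW::Crawl::Chromium instead.
package Local::StubCrawler;
sub new         { my ($class, %args) = @_; return bless {%args}, $class }
sub _fetch_page { my ($self, $url) = @_; return "<html><!-- $url --></html>" }

# Hypothetical subclass that caches rendered pages by URL.
package Local::CachingCrawler;
our @ISA = ('Local::StubCrawler');

sub _fetch_page {
    my ($self, $url) = @_;
    # Call the parent's fetch only on a cache miss.
    $self->{cache}{$url} //= $self->SUPER::_fetch_page($url);
    return $self->{cache}{$url};
}

package main;

my $crawler = Local::CachingCrawler->new;
my $html    = $crawler->_fetch_page('https://example.com');
print length($html), " bytes fetched\n";
```

Because the crawling and parsing logic lives in the parent, a subclass of this shape only needs to change how a single page is fetched.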
AUTHOR
Ian Boddison, <bod at cpan.org>
BUGS
Please report any bugs or feature requests to bug-www-crawl at rt.cpan.org, or through the web interface at https://rt.cpan.org/NoAuth/ReportBug.html?Queue=WWW-Crawl-Chromium. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
    perldoc WWW::Crawl::Chromium
You can also look for information at:
GitHub
RT: CPAN's request tracker (report bugs here)
https://rt.cpan.org/NoAuth/Bugs.html?Dist=WWW-Crawl-Chromium
Search CPAN
ACKNOWLEDGEMENTS
LICENSE AND COPYRIGHT
This software is Copyright (c) 2023-2026 by Ian Boddison.
This program is released under the following license:
Perl