NAME

WWW::Crawl::Chromium - Crawl JavaScript-rendered pages with Chromium

VERSION

This documentation refers to WWW::Crawl::Chromium version 0.2.

SYNOPSIS

use strict;
use warnings;
use WWW::Crawl::Chromium;

my $crawler = WWW::Crawl::Chromium->new(
    chromium_path    => '/usr/bin/chromium',
    chromium_timeout => 30,
);

my @visited = $crawler->crawl('https://example.com', \&process_page);

sub process_page {
    my $url = shift;
    print "Visited: $url\n";
}

DESCRIPTION

WWW::Crawl::Chromium reuses the crawling and link-parsing logic from WWW::Crawl but overrides page fetching to use a headless Chromium or Chrome executable. The rendered DOM is collected via --dump-dom after the document is fully loaded.
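As a rough illustration of the fetch step described above, the sketch below builds a headless Chromium command line, captures the DOM that --dump-dom prints to standard output, and gives up after a timeout. The helper names build_chromium_command and fetch_rendered_html are hypothetical, and the exact flag set is an assumption: --headless, --dump-dom and --virtual-time-budget are standard Chromium switches, but the module may pass others. The alarm-based timeout assumes a Unix-like system.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for one headless Chromium run.
# Assumption: the module uses flags along these lines; it may pass others.
sub build_chromium_command {
    my ($chromium_path, $url, $time_budget_ms) = @_;
    return (
        $chromium_path,
        '--headless',
        '--disable-gpu',
        "--virtual-time-budget=$time_budget_ms",  # let JavaScript settle
        '--dump-dom',                             # print the rendered DOM to STDOUT
        $url,
    );
}

# Hypothetical helper: run Chromium and return the rendered HTML,
# or nothing if it times out or exits with a non-zero status.
sub fetch_rendered_html {
    my ($chromium_path, $url, $time_budget_ms, $timeout_s) = @_;
    my @cmd = build_chromium_command($chromium_path, $url, $time_budget_ms);

    # List-form piped open runs Chromium without a shell and returns its PID.
    my $pid = open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!";

    my $html;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout_s;                  # wall-clock limit for this page
        $html = do { local $/; <$fh> };    # slurp all of STDOUT
        alarm 0;
        1;
    };
    my $err = $@;
    alarm 0;

    if (!$ok) {
        kill 'TERM', $pid;                 # stop a hung Chromium process
        close $fh;
        die $err unless $err eq "timeout\n";
        return;                            # timed out: no HTML
    }
    close $fh or return;                   # non-zero exit status from Chromium
    return $html;
}
```

For example, fetch_rendered_html('/usr/bin/chromium', 'https://example.com', 10_000, 30) would mirror the defaults listed under OPTIONS.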

OPTIONS

chromium_path: Full path to the Chromium or Chrome executable. Defaults to chromium.
chrome_path: Alias for chromium_path.
chromium_timeout: Timeout in seconds for a single page fetch. Defaults to 30 seconds.
chromium_time_budget: Virtual time budget in milliseconds for Chromium to allow JavaScript to settle. Defaults to 10000.
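The sketch below shows how these options and their defaults fit together. The function resolve_chromium_options is hypothetical, and the rule that chromium_path wins when both it and its alias chrome_path are supplied is an assumption, not documented behaviour. Note the differing units: chromium_timeout is in seconds, while chromium_time_budget is in milliseconds.

```perl
use strict;
use warnings;

# Hypothetical sketch of option handling; not the module's actual constructor.
# Assumption: chromium_path takes precedence over its alias chrome_path.
sub resolve_chromium_options {
    my (%args) = @_;
    return {
        chromium_path        => $args{chromium_path} // $args{chrome_path} // 'chromium',
        chromium_timeout     => $args{chromium_timeout}     // 30,      # seconds
        chromium_time_budget => $args{chromium_time_budget} // 10_000,  # milliseconds
    };
}

# Using the alias: the path is taken from chrome_path, the rest keep their defaults.
my $opts = resolve_chromium_options(chrome_path => '/usr/bin/google-chrome');
```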

METHODS

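This module overrides the internal _fetch_page($url) method inherited from WWW::Crawl (Perl has no protected methods; the leading underscore marks it as internal by convention) so that it returns the HTML rendered by Chromium. All crawling and link parsing is still handled by the parent module.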
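The override pattern can be sketched with two small stand-in classes. Demo::Crawler and Demo::Crawler::Rendered are hypothetical and do not reproduce WWW::Crawl's real internals; they only show how replacing one internal method in a subclass changes what the inherited public methods see.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parent crawler (not WWW::Crawl's real code):
# the public method delegates fetching to an internal _fetch_page method.
package Demo::Crawler;
sub new {
    my ($class, %args) = @_;
    my $self = {%args};
    return bless $self, $class;
}
sub fetch_and_measure {
    my ($self, $url) = @_;
    my $html = $self->_fetch_page($url);   # resolved against the object's class
    return length $html;
}
sub _fetch_page { return '<html>static</html>' }

# Subclass that replaces only the fetching step.
package Demo::Crawler::Rendered;
use parent -norequire, 'Demo::Crawler';
sub _fetch_page {
    my ($self, $url) = @_;
    return "<html>rendered $url</html>";   # placeholder for Chromium output
}

package main;
```

Because Perl resolves $self->_fetch_page at run time against the object's class, the parent's code never needs to know that the subclass exists.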

AUTHOR

Ian Boddison, <bod at cpan.org>

BUGS

Please report any bugs or feature requests to bug-www-crawl at rt.cpan.org, or through the web interface at https://rt.cpan.org/NoAuth/ReportBug.html?Queue=WWW-Crawl-Chromium. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc WWW::Crawl::Chromium

You can also look for information at:

GitHub

https://github.com/IanBod/WWW-Crawl

RT: CPAN's request tracker (report bugs here)

https://rt.cpan.org/NoAuth/Bugs.html?Dist=WWW-Crawl-Chromium

Search CPAN

https://metacpan.org/release/WWW-Crawl-Chromium

ACKNOWLEDGEMENTS

LICENSE AND COPYRIGHT

This program is released under the following license: Perl
