NAME

WWW::Mechanize::Chrome::URLBlacklist - blacklist URLs from fetching

SYNOPSIS

use WWW::Mechanize::Chrome;
use WWW::Mechanize::Chrome::URLBlacklist;

my $mech = WWW::Mechanize::Chrome->new();
my $bl = WWW::Mechanize::Chrome::URLBlacklist->new(
    blacklist => [
        qr!\bgoogleadservices\b!,
    ],
    whitelist => [
        qr!\bcorion\.net\b!,
    ],

    # fail all unknown URLs
    default => 'failRequest',
    # allow all unknown URLs
    # default => 'continueRequest',

    on_default => sub {
        warn "Ignored URL $_[0] (action was '$_[1]')";
    },
);
$bl->enable($mech);

DESCRIPTION

This module provides a simple way to whitelist and blacklist URLs so that Chrome does not make requests to blacklisted URLs.

ATTRIBUTES

<whitelist>

Arrayref containing regular expressions of URLs to always allow fetching.

<blacklist>

Arrayref containing regular expressions of URLs to deny fetching, unless they are also matched by an entry in the whitelist.

<default>

default => 'continueRequest'

The action to take if a URL matches neither the whitelist nor the blacklist. The default is continueRequest. If you want to block all unknown URLs, use failRequest.
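
The way the whitelist, the blacklist and the default action combine can be sketched in plain Perl. This is only an illustration of the precedence described above, not the module's actual implementation; decide_action is a hypothetical helper and the example URLs are made up.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module's API: pick an action for
# a URL from a whitelist, a blacklist and a default action.
sub decide_action {
    my( $url, %options ) = @_;
    my $whitelist = $options{ whitelist } || [];
    my $blacklist = $options{ blacklist } || [];
    my $default   = $options{ default }   || 'continueRequest';

    # The whitelist is checked first, so it overrides the blacklist
    for my $re (@$whitelist) {
        return 'continueRequest' if $url =~ $re;
    }
    for my $re (@$blacklist) {
        return 'failRequest' if $url =~ $re;
    }
    # Neither list matched: fall back to the default action
    return $default;
}

my %lists = (
    whitelist => [ qr!\bcorion\.net\b! ],
    blacklist => [ qr!\bgoogleadservices\b!, qr!\.png$! ],
    default   => 'failRequest',
);

for my $url (
    'https://corion.net/logo.png',
    'https://www.googleadservices.com/x.js',
    'https://example.com/',
) {
    printf "%-40s %s\n", $url, decide_action( $url, %lists );
}
```

In this sketch the first URL is allowed even though it ends in .png, because the whitelist entry matches before the blacklist is consulted.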

<on_default>

on_default => sub {
    my( $url, $action ) = @_;
    warn "Unknown URL <$url>";
};

This callback is invoked for every URL that is neither in the whitelist nor in the blacklist. This is useful to see what URLs are still missing a category.
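
One possible use of this callback is to tally uncategorized URLs by host so that they can be reviewed after a crawl. The following is a minimal sketch: the %unknown_hosts hash, the host-extracting regular expression and the sample URLs are assumptions for illustration, and the callback is invoked by hand here instead of by the module.

```perl
use strict;
use warnings;

# Tally uncategorized URLs by host for later review
my %unknown_hosts;
my $on_default = sub {
    my( $url, $action ) = @_;
    # Crude host extraction, good enough for a report
    my( $host ) = $url =~ m!^\w+://([^/:?#]+)!;
    $unknown_hosts{ $host // '(no host)' }++;
};

# Simulate the callback being invoked; in real use you would pass
# on_default => $on_default to the constructor instead.
$on_default->( $_, 'continueRequest' ) for (
    'https://cdn.example.com/app.js',
    'https://cdn.example.com/app.css',
    'https://tracker.example.org/pixel.gif',
);

for my $host (sort keys %unknown_hosts) {
    print "$host: $unknown_hosts{$host}\n";
}
```

Hosts that show up often in such a report are good candidates for a new whitelist or blacklist entry.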

<_mech>

(internal) The WWW::Mechanize::Chrome instance we are connected to.

<_request_listener>

(internal) The request listener created by WWW::Mechanize::Chrome while listening for URL messages.

METHODS

->new

my $bl = WWW::Mechanize::Chrome::URLBlacklist->new(
    blacklist => [
        qr!\bgoogleadservices\b!,
        qr!\bioam\.de\b!,
        qr!\burchin\.js$!,
        qr!.*\.(?:woff|ttf)$!,
        qr!.*\.css(\?\w+)?$!,
        qr!.*\.png$!,
        qr!.*\bfavicon\.ico$!,
    ],
);
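
Before attaching a list to a browser, it can help to try the patterns against a few sample URLs. The sketch below does this with plain Perl regular expressions and needs no Chrome instance; is_blacklisted is a hypothetical helper and the URLs are invented for the example.

```perl
use strict;
use warnings;

my @blacklist = (
    qr!\bgoogleadservices\b!,
    qr!\.(?:woff|ttf)$!,
    qr!\.css(\?\w+)?$!,
    qr!\bfavicon\.ico$!,
);

# Hypothetical helper: true if any pattern matches the URL
sub is_blacklisted {
    my( $url ) = @_;
    return scalar grep { $url =~ $_ } @blacklist;
}

for my $url (
    'https://example.com/style.css?v123',
    'https://example.com/fonts/body.woff',
    'https://example.com/index.html',
) {
    printf "%-40s %s\n", $url,
        is_blacklisted($url) ? 'blocked' : 'not matched';
}
```

Because the font pattern is anchored with $, a URL ending in .woff2 would not be matched by it; whether that is desired depends on the site.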

->enable

$bl->enable( $mech );

Attaches the blacklist to a WWW::Mechanize::Chrome object.

->disable

$bl->disable( $mech );

Removes the blacklist from a WWW::Mechanize::Chrome object.

REPOSITORY

The public repository of this module is https://github.com/Corion/www-mechanize-chrome.

SUPPORT

The public support forum of this module is https://perlmonks.org/.

TALKS

I've given a German talk at GPW 2017, see http://act.yapc.eu/gpw2017/talk/7027 and https://corion.net/talks for the slides.

At The Perl Conference 2017 in Amsterdam, I also presented a talk, see http://act.perlconference.org/tpc-2017-amsterdam/talk/7022. The slides for the English presentation at TPCiA 2017 are at https://corion.net/talks/WWW-Mechanize-Chrome/www-mechanize-chrome.en.html.

BUG TRACKER

Please report bugs in this module via the RT CPAN bug queue at https://rt.cpan.org/Public/Dist/Display.html?Name=WWW-Mechanize-Chrome or via mail to www-mechanize-Chrome-Bugs@rt.cpan.org.

AUTHOR

Max Maischein corion@cpan.org

COPYRIGHT (c)

Copyright 2010-2020 by Max Maischein corion@cpan.org.

LICENSE

This module is released under the same terms as Perl itself.