NAME

WWW::YaCyBlacklist - a Perl module to parse and execute YaCy blacklists

VERSION

version 0.3

SYNOPSIS

use WWW::YaCyBlacklist;

my $ycb = WWW::YaCyBlacklist->new( { 'use_regex' => 1 } );
$ycb->read_from_array(
    'test1.co/fullpath',
    'test2.co/.*',
);
$ycb->read_from_files(
    '/path/to/1.black',
    '/path/to/2.black',
);

print "Match!" if $ycb->check_url( 'http://test1.co/fullpath' );
my @urls = (
    'https://www.perlmonks.org/',
    'https://metacpan.org/',
);
my @matches = $ycb->find_matches( @urls );
my @nonmatches = $ycb->find_non_matches( @urls );

$ycb->sortorder( 1 );
$ycb->sorting( 'alphabetical' );
$ycb->store_list( '/path/to/new.black' );

METHODS

new(%options)

use_regex => 0|1 (default 1)

Can only be set in the constructor and cannot be changed afterwards. If false, patterns whose host part is a regular expression are not checked (but they remain in the list).

filename => '/path/to/file.black' (default ycb.black)

The file written by store_list.

sortorder => 0|1 (default 0)

0 = ascending, 1 = descending. Configures sort_list.

sorting => 'alphabetical|length|origorder|random|reverse_host' (default 'origorder')

Selects the sort criterion used by sort_list.
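As a minimal sketch, the options above could be combined in one constructor call. It assumes WWW::YaCyBlacklist is installed and passes a hash reference as the SYNOPSIS does; the file name is only a placeholder.

```perl
use strict;
use warnings;
use WWW::YaCyBlacklist;

# Assumption: options are passed as a hash reference, as in the SYNOPSIS.
# '/path/to/sorted.black' is a placeholder path, not a real file.
my $ycb = WWW::YaCyBlacklist->new( {
    use_regex => 0,                        # skip patterns with a regex host part
    filename  => '/path/to/sorted.black',  # target file for store_list
    sortorder => 1,                        # 1 = descending
    sorting   => 'length',                 # one of the documented sort criteria
} );
```

Because use_regex is fixed at construction time, a list that needs both behaviours would need two separate objects.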

void read_from_array( @patterns )

Reads a list of YaCy blacklist patterns.

void read_from_files( @files )

Reads a list of YaCy blacklist files.

int length( )

Returns the number of patterns in the current list.

bool check_url( $URL )

Returns 1 if the URL is matched by any pattern, 0 otherwise.

@URLS_OUT find_matches( @URLS_IN )

Returns all URLs that were matched by the current list.

@URLS_OUT find_non_matches( @URLS_IN )

Returns all URLs that were not matched by the current list.
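The three matching methods can be used together roughly as follows. This is a sketch that assumes the module is installed; the patterns and URLs are taken from the SYNOPSIS, and no particular result is claimed for any URL.

```perl
use strict;
use warnings;
use WWW::YaCyBlacklist;

my $ycb = WWW::YaCyBlacklist->new( { use_regex => 1 } );
$ycb->read_from_array(
    'test1.co/fullpath',
    'test2.co/.*',
);

my @urls = (
    'http://test1.co/fullpath',
    'https://metacpan.org/',
);

# Check the URLs one by one ...
for my $url (@urls) {
    printf "%s: %s\n", $url, $ycb->check_url($url) ? 'matched' : 'not matched';
}

# ... or split the whole list in two calls.
my @matches     = $ycb->find_matches(@urls);
my @non_matches = $ycb->find_non_matches(@urls);
printf "%d matched, %d not matched\n", scalar @matches, scalar @non_matches;
```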

void delete_pattern( $pattern )

Removes a pattern from the current list.

@patterns sort_list( )

Returns the patterns of the current list, sorted as configured by sorting and sortorder.

void store_list( )

Prints the current list to a file. Executes sort_list( ).
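A short sketch of inspecting and trimming a list before it is stored, assuming the module is installed. The patterns are the ones from the SYNOPSIS; store_list is left out here because it would write to disk.

```perl
use strict;
use warnings;
use WWW::YaCyBlacklist;

my $ycb = WWW::YaCyBlacklist->new( {
    use_regex => 1,
    sorting   => 'alphabetical',
} );
$ycb->read_from_array(
    'test1.co/fullpath',
    'test2.co/.*',
);

print "Patterns loaded: ", $ycb->length, "\n";

# Remove one pattern and look at what remains, in the configured order.
$ycb->delete_pattern('test2.co/.*');
print "Patterns after deletion: ", $ycb->length, "\n";
print "$_\n" for $ycb->sort_list;
```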

OPERATIONAL NOTES

The error

^* matches null string many times in regex; marked by <-- HERE in m/^^* <-- HERE

is probably caused by a malformed path part of a pattern in your list (* instead of .*).
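The warning can be reproduced without the module. The sketch below uses plain Perl only; regex_warnings is a hypothetical helper (not part of WWW::YaCyBlacklist) that compiles a path pattern behind a ^ anchor and collects whatever Perl warns about. How the module itself assembles its regular expressions is not shown here, so the exact message text may differ slightly.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::YaCyBlacklist: compile a path
# pattern behind a ^ anchor and return the warnings or errors Perl emits.
sub regex_warnings {
    my ($path_pattern) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    my $ok = eval { my $re = qr/^$path_pattern/; 1 };
    push @warnings, $@ unless $ok;
    chomp @warnings;
    return @warnings;
}

# A bare * quantifies the anchor itself; .* is the intended wildcard.
for my $path ( '*', '.*' ) {
    my @w = regex_warnings($path);
    printf "%-3s %s\n", $path, @w ? "suspicious: $w[0]" : "ok";
}
```

Running a check like this over the path parts of a list before loading it is one way to find the offending pattern.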

BUGS

YaCy does not currently allow host patterns with two stars. WWW::YaCyBlacklist does not check for this and simply executes such patterns. This is a limitation of YaCy rather than of this module.

If there is something you would like to tell me, there are different channels for you:

SOURCE

SEE ALSO

AUTHOR

Ingram Braun <carlorff1@gmail.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2025 by Ingram Braun.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.