NAME
WWW::YaCyBlacklist - a Perl module to parse and execute YaCy blacklists
VERSION
version 0.8
SYNOPSIS
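use strict;
use warnings;
use WWW::YaCyBlacklist;
# use_regex => 1 (the default) also checks patterns whose host part is a regular expression
my $ycb = WWW::YaCyBlacklist->new( { 'use_regex' => 1 } );
# Add patterns directly ...
$ycb->read_from_array(
'test1.co/fullpath',
'test2.co/.*',
);
# ... or load them from YaCy blacklist files
$ycb->read_from_files(
'/path/to/1.black',
'/path/to/2.black',
);
# Check a single URL
print "Match!\n" if $ycb->check_url( 'http://test1.co/fullpath' );
my @urls = (
'https://www.perlmonks.org/',
'https://metacpan.org/',
);
# Split a list of URLs into matched and unmatched ones
my @matches = $ycb->find_matches( @urls );
my @nonmatches = $ycb->find_non_matches( @urls );
# Order the list alphabetically in descending order and write it to a file
$ycb->sortorder( 1 );
$ycb->sorting( 'alphabetical' );
$ycb->filename( '/path/to/new.black' );
$ycb->store_list( );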
METHODS
new(%options)
use_regex => 0|1 (default 1)
Can only be set in the constructor and cannot be changed later. If false, patterns whose host part is a regular expression are not checked, but they remain in the list.
filename => '/path/to/file.black' (default ycb.black)
The file written by store_list.
sortorder => 0|1 (default 0)
0 for ascending, 1 for descending order. Configures sort_list.
sorting => 'alphabetical|length|origorder|random|reverse_host' (default 'origorder')
Configures sort_list.
void read_from_array( @patterns )
Reads a list of YaCy blacklist patterns.
void read_from_files( @files )
Reads a list of YaCy blacklist files.
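For illustration only, here is a minimal sketch of what loading patterns from .black files could look like. It assumes the one-pattern-per-line layout of YaCy blacklist files, trims surrounding whitespace (including the CR of CRLF files) and skips blank lines. The function name read_blacklists is hypothetical, and the module's actual parsing may differ.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper, not the module's code: one pattern per line is assumed.
sub read_blacklists {
    my (@files) = @_;
    my @patterns;
    for my $file (@files) {
        open my $fh, '<:encoding(UTF-8)', $file or die "Cannot open $file: $!";
        while ( my $line = <$fh> ) {
            $line =~ s/\s+\z//;    # drop newline, CR and trailing blanks
            $line =~ s/\A\s+//;    # drop leading blanks
            push @patterns, $line if length $line;    # skip blank lines
        }
        close $fh;
    }
    return @patterns;
}

# Usage: a temporary file containing a CRLF line and a blank line
my ( $tmp, $file ) = tempfile( UNLINK => 1 );
print {$tmp} "test1.co/fullpath\r\n\ntest2.co/.*\n";
close $tmp;
my @patterns = read_blacklists($file);
```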
int length( )
Returns the number of patterns in the current list.
bool check_url( $URL )
Returns 1 if the URL is matched by any pattern, 0 otherwise.
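One way to picture a single-URL check is the following minimal sketch. It is not the module's implementation: the helpers pattern_to_regex and url_matches are hypothetical, and the sketch assumes that * in the host part is a plain wildcard (everything else literal), that the path part is a regex which must match the whole URL path including its leading /, and that a pattern without a path part covers every path. Regex host parts, which use_regex controls, are not modelled.

```perl
use strict;
use warnings;

# Hypothetical: turn 'host/path' into a host regex and a path regex.
sub pattern_to_regex {
    my ($pattern) = @_;
    my ( $host, $path ) = split m{/}, $pattern, 2;
    $path = '.*' unless defined $path;    # assumption: no path part = any path
    # '*' becomes '.*'; every other host character is matched literally
    my $host_re = join '.*', map { quotemeta($_) } split /\*/, $host, -1;
    return ( qr{\A$host_re\z}i, qr{\A/$path\z} );
}

# Returns 1 if any pattern matches the URL, 0 otherwise
sub url_matches {
    my ( $url, @patterns ) = @_;
    # Extract host and path; port, query string and fragment are ignored
    my ( $host, $path ) =
        $url =~ m{\A[a-z][a-z0-9+.-]*://([^/:?#]+)(?::\d+)?([^?#]*)}i
        or return 0;
    $path = '/' if $path eq '';
    for my $pattern (@patterns) {
        my ( $host_re, $path_re ) = pattern_to_regex($pattern);
        return 1 if $host =~ $host_re && $path =~ $path_re;
    }
    return 0;
}

my $hit = url_matches( 'http://test1.co/fullpath', 'test1.co/fullpath', 'test2.co/.*' );
```

Anchoring both regexes with \A and \z is a design choice of this sketch; a module that searches instead of fully matching would accept more URLs.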
@URLS_OUT find_matches( @URLS_IN )
Returns all URLs that are matched by the current list.
@URLS_OUT find_non_matches( @URLS_IN )
Returns all URLs that are not matched by the current list.
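Conceptually, find_matches( ) and find_non_matches( ) split the input by the same per-URL test that check_url( ) performs. The sketch shows that idea with a hypothetical helper partition_urls and a toy predicate standing in for check_url( ). Preserving the input order is an assumption about the module's behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: partition URLs with a single-URL predicate.
# $is_blocked stands in for check_url( ); it is not the module's code.
sub partition_urls {
    my ( $is_blocked, @urls ) = @_;
    my @matches     = grep {  $is_blocked->($_) } @urls;    # input order kept
    my @non_matches = grep { !$is_blocked->($_) } @urls;
    return ( \@matches, \@non_matches );
}

# Usage with a toy predicate that blocks one host
my $is_blocked = sub { $_[0] =~ m{\Ahttps?://www\.perlmonks\.org/} };
my ( $matches, $non_matches ) = partition_urls( $is_blocked,
    'https://www.perlmonks.org/', 'https://metacpan.org/' );
```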
void delete_pattern( $pattern )
Removes a pattern from the current list.
@patterns sort_list( )
Returns the patterns of the current list, ordered according to sorting and sortorder.
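The five sorting modes and the sortorder flag can be sketched as follows. This is a hypothetical helper, not the module's code: tie-breaking by plain string comparison and the meaning of reverse_host (comparing host labels from right to left, so that 'a.example.org' sorts as 'org.example.a') are assumptions.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: order patterns by a 'sorting' mode; $sortorder 1 = descending.
sub sort_patterns {
    my ( $patterns, $sorting, $sortorder ) = @_;
    my @sorted;
    if ( $sorting eq 'alphabetical' ) {
        @sorted = sort { $a cmp $b } @$patterns;
    }
    elsif ( $sorting eq 'length' ) {
        # Shorter patterns first; ties broken alphabetically (an assumption)
        @sorted = sort { length($a) <=> length($b) or $a cmp $b } @$patterns;
    }
    elsif ( $sorting eq 'random' ) {
        @sorted = shuffle @$patterns;
    }
    elsif ( $sorting eq 'reverse_host' ) {
        my %key = map { ( $_, reverse_host($_) ) } @$patterns;
        @sorted = sort { $key{$a} cmp $key{$b} or $a cmp $b } @$patterns;
    }
    else {
        # 'origorder': keep the order in which the patterns were read
        @sorted = @$patterns;
    }
    return $sortorder ? reverse(@sorted) : @sorted;
}

# Host part (everything before the first '/') with its labels reversed
sub reverse_host {
    my ($pattern) = @_;
    my ($host) = split m{/}, $pattern, 2;
    return join '.', reverse split /\./, $host;
}

my @patterns = ( 'a.example.org/x', 'z.example.net/x', 'm.co/longer/path' );
my @by_host  = sort_patterns( \@patterns, 'reverse_host', 0 );
```

Sorting by reversed host groups patterns by top-level domain first, which keeps related hosts next to each other in the stored file.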
void store_list( )
Writes the current list to the file given by filename, ordered by sort_list( ).
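A minimal sketch of writing a list back to disk, assuming the plain one-pattern-per-line layout of YaCy blacklist files and UTF-8 output. The name write_blacklist is hypothetical; in the module the file name comes from filename and the order from sort_list( ), which this sketch leaves to the caller.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: writes one pattern per line, in the order given.
sub write_blacklist {
    my ( $filename, @patterns ) = @_;
    open my $fh, '>:encoding(UTF-8)', $filename
        or die "Cannot open $filename: $!";
    print {$fh} "$_\n" for @patterns;
    close $fh or die "Cannot close $filename: $!";
    return;
}

# Usage: write two (already ordered) patterns to a temporary file, read them back
my ( undef, $file ) = tempfile( UNLINK => 1 );
write_blacklist( $file, 'test1.co/fullpath', 'test2.co/.*' );
open my $in, '<:encoding(UTF-8)', $file or die "Cannot read $file: $!";
my $content = do { local $/; <$in> };
close $in;
```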
OPERATIONAL NOTES
WWW::YaCyBlacklist checks the path part including its leading separator /. This protects against regexp compilation errors caused by leading quantifiers. Therefore, do not use patterns like host.tld/^path, even though YaCy allows them.
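The effect described in this note can be reproduced with plain Perl regexes. The snippet compiles a made-up path part *.html once on its own and once with the leading / prepended, and shows that a path part starting with ^ cannot match once the separator precedes it. It illustrates the note; it is not the module's code.

```perl
use strict;
use warnings;

# Made-up path parts, as they would appear after 'host.tld/'
my $quantifier_first = '*.html';
my $caret_first      = '^path';

# On its own, '*.html' is invalid: the quantifier follows nothing
my $bare_compiles = eval { my $re = qr/$quantifier_first/; 1 } ? 1 : 0;

# With the separator prepended, '/*.html' is valid ('*' now applies to '/')
my $prefixed_compiles = eval { my $re = qr{/$quantifier_first}; 1 } ? 1 : 0;

# '/^path' never matches: without the /m modifier '^' needs the start of
# the string, but the '/' has already been consumed there
my $caret_matches = ( '/path' =~ m{/$caret_first} ) ? 1 : 0;
```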
check_url( ) always returns true if the protocol of the URL does not match https? or ftps?, i.e. if it is not http, https, ftp or ftps.
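A minimal sketch of such a protocol gate, with hypothetical helper names. The per-URL pattern check is passed in as a code reference ($matcher, an assumption). The scheme test is case-insensitive here because URL schemes are case-insensitive; whether the module behaves the same way is not stated.

```perl
use strict;
use warnings;

# Hypothetical: only http(s) and ftp(s) URLs are checked against patterns
sub protocol_is_checked {
    my ($url) = @_;
    return $url =~ m{\A(?:https?|ftps?)://}i ? 1 : 0;
}

# For any other protocol the check short-circuits to true
sub check_with_gate {
    my ( $url, $matcher ) = @_;    # $matcher: hypothetical per-URL pattern check
    return 1 unless protocol_is_checked($url);
    return $matcher->($url) ? 1 : 0;
}

my $never = sub { 0 };    # toy matcher that matches nothing
my $mail  = check_with_gate( 'mailto:someone@example.org', $never );
```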
BUGS
At present, YaCy does not allow host patterns with two or more stars. WWW::YaCyBlacklist does not check for this and simply executes such patterns. This is arguably a YaCy bug.
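Since WWW::YaCyBlacklist itself does not perform this check, a pre-check like the following hypothetical sketch could flag patterns that YaCy would reject. It counts stars in the host part only (everything before the first /), because in the path part * is an ordinary regex quantifier.

```perl
use strict;
use warnings;

# Hypothetical helper: number of '*' in the host part of a pattern
sub host_star_count {
    my ($pattern) = @_;
    my ($host) = split m{/}, $pattern, 2;
    return ( $host =~ tr/*// );
}

my @patterns = ( '*.test2.co/.*', '*.test2.*/.*', 'test1.co/fullpath' );
my @rejected_by_yacy = grep { host_star_count($_) >= 2 } @patterns;
```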
If there is something you would like to tell me, there are different channels for you:
SOURCE
De:Blacklists (German).
SEE ALSO
AUTHOR
Ingram Braun <carlorff1@gmail.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2025 by Ingram Braun.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.