NAME
Search::WuManber -- A fast algorithm for multi-pattern searching
SYNOPSIS
use Search::WuManber;
my $search = Search::WuManber->new([qw(tribute reserved serve distribute)]);
my @matches = $search->all( lc "All rights reserved. Distribute freely.");
my $match = $search->first(lc "All rights reserved. Distribute freely.");
$match = $search->next();
DESCRIPTION
This module implements the Wu-Manber multi-pattern parallel search algorithm.
The search strings passed to new()
are prepared for parallel lookup. A perl reference pointing to all internal data is returned. Treat this reference as opaque. first()
and next()
iterate over all text positions where matches occur. Each match is returned as a reference to a two-element array representing text offset and list index of a search string. Options for new()
: * <return_string =
1>> make the return value of the iterator a three-element array, containing also the search string itself. * <case_sensitive =
0>> run the search case insensitive (slightly slower).
The matches are returned roughly sorted. Offset usually increments, but may jump backwards by the length difference of neighbouring search strings.
New()
allocates a constant amount of memory (between 130k and 2MB). This memory can be returned by undef $search;
BUGS
The iterator is inefficient. first()
just calls all()
...,
This implementation uses internal hash-functions, which may not be optimal.
Efficiency of the WuManber depends on the minimum length of search strings. Suggested minimum length is 5. new()
switches to a slower algorithm if one of the strings has less than 3 characters.
Changes in the list of search patterns are not seen by the search algorithm after new()
was called. Changes in the text are undefined. Call first()
to restart with a new text.
REFERENCES
Sun Wu and Udi Manber (1994) A fast algorithm for multi-pattern searching. Technical Report TR-94-17, University of Arizona. http://webglimpse.net/pubs/TR94-17.pdf
ftp://ftp.cs.arizona.edu/agrep/agrep-2.04.tar.Z
www.snort.org
Rolf Stiebe, Textalgorithmen, WS 2005/6
SEE ALSO
Text::Scan
, Algorithm::AhoCorasick
, Algorithm::RabinKarp
AUTHOR
Juergen Weigert <jw@cs.fau.de>
COPYRIGHT AND LEGALESE
Copyright (c) 2007, Juergen Weigert, Novell Inc. This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.