NAME

Search::WuManber -- A fast algorithm for multi-pattern searching

SYNOPSIS

use Search::WuManber;

my $search = Search::WuManber->new([qw(tribute reserved serve distribute)]);
my @matches = $search->all(  lc "All rights reserved. Distribute freely.");
my $match =   $search->first(lc "All rights reserved. Distribute freely.");
$match = $search->next();

DESCRIPTION

This module implements the Wu-Manber multi-pattern parallel search algorithm.

The search strings passed to new() are prepared for parallel lookup. A perl reference pointing to all internal data is returned. Treat this reference as opaque. first() and next() iterate over all text positions where matches occur. Each match is returned as a reference to a two-element array representing text offset and list index of a search string. Options for new(): * <return_string = 1>> make the return value of the iterator a three-element array, containing also the search string itself. * <case_sensitive = 0>> run the search case insensitive (slightly slower).

The matches are returned roughly sorted. Offset usually increments, but may jump backwards by the length difference of neighbouring search strings.

New() allocates a constant amount of memory (between 130k and 2MB). This memory can be returned by undef $search;

BUGS

The iterator is inefficient. first() just calls all() ...,

This implementation uses internal hash-functions, which may not be optimal.

Efficiency of the WuManber depends on the minimum length of search strings. Suggested minimum length is 5. new() switches to a slower algorithm if one of the strings has less than 3 characters.

Changes in the list of search patterns are not seen by the search algorithm after new() was called. Changes in the text are undefined. Call first() to restart with a new text.

REFERENCES

Sun Wu and Udi Manber (1994) A fast algorithm for multi-pattern searching. Technical Report TR-94-17, University of Arizona. http://webglimpse.net/pubs/TR94-17.pdf

ftp://ftp.cs.arizona.edu/agrep/agrep-2.04.tar.Z

www.snort.org

Rolf Stiebe, Textalgorithmen, WS 2005/6

SEE ALSO

Text::Scan, Algorithm::AhoCorasick, Algorithm::RabinKarp

AUTHOR

Juergen Weigert <jw@cs.fau.de>

COPYRIGHT AND LEGALESE

Copyright (c) 2007, Juergen Weigert, Novell Inc. This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.