NAME

Regex::PreSuf - create regular expressions from word lists

SYNOPSIS

use Regex::PreSuf;

my $re = presuf(qw(foobar fooxar foozap));

# $re should now be 'foo(?:zap|[bx]ar)'

DESCRIPTION

This module creates regular expressions from 'word lists' (lists of strings) that match exactly the same words. These optimized regular expressions typically run a few dozen percent faster than the simple-minded '|'-concatenation of the words. The easiest approach would, of course, be simply to join the words with '|', but this module tries to be cleverer.
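To make "matching the same words" concrete, here is a small sketch that compares the simple-minded '|' join of the SYNOPSIS words with the pattern the SYNOPSIS says presuf() should return. It does not call Regex::PreSuf: the optimized pattern is hard-coded, and the probe strings are arbitrary examples chosen for illustration. It checks equivalence only; it does not measure speed.

```perl
use strict;
use warnings;

# The words from the SYNOPSIS.
my @words = qw(foobar fooxar foozap);

# Simple-minded approach: escape each word and join with '|'.
my $naive = join '|', map { quotemeta($_) } @words;

# The pattern the SYNOPSIS says presuf() should return (hard-coded here).
my $optimized = 'foo(?:zap|[bx]ar)';

# Arbitrary probe strings: the three words plus some near misses.
my @probes = (@words, qw(foo fooyar foozar foobarx xfoobar));

# Collect any probe on which the two anchored patterns disagree.
my @mismatch = grep {
    my $s = $_;
    ( $s =~ /\A(?:$naive)\z/ ? 1 : 0 ) != ( $s =~ /\A(?:$optimized)\z/ ? 1 : 0 );
} @probes;

if (@mismatch) {
    print "patterns differ on: @mismatch\n";
}
else {
    print "patterns agree on all probes\n";
}
```

Anchoring with \A and \z matters here: it compares which whole strings each pattern accepts, rather than where an unanchored match happens to start.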

The downsides:

The original order of the words is not necessarily respected, for example because character class matches are collected together, separately from the '|' alternations. To see why this matters, think of, say, '[ab]' as 'a|b'.
The module blithely ignores the special meaning of regular expression metacharacters such as *?+{}[], so please do not use them in the words; the resulting regular expression will most likely be illegal.
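The first downside shows up with unanchored matches, because Perl tries alternatives from left to right and takes the first one that succeeds. The sketch below uses two hand-written patterns for the words 'b', 'ab' and 'a'; the second one mimics collecting the single characters into a character class ahead of the other alternative. It is an illustration of the effect, not actual presuf() output.

```perl
use strict;
use warnings;

# Both patterns accept the same words (a, ab, b) when anchored, but the
# alternatives are tried in a different order. The reordered pattern is a
# hand-written illustration, not real presuf() output.
my $in_order  = 'b|ab|a';
my $reordered = '[ab]|ab';

my $text = 'ab';

# Unanchored matches: the leftmost alternative that succeeds wins.
my ($m1) = $text =~ /($in_order)/;
my ($m2) = $text =~ /($reordered)/;

print "in order:  '$m1'\n";   # prints "in order:  'ab'"
print "reordered: '$m2'\n";   # prints "reordered: 'a'"
```

So if your code depends on which word is matched, and not merely on whether a word matches, anchor the pattern or do not rely on the order of the original list.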

There is one exception to the second downside: the module has a rudimentary grasp of what to do with the 'any character' metacharacter (.). If you call presuf() like this:

my $re = presuf({ anychar=>1 }, qw(foobar foo.ar fooxar));

# $re should now be 'foo.ar'
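Since '.' matches any single character except a newline, 'foo.ar' already covers 'foobar' and 'fooxar', which is why those words can disappear from the result. The flip side is that the pattern also accepts strings that were never in the list. The sketch below hard-codes the result stated above and tries it on a few arbitrary probe strings; it does not call Regex::PreSuf.

```perl
use strict;
use warnings;

# The pattern the example above says presuf({ anychar=>1 }, ...) returns.
my $re = 'foo.ar';

# Probe strings: the three input words plus a few others.
my @probes = qw(foobar foo.ar fooxar fooyar foo-ar fooar);

# '.' matches any one character (except newline), so every six-character
# string of the form foo?ar matches; 'fooar' is too short to match.
my @matched = grep { /\A$re\z/ } @probes;

print "matched: @matched\n";   # prints "matched: foobar foo.ar fooxar fooyar foo-ar"
```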

The module finds the common prefixes and suffixes of the words and then recursively examines the remaining differences. By default, however, it uses only prefixes, because for many languages (natural or artificial) this seems to produce the fastest matchers. To also allow suffixes, use

my $re = presuf({ suffixes=>1 }, ...);

To use only suffixes, use

my $re = presuf({ prefixes=>0 }, ...);

(This implicitly enables suffixes.)
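The idea of factoring out a common prefix and suffix can be sketched in a few lines of pure Perl. The helpers common_prefix(), common_suffix() and factor_once() below are hypothetical and are not part of Regex::PreSuf; they perform only one level of factoring and do not recurse into the remaining middles as the module does. Unlike the module, the sketch also escapes each piece with quotemeta, so metacharacters in the words stay literal.

```perl
use strict;
use warnings;

# Hypothetical helpers (not part of Regex::PreSuf): they only illustrate
# factoring a shared prefix and suffix out of a word list.
sub common_prefix {
    my ($prefix, @rest) = @_;
    for my $w (@rest) {
        # Shorten the candidate until $w starts with it
        # (index() of an empty string is 0, so this always terminates).
        chop $prefix while index($w, $prefix) != 0;
    }
    return $prefix;
}

sub common_suffix {
    # A common suffix is the reversed common prefix of the reversed words.
    return scalar reverse common_prefix(map { scalar reverse($_) } @_);
}

# One level of factoring: the prefix first, then the suffix of what remains.
# Unlike the module, this does not recurse into the remaining middles.
sub factor_once {
    my @words = @_;
    my $pre   = common_prefix(@words);
    my @rest  = map { substr($_, length $pre) } @words;
    my $suf   = common_suffix(@rest);
    my @mid   = map { substr($_, 0, length($_) - length($suf)) } @rest;
    return quotemeta($pre) . '(?:' . join('|', map { quotemeta($_) } @mid) . ')'
         . quotemeta($suf);
}

print factor_once(qw(foobar fooxar)), "\n";          # prints foo(?:b|x)ar
print factor_once(qw(foobar fooxar foozap)), "\n";   # prints foo(?:bar|xar|zap)
```

Computing the suffix only on what remains after the prefix is removed keeps the two from overlapping. For the SYNOPSIS words this one-level sketch stops at 'foo(?:bar|xar|zap)', whereas the SYNOPSIS shows the module going further to 'foo(?:zap|[bx]ar)'.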

COPYRIGHT

Jarkko Hietaniemi <jhi@iki.fi>

This code is distributed under the same copyright terms as Perl itself.