NAME

String::Approx - approximate matching and substitution

SYNOPSIS

use String::Approx qw(amatch asubstitute);

# amatch() and asubstitute() are imported into the current namespace;
# by default _nothing_ is imported

amatch

amatch($approximate_string[, ...]);

All amatch() arguments except $approximate_string itself are optional.

The additional parameters are strings of any of the following forms:

number		e.g. '1', the maximum number of transformations
		for all the transformation types,
		the types being insert/delete/substitute

number%		e.g. '15%', the relative maximum number of
		all the transformations is 15% of the
		length of the approximating string

[IDS]number	e.g. 'I2', the maximum number of insertions

[IDS]number%	e.g. 'D20%', the relative maximum number of
		deletions is 20% of the length of the
		approximating string

[gimosx]	e.g. 'im', the usual m// modifiers

The default parameter is '10%'. Two noteworthy points:

The relative amounts, especially the default 10%,
would often result in the number of allowed 'errors' being
less than 1. This, however, does not happen: internally
the minimum is forced to be 1.
(0 can be, and must be, explicitly asked for.)

The relative amounts are rounded to the nearest whole
number in the standard way, e.g. 10% of 15 will end up
being 2.
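
The two points above can be illustrated with a small sketch. The helper allowed_errors() is hypothetical and is not part of the module's interface; it only reproduces the rounding and the forced minimum as described here, assuming that "the standard way" means rounding halves up.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of String::Approx: the number of errors
# allowed for a relative limit, following the two points described above.
sub allowed_errors {
    my ($percent, $string) = @_;
    return 0 if $percent == 0;                 # an explicit 0 stays 0
    my $exact   = length($string) * $percent / 100;
    my $rounded = int($exact + 0.5);           # round to nearest, halves up
    return $rounded < 1 ? 1 : $rounded;        # the minimum is forced to 1
}

print allowed_errors(10, 'perl'), "\n";             # 10% of 4 is 0.4, prints 1
print allowed_errors(10, 'approximateness'), "\n";  # 10% of 15 is 1.5, prints 2
```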

You can combine all the number parameter types into a single string, e.g. '15%I2'.
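
To make the parameter forms concrete, here is a rough sketch of how such a string could be split into its pieces. It is not the module's actual parser: parse_params() and the layout of its result are hypothetical, and the assumption that a later, more specific piece overrides an earlier general one is only one plausible reading.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of String::Approx: split a parameter
# string into per-type limits (I, D, S) and m// style modifiers.
sub parse_params {
    my ($spec) = @_;
    my (%limit, %modifier);
    pos($spec) = 0;
    while (pos($spec) < length($spec)) {
        if ($spec =~ /\G([IDS]?)(\d+)(%?)/gc) {
            my ($type, $amount, $percent) = ($1, $2, $3);
            # A bare number applies to all three transformation types.
            my @types = $type ne '' ? ($type) : qw(I D S);
            for my $t (@types) {
                $limit{$t} = {
                    amount   => $amount,
                    relative => ($percent eq '%' ? 1 : 0),
                };
            }
        }
        elsif ($spec =~ /\G([gimosx])/gc) {
            $modifier{$1} = 1;
        }
        else {
            die "unrecognized parameter at offset " . pos($spec) . " in '$spec'\n";
        }
    }
    return (\%limit, \%modifier);
}

my ($limit, $modifier) = parse_params('15%I2');
for my $t (qw(I D S)) {
    printf "%s: %d%s\n", $t, $limit->{$t}{amount},
        $limit->{$t}{relative} ? '%' : '';
}
```

Under these assumptions '15%I2' yields a relative limit of 15% for deletions and substitutions and an absolute limit of 2 for insertions.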

asubstitute

asubstitute($approximate_string, $substitute[, ...]);

The parameters are otherwise identical to those of amatch(), except for the substitution string, $substitute.

RETURN VALUES

In scalar context amatch() and asubstitute() return the number of possible matches and substitutions. In list context they return the list of all the possible matches and substitutions. Note that in the case of asubstitute() the list of possible substitutions may be longer than the list of substitutions actually done, because possible substitutions may overlap. The first and the longest substitutions are done first; the rest are done only if they do not overlap the substitutions already done.
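
The difference between possible and done substitutions can be sketched as follows. This is only an illustration, not the module's implementation: pick_non_overlapping() is a hypothetical helper, candidates are assumed to be [start offset, length] pairs, and "the first and the longest first" is read as ordering by start offset and then by decreasing length.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of String::Approx.
# Each candidate is an array reference: [start_offset, length].
sub pick_non_overlapping {
    my @candidates = @_;

    # Earliest start first; among equal starts, the longest first.
    my @ordered = sort { $a->[0] <=> $b->[0] or $b->[1] <=> $a->[1] }
                  @candidates;

    my @done;
    my $next_free = 0;    # first offset not covered by an accepted candidate
    for my $c (@ordered) {
        my ($start, $len) = @$c;
        next if $start < $next_free;    # overlaps something already done
        push @done, $c;
        $next_free = $start + $len;
    }
    return @done;
}

my @possible = ([0, 4], [0, 5], [3, 4], [6, 4]);
my @done     = pick_non_overlapping(@possible);
printf "%d possible, %d done\n", scalar @possible, scalar @done;
```

With these four candidates only [0, 5] and [6, 4] survive, so the list of possible substitutions is twice as long as the list of done ones.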

As a side-effect asubstitute() may change the value of $_ if approximate matches are found.

Note that error messages and warnings come from amatch(), not from asubstitute().

EXAMPLES

amatch($s);		# the maximum amount of approximateness
			# is max(1,10_%_of_length($s))
amatch($s, 1);		# the maximum number of any
			# insertions/deletions/substitutions
			# (_separately_) is 1
amatch($s, 'I1D0S30%');	# the maximum amount for insertions is 1,
			# deletions are not allowed, the maximum
			# amount of substitutions
			# is max(1,30_%_of_length($s))

asubstitute($s, '($&)', 'g');
			# surround in $_ all ('g') the approximate
			# matches by parentheses

asubstitute($s, '&func', 'e');
			# substitute in $_ the first approximate
			# match with the result of &func (without
			# the 'e' literal string '&func' would be
			# the substitute)

LIMITATIONS

You cannot mix approximate matching and normal Perl regular expressions (see perlre). Please do not even think about it. Do not use characters .?*+{}[](|)^$\ (that is, any characters that have special meaning in regular expressions) in your approximate strings.
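
A caller can guard against this limitation before handing a string over. The check below is a minimal sketch; has_regex_metachar() is a hypothetical helper and not something the module provides. It tests for exactly the characters listed above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of String::Approx: true if the string
# contains any of the characters . ? * + { } [ ] ( | ) ^ $ \
sub has_regex_metachar {
    my ($string) = @_;
    return $string =~ /[.?*+{}\[\]()|^\$\\]/ ? 1 : 0;
}

for my $candidate ('perl', 'c++', 'a.out') {
    my $verdict = has_regex_metachar($candidate) ? 'do not use' : 'ok';
    print "$candidate: $verdict\n";
}
```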

Matching and substitution are always done on $_. The =~ binding operator (see perlop) can only be used with the Perl builtins m//, s///, and tr///, not for user-defined functions such as amatch().
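
Since the functions always work on $_, a string held in another variable has to be made the current $_ first. The sketch below shows three ordinary Perl idioms for that. So that it runs without the module, it uses topic_contains(), a hypothetical stand-in that inspects $_ but only does exact substring matching; a function such as amatch() would be called in its place.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a function that always inspects $_.
# It performs an exact substring test only.
sub topic_contains {
    my ($needle) = @_;
    return index($_, $needle) >= 0 ? 1 : 0;
}

my $line = 'practical extraction and report language';

# 1. Alias $_ to a single variable for the duration of a block.
my $found_alias;
for ($line) {
    $found_alias = topic_contains('report');
}

# 2. Localize $_ inside a wrapper that takes the string as an argument.
sub contains_in {
    my ($string, $needle) = @_;
    local $_ = $string;
    return topic_contains($needle);
}
my $found_local = contains_in($line, 'extraction');

# 3. grep sets $_ to each list element in turn.
my @words = qw(perl pearl peril);
my @hits  = grep { topic_contains('er') } @words;

print "alias: $found_alias, local: $found_local, hits: @hits\n";
```

The for form aliases rather than copies, so a function that modifies $_, as asubstitute() may, would change $line itself.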

agrep is faster. Searching for 'perl' with one each of [IDS] allowed from a wordlist of 25486 words took 656 seconds with amatch() on a RISC box, while agrep took 0.77 seconds. This is because String::Approx does the same things in an interpreted language, Perl, whereas agrep does them in a compiled language, C, and because approximate matching is a very demanding operation, especially the substitutions. String::Approx does it by (ab)using regular expressions, which is quite wasteful: approximate matching should be built in for it to be effective. The time taken by I is about 30%, by D about 20%, and by S about 50%. (In case you are wondering: yes, agrep and amatch() did agree on the list of matching words.)

VERSION

v1.5, $Id: Approx.pm,v 1.8 1995/11/02 13:15:29 jhi Exp $

AUTHOR

Jarkko Hietaniemi, Jarkko.Hietaniemi@hut.fi