NAME
Text::Fuzzy - partial or fuzzy string matching using edit distances
SYNOPSIS
use Text::Fuzzy;
my $tf = Text::Fuzzy->new ('boboon');
print "Distance is ", $tf->distance ('babboon'), "\n";
# Prints "Distance is 2"
my @words = qw/the quick brown fox jumped over the lazy dog/;
my $nearest = $tf->nearest (\@words);
print "Nearest array entry is ", $words[$nearest], "\n";
# Prints "Nearest array entry is brown"
DESCRIPTION
This module calculates the Levenshtein edit distance between words, and does edit-distance-based searching of arrays and files to find the nearest entry. It can handle either byte strings or character strings (strings containing Unicode), treating each Unicode character as a single entity.
It is designed for high performance when searching an array of words or a file for the entry nearest to a particular search term, which it achieves by reducing the number of calculations that need to be performed.
It supports either bytewise edit distances or Unicode-based edit distances:
use utf8;
my $tf = Text::Fuzzy->new ('あいうえお☺');
print $tf->distance ('うえお☺'), "\n";
# prints "2".
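For readers unfamiliar with the Levenshtein distance, the following is a minimal pure-Perl sketch of the textbook two-row dynamic-programming algorithm. It is not the code this module uses; the name levenshtein is hypothetical and exists only for this illustration. It shows why the distinction between bytes and characters matters: the same pair of strings gives a different distance once encoded as UTF-8 bytes.

```perl
use strict;
use warnings;
use utf8;
use Encode 'encode';

# Textbook Levenshtein distance (hypothetical helper, not part of
# Text::Fuzzy). It compares whatever units Perl sees in the strings:
# characters for decoded strings, bytes for encoded ones.
sub levenshtein {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    # $prev[$j] is the distance from the empty string to the first
    # $j units of $t.
    my @prev = (0 .. scalar @t);
    for my $i (1 .. @s) {
        my @cur = ($i);
        for my $j (1 .. @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            my $best = $prev[$j - 1] + $cost;                     # match or substitute
            $best = $prev[$j] + 1 if $prev[$j] + 1 < $best;       # delete from $s
            $best = $cur[$j - 1] + 1 if $cur[$j - 1] + 1 < $best; # insert into $s
            $cur[$j] = $best;
        }
        @prev = @cur;
    }
    return $prev[-1];
}

my ($x, $y) = ('あいうえお☺', 'うえお☺');
# Character strings: two characters are deleted.
print levenshtein ($x, $y), "\n";
# prints "2"
# Byte strings: each deleted character is three bytes in UTF-8.
print levenshtein (encode ('UTF-8', $x), encode ('UTF-8', $y)), "\n";
# prints "6"
```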
METHODS
new
my $tf = Text::Fuzzy->new ('bibbety bobbety boo');
Create a new Text::Fuzzy object from the supplied word.
distance
my $dist = $tf->distance ($word);
Return the edit distance to $word from the word used to create the object in "new".
nearest
my $index = $tf->nearest (\@words);
Return the index of the element of @words which is nearest to the word used to create the object in "new".
get_max_distance
# Get the maximum edit distance.
print "The max distance is ", $tf->get_max_distance (), "\n";
Get the maximum edit distance of $tf. The default is set to 10.
set_max_distance
# Set the max distance.
$tf->set_max_distance (3);
Set the maximum edit distance of $tf. The default is set to 10. If this is called with an undefined value, the maximum edit distance is switched off.
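One way a maximum distance can reduce the number of calculations is sketched below in pure Perl. This documentation does not describe how the module itself uses the limit, so the following is only an assumption about the general technique; bounded_distance and nearest_index are hypothetical names. The sketch rejects a candidate when the lengths differ by more than the limit, abandons a comparison when a whole row of the table exceeds the limit, and tightens the limit each time a closer entry is found.

```perl
use strict;
use warnings;
use List::Util 'min';

# Edit distance between $s and $t, or undef once it is known to be
# greater than $max. Hypothetical helper, not part of Text::Fuzzy.
sub bounded_distance {
    my ($s, $t, $max) = @_;
    # The distance is never less than the difference in lengths.
    return if abs (length ($s) - length ($t)) > $max;
    my @s = split //, $s;
    my @t = split //, $t;
    my @prev = (0 .. scalar @t);
    for my $i (1 .. @s) {
        my @cur = ($i);
        for my $j (1 .. @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $cur[$j] = min ($prev[$j - 1] + $cost, $prev[$j] + 1, $cur[$j - 1] + 1);
        }
        # Row minima never decrease, so a row entirely above $max
        # means the final distance is above $max too.
        return if min (@cur) > $max;
        @prev = @cur;
    }
    return $prev[-1] <= $max ? $prev[-1] : undef;
}

# Index of the closest entry of @$words within $max edits of $word,
# or undef if there is none. The first of several equal entries wins.
sub nearest_index {
    my ($word, $words, $max) = @_;
    my $best;
    for my $i (0 .. $#$words) {
        my $d = bounded_distance ($word, $words->[$i], $max);
        next unless defined $d;
        $best = $i;
        last if $d == 0;
        # Only strictly closer entries are of interest from here on.
        $max = $d - 1;
    }
    return $best;
}

my @words = qw/the quick brown fox jumped over the lazy dog/;
my $index = nearest_index ('boboon', \@words, 10);
print "Nearest array entry is ", $words[$index], "\n";
# Prints "Nearest array entry is brown"
```

In this run the limit drops from 10 to 2 once "brown" is found, after which "fox", the second "the" and "dog" are rejected on length alone.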
scan_file
$tf->scan_file ('/usr/share/dict/words');
Scan a file to find the nearest match to the word used in "new". This assumes that the file contains lines of text separated by newlines and finds the closest match in the file.
This does not currently support Unicode-encoded files.
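A possible workaround for Unicode-encoded files, sketched here using only core Perl, is to read the file through a decoding layer into an array of character strings. The helper name read_utf8_lines is hypothetical, and the temporary file merely stands in for a real word list.

```perl
use strict;
use warnings;
use utf8;
use File::Temp 'tempfile';

# Read a UTF-8 encoded file into an array of decoded lines with the
# trailing newlines removed. Hypothetical helper for illustration.
sub read_utf8_lines {
    my ($path) = @_;
    open my $in, '<:encoding(UTF-8)', $path
        or die "Cannot open $path: $!";
    chomp (my @lines = <$in>);
    close $in or die "Cannot close $path: $!";
    return \@lines;
}

# Write a throwaway UTF-8 file to read back.
my ($out, $path) = tempfile (UNLINK => 1);
binmode $out, ':encoding(UTF-8)';
print {$out} "あいうえお\n", "うえお☺\n";
close $out or die "Cannot close $path: $!";

my $lines = read_utf8_lines ($path);
print scalar @$lines, " lines, the first has ",
    length ($lines->[0]), " characters\n";
# Prints "2 lines, the first has 5 characters"
```

Since "nearest" takes an array reference and the module handles character strings, the array returned here should be usable as its argument in place of a call to "scan_file", at the cost of holding the whole file in memory.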
AUTHOR
Ben Bullock, <bkb@cpan.org>
COPYRIGHT & LICENCE
This package and associated files are copyright (C) 2012 Ben Bullock.
You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.