NAME
Text::Metaphone::Amharic - The Metaphone Algorithm for Amharic.
SYNOPSIS
    use utf8;
    require Text::Metaphone::Amharic;

    my $mphone = new Text::Metaphone::Amharic;

    # List context: every key generated for the word
    my @keys = $mphone->metaphone ( "ሥላሴ" );
    foreach (@keys) {
        print "$_\n";
    }

    # Scalar context: a single key
    my $key = $mphone->metaphone ( "ፀሐይ" );
    print "key => $key\n";

    # Switch the key style to IPA symbols
    $mphone->style ( "ipa" );
    @keys = $mphone->metaphone ( "ሥላሴ" );
    foreach (@keys) {
        print "$_\n";
    }

    $mphone->style ( "ethiopic" );
      :
      :
The key "style" and Metaphone "grandularity" can be set at import time or at instantiation time:

    my $mphone = new Text::Metaphone::Amharic ( style => "ipa", grandularity => "high" );

or any time thereafter:

    $mphone->style ( "ethiopic" );
    $mphone->grandularity ( "low" );
DESCRIPTION
The Text::Metaphone::Amharic module is a reimplementation of the Amharic Metaphone algorithm of the Text::TransMetaphone package. This implementation uses an object-oriented interface and generates keys in Ethiopic script by default (see the STYLES section for other encoding options).
By default the keys are generated in "low" grandularity mode, which finds the most matches. The GRANDULARITY section discusses the effects of the different levels.
Like Text::TransMetaphone::am, the terminal key returned in list context is a regular expression. Amharic character classes are applied in the RE key following the conventions of Regexp::Ethiopic::Amharic.
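The list-context behaviour described above can be sketched as follows. This is a minimal illustration, assuming the module is installed and that the regular-expression key is always the last element of the returned list, as stated above; the variable names are illustrative only.

```perl
use utf8;
use strict;
use warnings;
binmode(STDOUT, ":encoding(UTF-8)");   # keys contain Ethiopic characters

require Text::Metaphone::Amharic;

my $mphone = Text::Metaphone::Amharic->new;

# In list context the last element is the regular-expression key;
# the elements before it are the plain keys.
my @keys   = $mphone->metaphone("ሥላሴ");
my $re_key = pop @keys;

print "key:   $_\n" for @keys;
print "regex: $re_key\n";
```

Compiling the RE key into a pattern presumably requires Regexp::Ethiopic::Amharic to be loaded so that its character classes are understood; that step is left out of this sketch.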
GRANDULARITY
The grandularity parameter refers to the degree of reduction that occurs in the key generation. The grandularity modes were created for investigative purposes. The "low" level, the most effective mode, is the default.
"high"
The least coarse grain. "ወ" and "የ" are treated under consonant rules. The default IM correction (shift-slip condition) folds keys both upward and downward. The high grandularity level generates the greatest number of keys: each substitution causes a new key to be generated, so the set of keys returned represents all possible permutations. The "high" level is the least aggressive in terms of text simplification and leads to the fewest matches. It is more useful for other types of analysis, such as distance comparison to the canonical word. Since both the canonical and error words have keys folded downward for all grandularity levels during IM corrections, there is no particular advantage to the "high" level for the purpose of matching.
"medium"
An in-between grain. "ወ" and "የ" are treated under consonant rules. The default IM correction folds keys downward only. The keys generated represent a "lowest common denominator" that would be reducible from the "high" mode keys. More matches will be found at the lowest grandularity level, but the risk of false matches becomes higher.
"low"
The default and most coarse, or aggressive, grain. "ወ" and "የ" are treated under vowel rules, that is, stripped out of the string except as the first character. Like the medium level, the default IM correction folds keys downward only, and the keys again are lowest common denominators of "high" mode keys. More matches will be found at the lowest grandularity level, but the risk of false matches becomes higher.
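One way to see the effect of the three levels is to generate keys for the same word at each level and compare them. The sketch below assumes the module is installed and uses only the grandularity and metaphone methods shown in the SYNOPSIS; the %count hash and the choice of sample word are illustrative, and no particular output is implied.

```perl
use utf8;
use strict;
use warnings;
binmode(STDOUT, ":encoding(UTF-8)");   # keys contain Ethiopic characters

require Text::Metaphone::Amharic;

my $mphone = Text::Metaphone::Amharic->new;

my %count;
for my $level (qw(high medium low)) {
    $mphone->grandularity($level);

    my @keys = $mphone->metaphone("ሥላሴ");
    pop @keys;                          # drop the terminal regular-expression key

    $count{$level} = scalar @keys;
    print "$level ($count{$level} keys): @keys\n";
}
```

Running this over a list of words and their common misspellings would be one way to judge how the match rate and the false-match risk trade off between levels.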
STYLES
By default keys are returned with Ethiopic characters (UTF-8 encoding). If this is not your text "style" of choice, IPA symbols and SERA transliteration are also available. The text style can be set and reset at any time:
At Import Time:
At Instantiation Time:
    my $mphone = new Text::Metaphone::Amharic ( style => "sera" );
After Instantiation:
    $mphone->style ( "ethio" );
A reverse method is also provided to convert an IPA or SERA symbol key into an equivalent Ethiopic sequence.
REQUIRES
COPYRIGHT
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
BUGS
None presently known.
AUTHOR
Daniel Yacob, dyacob@cpan.org
SEE ALSO
Included with this package:
examples/amphone.pl examples/ipa-phone.pl
examples/amphone-high.pl examples/ipa-phone-high.pl
examples/grandularity.p examples/matchtest.pl