NAME
String::Equivalence::Amharic - Normalization Utilities for Amharic.
SYNOPSIS
#
# OO Style:
#
use utf8;
require String::Equivalence::Amharic;
my $string = String::Equivalence::Amharic->new;
my @list = $string->downgrade ( "እግዚአብሔር" );
my $count = 0;
foreach (@list) {
$count++;
print "$count: $_\n";
}
#
# Functional Style:
#
use utf8;
use String::Equivalence::Amharic;
my @list = downgrade ( "እግዚአብሔር" );
:
:
:
DESCRIPTION
Under the "three levels of Amharic spelling" theory, the String::Equivalence::Amharic package will take a canonical word (level one) and generate level two words (the level of popular use). The first member of the returned array is the original string. The last member of the returned array is a regular expression that will match all renderings of the list.
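The layout of the returned array described above can be unpacked as in the following sketch. The helper split_downgrade_result is hypothetical and not part of the package; it only assumes what the paragraph states, namely that the first member is the original string and the last member is the regular expression. Whether that last member is a plain string or a compiled pattern is not specified here, so the sketch simply prints it.

```perl
use strict;
use warnings;
use utf8;
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper (not provided by String::Equivalence::Amharic):
# split the list returned by downgrade() into its documented parts.
sub split_downgrade_result {
    my @list = @_;
    return unless @list >= 2;      # need at least the original and the regex
    my $original = shift @list;    # first member: the original string
    my $regex    = pop @list;      # last member: regex matching all renderings
    return ($original, \@list, $regex);   # the rest: level two renderings
}

# Call the real module only when it is installed; otherwise do nothing.
if (eval { require String::Equivalence::Amharic; 1 }) {
    my $string = String::Equivalence::Amharic->new;
    my ($original, $variants, $regex) =
        split_downgrade_result($string->downgrade("እግዚአብሔር"));
    if (defined $original) {
        print "canonical: $original\n";
        print "variant:   $_\n" for @$variants;
        print "pattern:   $regex\n";
    }
}
```

Keeping the regular expression apart from the word list avoids printing or indexing it as if it were one more spelling.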
The doc/index.html file presents a development of the downgrade rules applied.
The package is useful for some problems: it produces orthographically "legal" simplifications and avoids improbable naive ones. Text::Metaphone::Amharic, by contrast, oversimplifies because it addresses a different problem. So while the aim is not to promote level two orthographies, in some instances it is useful to generate level two renderings from a canonical form.
You must start with the canonical spelling of a word, as only downgrades can occur. Starting with a near-canonical form and downgrading will generate a shorter word list than you would get starting from the canonical form.
Equivalence Utilities
downgrade
isReducible
hasEquivalence
isEquivalentTo
inflate
getForm is a utility function that queries the "form" of an Ethiopic syllable. It returns an integer between 1 and 12 corresponding to the [#\d+#] classes.
print getForm ( "አ" ), "\n"; # prints 1
REQUIRES
Regexp::Ethiopic (which rules btw).
COPYRIGHT
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
BUGS
None presently known.
AUTHOR
Daniel Yacob, dyacob@cpan.org
SEE ALSO