NAME
Regexp::Ethiopic - Regular Expressions Support for Ethiopic Script.
SYNOPSIS
#
# Overloading Perl REs:
#
use utf8;
use Regexp::Ethiopic qw(:forms overload setForm subForm);
:
s/([#2#])/setForm($1,$ሳድስ)/eg;
s/([መረበወ]{#2#})/setForm($1,$ሳድስ)/eg;
s/([መረበወ]{#1,3#})/setForm($1,$ሳድስ)/eg;
s/([መረበወ]{#1-3,7#})/setForm($1,$ሳድስ)/eg;
s/([#ፀ#])/subForm('ጸ',$1)/eg; # substitute a 'ጸ' for a 'ፀ', keeping the form in which the 'ፀ' was found
if ( /[#ኘ#]/ ) {
#
# do something
#
:
}
:
:
#
# Without overloading:
#
use utf8;
require Regexp::Ethiopic;
my $string = "[መረበወ]{#1-3,7#}";
my $re = Regexp::Ethiopic::getRe ( $string );
s/abc($re)xyz/"abc".Regexp::Ethiopic::setForm($1,6)."xyz"/eg;
DESCRIPTION
The Regexp::Ethiopic module provides POSIX-style character class definitions for working with the Ethiopic syllabary. The character classes provided by the Regexp::Ethiopic package correspond to innate properties of the script and are language independent.
The Regexp::Ethiopic package is NOT derived from the Regexp class and may not be instantiated into an object. Regexp::Ethiopic can optionally export the utility functions getForm, setForm, subForm and formatForms (or all of them with the :utils tag) to query or set the form of an Ethiopic character. Variables named after the forms and set to the corresponding form values may be exported with the :forms tag.
See the files in the doc/ and examples/ directories that are included with this package.
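As a rough illustration of what a form-qualified class such as [መረበወ]{#1-3,7#} could expand to, here is a minimal sketch. It is not the module's implementation. The helper name sketch_expand is hypothetical, and the sketch assumes that the qualifier restricts the listed syllables to the given forms and that each syllable family occupies one aligned row of eight code points in the Unicode Ethiopic block, which holds for the basic forms 1-7 only.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: expand a list of syllables plus a form
# specification such as "2" or "1-3,7" into a plain character class.
sub sketch_expand {
    my ($syllables, $spec) = @_;

    # Turn "1-3,7" into the list (1, 2, 3, 7).
    my @forms;
    for my $part (split /,/, $spec) {
        my ($lo, $hi) = $part =~ /^(\d+)(?:-(\d+))?$/
            or die "bad form spec: $part\n";
        push @forms, $lo .. (defined $hi ? $hi : $lo);
    }

    # Assumption: form N of a syllable sits at row base + (N - 1).
    my $class = '';
    for my $char (split //, $syllables) {
        my $base = ord($char) - ord($char) % 8;
        $class .= chr($base + $_ - 1) for @forms;
    }
    return "[$class]";
}

my $class = sketch_expand("መረበወ", "2");   # the class [ሙሩቡዉ]
my $hit   = "ቡ" =~ /\A$class\z/ ? 1 : 0;   # 1: second form of በ
my $miss  = "በ" =~ /\A$class\z/ ? 1 : 0;   # 0: first form of በ
```

The real package also defines classes for forms 8 to 12, which this code point arithmetic does not cover.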
Substitution Utilities
getForm
A utility function to query the "form" of an Ethiopic syllable. It will return an integer between 1 and 12 corresponding to the [#\d+#] classes.
print getForm ( "አ" ), "\n"; # prints 1
setForm
A utility function to set the form number of a syllable. The form number must be an integer between 1 and 12 corresponding to the [#\d+#] classes.
s/(.)/setForm($1, 1)/eg;
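To make the notion of a "form number" concrete, the following sketch derives and changes the form of a syllable from its Unicode code point alone. It is not how getForm and setForm are implemented. The names sketch_get_form and sketch_set_form are hypothetical, and the arithmetic assumes the basic layout of the Unicode Ethiopic block, where a syllable family fills one aligned row of eight code points. It is only reliable for forms 1-7 and makes no attempt to handle forms 8 to 12.

```perl
use strict;
use warnings;
use utf8;
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: position within the row of eight, counted from 1.
sub sketch_get_form {
    my ($char) = @_;
    my $cp = ord $char;
    die "not an Ethiopic syllable\n" if $cp < 0x1200 || $cp > 0x135A;
    return ($cp % 8) + 1;
}

# Hypothetical helper: move to another position in the same row.
sub sketch_set_form {
    my ($char, $form) = @_;
    die "this sketch handles forms 1-7 only\n" if $form < 1 || $form > 7;
    my $cp = ord $char;
    die "not an Ethiopic syllable\n" if $cp < 0x1200 || $cp > 0x135A;
    return chr($cp - ($cp % 8) + $form - 1);
}

print sketch_get_form("አ"), "\n";       # prints 1
print sketch_set_form("መ", 6), "\n";    # prints ም
```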
subForm
A utility function to set the form number of a syllable based on the form of another syllable.
s/(\w+)([#ፀ#])/$1.subForm('ጸ', $2)/eg;
formatForms
A utility function somewhat analogous to sprintf for a sequence of syllables:
print formatForms ( "%1%2%3%4", "አበገደ" ), "\n"; # prints አቡጊዳ
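The sprintf analogy can be sketched in a few lines: each %N in the format consumes the next syllable of the input and emits it in form N. This is an illustration only, not the module's code. The names sketch_set_form and sketch_format_forms are hypothetical, the form arithmetic assumes aligned rows of eight code points in the Unicode Ethiopic block (forms 1-7 only), and the handling of literal text or surplus arguments in the real formatForms is not covered.

```perl
use strict;
use warnings;
use utf8;
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: same row of eight code points, position $form.
sub sketch_set_form {
    my ($char, $form) = @_;
    my $cp = ord $char;
    return chr($cp - ($cp % 8) + $form - 1);
}

# Hypothetical helper: pair the Nth "%digit" with the Nth syllable.
sub sketch_format_forms {
    my ($format, $syllables) = @_;
    my @forms = $format =~ /%(\d+)/g;
    my @chars = split //, $syllables;
    my $out   = '';
    for my $i (0 .. $#forms) {
        last if $i > $#chars;    # ran out of syllables
        $out .= sketch_set_form($chars[$i], $forms[$i]);
    }
    return $out;
}

print sketch_format_forms("%1%2%3%4", "አበገደ"), "\n";   # prints አቡጊዳ
```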
LIMITATIONS
The overloading mechanism only applies to the constant part of the RE. The following would not be handled by the Regexp::Ethiopic package as expected:
use Regexp::Ethiopic 'overload';
my $x = "ከ";
:
:
if ( /[#$x#]/ ) {
:
:
}
The package never gets to see the contents of the variable $x and so cannot perform the RE expansion. The workaround is to use the package as follows:
use Regexp::Ethiopic 'overload';
my $x = "ከ";
:
:
my $re = Regexp::Ethiopic::getRe ( "[#$x#]" );
if ( /$re/ ) {
:
:
}
This works as expected at the cost of one extra step. The overloading and functional modes of the Regexp::Ethiopic package may be used together without conflict.
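The limitation follows from how Perl's constant overloading works in general, and can be observed without this package. The sketch below installs a qr handler with the core overload::constant function and records what the handler is given. It assumes that Regexp::Ethiopic's overload mode relies on the same core mechanism, which the text above suggests but does not state. The variable @seen is only there for the demonstration.

```perl
use strict;
use warnings;
use overload ();

our @seen;    # literal regex pieces handed to the handler

BEGIN {
    # Install a handler for the constant pieces of regular expressions.
    # It records each piece and returns it unchanged.
    overload::constant( qr => sub {
        my ($source, $value, $context) = @_;
        push @seen, $source;
        return $value;
    } );
}

my $x = "k";

# The handler runs when this pattern is compiled and is given only
# the literal text around the variable, never the contents of $x.
my $matched = "k" =~ /[#$x#]/ ? 1 : 0;

print "handler saw: '$_'\n" for @seen;
```

Because the handler runs at compile time, before $x has a value, any rewriting that depends on the variable has to happen at run time, which is what the explicit getRe call does.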
REQUIRES
Works with Perl 5.8.0. It may work with Perl 5.6.x, but this has not yet been tested.
BUGS
None presently known.
AUTHOR
Daniel Yacob, dyacob@cpan.org
SEE ALSO
Included with this package:
doc/index.html examples/overload.pl
examples/utils.pl examples/asfunction.pl