NAME

Regexp::Ethiopic - Regular Expressions Support for Ethiopic Script.

SYNOPSIS

#
#  Overloading Perl REs:
#
use utf8;
use Regexp::Ethiopic qw(:forms overload setForm);

:

s/([#2#])/setForm($1,$ሳድስ)/eg;
s/([መረበወ]{#2#})/setForm($1,$ሳድስ)/eg;
s/([መረበወ]{#1,3#})/setForm($1,$ሳድስ)/eg;
s/([መረበወ]{#1-3,7#})/setForm($1,$ሳድስ)/eg;
s/([#ፀ#])/subForm('ጸ',$1)/eg;  # substitute a 'ጸ' for a 'ፀ', keeping the form found for the 'ፀ'

if ( /[#ኘ#]/ ) {
  #
  # do something
  #
  :
}

:
:

#
#  Without overloading:
#
use utf8;
require Regexp::Ethiopic;

my $string = "[መረበወ]{#1-3,7#}";
my $re = Regexp::Ethiopic::getRe ( $string );

s/abc($re)xyz/"abc".Regexp::Ethiopic::setForm($1,6)."xyz"/eg;

DESCRIPTION

The Regexp::Ethiopic module provides POSIX-style character class definitions for working with the Ethiopic syllabary. The character classes provided by the Regexp::Ethiopic package correspond to innate properties of the script and are language independent.

The Regexp::Ethiopic package is NOT derived from the Regexp class and may not be instantiated into an object. Regexp::Ethiopic can optionally export the utility functions getForm, setForm, subForm and formatForms (or all of them with the :utils tag) to query or set the form of an Ethiopic character. Variables named after the forms (such as $ሳድስ in the synopsis), each set to the corresponding form value, may be exported with the :forms tag.

See the files in the doc/ and examples/ directories that are included with this package.

Substitution Utilities

getForm

A utility function to query the "form" of an Ethiopic syllable. It will return an integer between 1 and 12 corresponding to the [#\d+#] classes.

print getForm ( "አ" ), "\n";  # prints 1
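As a rough illustration of what a form number means, the basic orders can be read off a syllable's position in the Unicode Ethiopic block, where each consonant family occupies a row of eight consecutive code points starting at U+1200. The helper below, sketch_get_form, is hypothetical and is not part of Regexp::Ethiopic; it only reports the column (1 to 8) within a row, and the module's own numbering for forms 8 through 12 may not follow this simple layout.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper (an assumption for illustration, not the module's code):
# report the column of a syllable within its eight-code-point Unicode row.
sub sketch_get_form {
    my ($char) = @_;
    my $cp = ord $char;

    # Only handle the main Ethiopic syllable range, U+1200 .. U+135A.
    return unless $cp >= 0x1200 && $cp <= 0x135A;

    return ( $cp - 0x1200 ) % 8 + 1;
}

print sketch_get_form("አ"), "\n";   # 1, the same value as the getForm example
print sketch_get_form("ብ"), "\n";   # 6
```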

setForm

A utility function to set the form number of a syllable. The form number must be an integer between 1 and 12 corresponding to the [#\d+#] classes.

s/(.)/setForm($1, 1)/eg;
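To make the effect of changing a form concrete, the sketch below moves a syllable to another column of its Unicode row. The name sketch_set_form is hypothetical and the code is not taken from Regexp::Ethiopic; it assumes the eight-per-row layout of the Ethiopic block and deliberately handles only the seven basic orders, leaving any other input unchanged.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper (an assumption for illustration, not the module's code):
# return the syllable from the same Unicode row as $char in column $form.
sub sketch_set_form {
    my ( $char, $form ) = @_;
    my $cp = ord $char;

    # Leave non-Ethiopic characters and unsupported form numbers untouched.
    return $char unless $cp >= 0x1200 && $cp <= 0x135A;
    return $char unless $form >= 1 && $form <= 7;

    my $row_start = $cp - ( $cp - 0x1200 ) % 8;   # first code point of the row
    return chr( $row_start + $form - 1 );
}

binmode STDOUT, ':encoding(UTF-8)';
( my $word = "መረበወ" ) =~ s/(.)/sketch_set_form($1, 6)/eg;
print "$word\n";   # ምርብው
```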

subForm

A utility function to set the form number of a syllable based on the form of another syllable.

s/(\w+)([#ፀ#])/$1.subForm('ጸ', $2)/eg;
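The idea can be pictured as "read the form of one syllable, then apply it to another". The sketch below does that with plain code point arithmetic; sketch_sub_form and its two helpers are hypothetical names, not functions of Regexp::Ethiopic, and they assume the eight-per-row Unicode layout and cover only the seven basic orders.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helpers (assumptions for illustration, not the module's code).
sub sketch_get_form {
    my ($char) = @_;
    my $cp = ord $char;
    return unless $cp >= 0x1200 && $cp <= 0x135A;
    return ( $cp - 0x1200 ) % 8 + 1;
}

sub sketch_set_form {
    my ( $char, $form ) = @_;
    my $cp = ord $char;
    return $char unless $cp >= 0x1200 && $cp <= 0x135A;
    return $char unless $form >= 1 && $form <= 7;
    return chr( $cp - ( $cp - 0x1200 ) % 8 + $form - 1 );
}

# Give $target the form that $source currently has.
sub sketch_sub_form {
    my ( $target, $source ) = @_;
    my $form = sketch_get_form($source);
    return defined $form ? sketch_set_form( $target, $form ) : $target;
}

binmode STDOUT, ':encoding(UTF-8)';
print sketch_sub_form( "ጸ", "ፁ" ), "\n";   # ጹ
```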

formatForms

A utility function somewhat analogous to sprintf for a sequence of syllables:

print formatForms ( "%1%2%3%4", "አበገደ" ), "\n";  # prints አቡጊዳ
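One way to read the example above is that each %N in the format consumes the next syllable of the string and emits it in form N. The sketch below implements only that reading; sketch_format_forms and sketch_set_form are hypothetical names, the interpretation of the format is an assumption drawn from the single example, and the real formatForms may accept more than this.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper (an assumption for illustration, not the module's code):
# move a syllable to column $form of its eight-code-point Unicode row.
sub sketch_set_form {
    my ( $char, $form ) = @_;
    my $cp = ord $char;
    return $char unless $cp >= 0x1200 && $cp <= 0x135A;
    return $char unless $form >= 1 && $form <= 7;
    return chr( $cp - ( $cp - 0x1200 ) % 8 + $form - 1 );
}

# Replace each %N with the next syllable of $string set to form N.
sub sketch_format_forms {
    my ( $format, $string ) = @_;
    my @chars = split //, $string;
    $format =~ s/%(\d+)/ @chars ? sketch_set_form( shift(@chars), $1 ) : '' /ge;
    return $format;
}

binmode STDOUT, ':encoding(UTF-8)';
print sketch_format_forms( "%1%2%3%4", "አበገደ" ), "\n";   # አቡጊዳ
```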

LIMITATIONS

The overloading mechanism applies only to the constant part of the RE. The following would not be handled by the Regexp::Ethiopic package as expected:

use Regexp::Ethiopic 'overload';

my $x = "ከ";
      :
      :
if ( /[#$x#]/ ) {
      :
      :
}

The package never gets to see the variable $x, so it cannot perform the RE expansion. The workaround is to call getRe explicitly:

use Regexp::Ethiopic 'overload';

my $x = "ከ";
      :
      :
my $re = Regexp::Ethiopic::getRe ( "[#$x#]" );

if ( /$re/ ) {
      :
      :
}

This works as expected at the cost of one extra step. The overloading and functional modes of the Regexp::Ethiopic package may be used together without conflict.

REQUIRES

Works with Perl 5.8.0. It may work with Perl 5.6.x, but this has not yet been tested.

BUGS

None presently known.

AUTHOR

Daniel Yacob, dyacob@cpan.org

SEE ALSO

Included with this package:

doc/index.html       examples/overload.pl
examples/utils.pl    examples/asfunction.pl
