NAME

Regexp::Cherokee - Regular Expressions Support for Cherokee Script.

SYNOPSIS

#
#  Overloading Perl REs:
#
use utf8;
use Regexp::Cherokee qw(overload setForm);

:

s/([#2#])/setForm($1,6)/eg;
s/([ᎠᎦᎧᎭ]%2)/setForm($1,6)/eg;
s/([ᎠᎦᎧᎭ]%{1,3})/setForm($1,6)/eg;
s/([ᎠᎦᎧᎭ]%{1-3,7})/setForm($1,6)/eg;
s/([#Ꮎ#])/subForm('�',$1)/eg;  # substitute, a '�' for a 'Ꮎ' in the form found for the 'Ꮎ'

if ( /[#�#]/ ) {
  #
  # do something
  #
  :
}

:
:

#
#  Without overloading:
#
use utf8;
require Regexp::Cherokee;

my $string = "[ᎠᎦᎧᎭ]%{1-3,7}";
my $re = Regexp::Cherokee::getRe ( $string );

s/abc($re)xyz/"abc".Regexp::Cherokee::setForm($1,6)."xyz"/eg;

DESCRIPTION

The Regexp::Cherokee module provides POSIX-style character class definitions for working with the Cherokee syllabary. The character classes provided by the Regexp::Cherokee package correspond to innate properties of the script and are language independent.

The Regexp::Cherokee package is NOT derived from the Regexp class and may not be instantiated into an object. Regexp::Cherokee can optionally export the utility functions getForm, setForm, subForm and formatForms (or all of them with the :utils tag) to query or set the form of a Cherokee character. Variables named after the forms and set to the corresponding form values may be exported with the :forms tag.

See the files in the doc/ and examples/ directories that are included with this package.

Substitution Utilities

getForm

A utility function to query the "form" of a Cherokee syllable. It will return an integer between 1 and 12 corresponding to the [#\d+#] classes.

print getForm ( "�" ), "\n";  # prints 1

setForm

A utility function to set the form number of a syllable. The form number must be an integer between 1 and 12 corresponding to the [#\d+#] classes.

s/(.)/setForm($1, 1)/eg;

subForm

A utility function to set the form number of a syllable based on the form of another syllable.

s/(\w+)([#Ꮎ#])/$1.subForm('�', $2)/eg;

formatForms

A utility function somewhat analogous to sprintf for a sequence of syllables:

print formatForms ( "%1%2%3%4", "ᎠᎦᎧᎭ" ), "\n";  # prints ᎠᎨᎯᎶ

LIMITATIONS

The overloading mechanism only applies to the constant part of the RE. The following would not be handled by the Regexp::Cherokee package as expected:

use Regexp::Cherokee 'overload';

my $x = "Ꭷ";
      :
      :
if ( /[#$x#]/ ) {
      :
      :
}

The package never sees the value of the variable $x, so it cannot perform the RE expansion. The workaround is to build the RE explicitly with getRe:

use Regexp::Cherokee 'overload';

my $x = "Ꭷ";
      :
      :
my $re = Regexp::Cherokee::getRe ( "[#$x#]" );

if ( /$re/ ) {
      :
      :
}

This works as expected at the cost of one extra step. The overloading and functional modes of the Regexp::Cherokee package may be used together without conflict.
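The limitation described above can be observed with core Perl alone. The following is a minimal sketch, assuming the overloading is built on the core overload::constant hook with the qr key (this document does not say how the package implements it). The handler and the @seen array are hypothetical stand-ins and are not part of Regexp::Cherokee. The handler records every constant piece of the RE that Perl hands to it at compile time; the value of $x is interpolated later, at run time, so it is never among them.

```perl
use strict;
use warnings;
use utf8;
use overload ();

our @seen;   # constant RE pieces passed to the handler (illustrative only)

BEGIN {
    # Hypothetical stand-in for a package's RE handler: record each
    # constant piece of a pattern and return it unchanged.
    overload::constant( qr => sub {
        my ( $source, $parsed, $context ) = @_;
        push @seen, $source;
        return $parsed;
    } );
}

my $x = "Ꭷ";

# The handler ran while this pattern was compiled, before $x had a value.
my $matched = ( "Ꭷ" =~ /[#$x#]/ ) ? 1 : 0;

binmode STDOUT, ':encoding(UTF-8)';
print "handler saw: <$_>\n" for @seen;
print "value of \$x reached the handler: ",
    ( ( grep { index( $_, $x ) >= 0 } @seen ) ? "yes" : "no" ), "\n";
```

Because the handler returns each piece unchanged, the pattern here behaves as an ordinary character class. A handler that rewrites its input would likewise have only the pieces around $x to work with, which is why passing the fully interpolated string to getRe is needed.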

REQUIRES

Works with Perl 5.8.0. It may work with Perl 5.6.x, but this has not yet been tested.

BUGS

None presently known.

AUTHOR

Daniel Yacob, dyacob@cpan.org

SEE ALSO

Included with this package:

examples/overload.pl    examples/utils.p
