NAME
Regexp::Cherokee - Regular Expressions Support for Cherokee Script.
SYNOPSIS
#
# Overloading Perl REs:
#
use utf8;
use Regexp::Cherokee qw(overload setForm);
:
s/([#2#])/setForm($1,6)/eg;
s/([ᎠᎦᎧᎭ]%2)/setForm($1,6)/eg;
s/([ᎠᎦᎧᎭ]%{1,3})/setForm($1,6)/eg;
s/([ᎠᎦᎧᎭ]%{1-3,7})/setForm($1,6)/eg;
s/([#Ꮎ#])/subForm('�',$1)/eg; # substitute a '�' for a 'Ꮎ', in the form found for the 'Ꮎ'
if ( /[#�#]/ ) {
#
# do something
#
:
}
:
:
#
# Without overloading:
#
use utf8;
require Regexp::Cherokee;
my $string = "[ᎠᎦᎧᎭ]%{1-3,7}";
my $re = Regexp::Cherokee::getRe ( $string );
s/abc($re)xyz/"abc".Regexp::Cherokee::setForm($1,6)."xyz"/eg;
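To illustrate the kind of expansion getRe performs on a "[...]%n" or "[...]%{list}" specification, here is a minimal core-Perl sketch. It is not the module's implementation. It assumes a simplified table holding only three rows (the vowel, ha and la rows), whose six forms a, e, i, o, u and v sit on consecutive code points, and it supports only forms 1 to 6. The names get_re, expand_class, expand_list and set_form are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical, simplified table: start code points of the vowel (a), ha
# and la rows, whose six forms a, e, i, o, u, v are consecutive.
my @ROW_STARTS = (0x13A0, 0x13AD, 0x13B3);

# Move a syllable to form $form (1-6) within its row; characters outside
# the table are returned unchanged.
sub set_form {
    my ($syllable, $form) = @_;
    die "form must be between 1 and 6 in this sketch\n"
        unless $form =~ /^[1-6]\z/;
    my $cp = ord $syllable;
    for my $start (@ROW_STARTS) {
        return chr($start + $form - 1) if $cp >= $start && $cp < $start + 6;
    }
    return $syllable;
}

# Turn a form list such as "1-3,6" into (1, 2, 3, 6).
sub expand_list {
    my ($list) = @_;
    return map { /^(\d+)-(\d+)\z/ ? ($1 .. $2) : $_ } split /,/, $list;
}

# Matches "[chars]%n" or "[chars]%{list}".
my $SPEC = qr/\[([^\]]+)\]%(?:(\d+)|\{([\d,-]+)\})/;

# Replace one matched specification with a plain bracket class.
sub expand_class {
    my ($chars, $one, $list) = @_;
    my @forms = defined $one ? ($one) : expand_list($list);
    my @out;
    for my $c (split //, $chars) {
        push @out, map { set_form($c, $_) } @forms;
    }
    return '[' . join('', @out) . ']';
}

sub get_re {
    my ($spec) = @_;
    $spec =~ s/$SPEC/expand_class($1, $2, $3)/ge;
    return $spec;
}

# Vowel and ha rows, forms 1, 2 and 6.
my $re  = get_re("[\x{13A0}\x{13AD}]%{1-2,6}");
my $hit = "\x{13AE}" =~ /^$re\z/ ? 1 : 0;    # he is form 2 of the ha row
print "hit=$hit\n";
```

The result is an ordinary character class, so it can be interpolated into any pattern, which is the role $re plays in the synopsis.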
DESCRIPTION
The Regexp::Cherokee module provides POSIX-style character class definitions for working with the Cherokee syllabary. The character classes provided by the Regexp::Cherokee package correspond to innate properties of the script and are language independent.
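As a rough illustration of what a form class such as [#2#] denotes, the sketch below builds the equivalent bracketed class from a small, hypothetical table. It is a sketch under stated assumptions, not the package's code: only the vowel, ha and la rows are included, forms 1 to 6 are taken to be the a, e, i, o, u and v columns at consecutive code points, and form_class is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical, simplified table: start code points of the vowel (a), ha
# and la rows, whose six forms a, e, i, o, u, v are consecutive.
my @ROW_STARTS = (0x13A0, 0x13AD, 0x13B3);

# Build the bracketed class of every syllable in form $form (1-6),
# roughly what a class such as [#2#] stands for.
sub form_class {
    my ($form) = @_;
    die "form must be between 1 and 6 in this sketch\n"
        unless $form =~ /^[1-6]\z/;
    return '[' . join('', map { chr($_ + $form - 1) } @ROW_STARTS) . ']';
}

my $class = form_class(2);                   # e, he and le
my $text  = "\x{13A1}\x{13A0}\x{13AE}";       # e, a, he
my $count = () = $text =~ /$class/g;          # number of form-2 syllables
print "$count\n";
```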
The Regexp::Cherokee package is NOT derived from the Regexp class and may not be instantiated into an object. Regexp::Cherokee can optionally export the utility functions getForm, setForm, subForm and formatForms (or all four with the :utils export tag) to query or set the form of a Cherokee character. Variables named after the forms and set to the corresponding form values may be exported with the :forms export tag.
See the files in the doc/ and examples/ directories that are included with this package.
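Tags such as :utils are export tags in the Exporter sense. The sketch below shows how such a tag is commonly declared; My::SyllableUtils, get_form and set_form are hypothetical stand-ins built on a simplified three-row table (vowel, ha and la rows, six consecutive forms each), not Regexp::Cherokee's source.

```perl
use strict;
use warnings;

# Hypothetical stand-in module, declared inline so the example runs as a
# single file; a real module would live in its own .pm file.
package My::SyllableUtils;
use Exporter 'import';

our @EXPORT_OK   = qw(get_form set_form);
our %EXPORT_TAGS = (utils => [@EXPORT_OK]);    # enables ':utils'

# Simplified table: vowel (a), ha and la rows, six consecutive forms each.
my @ROW_STARTS = (0x13A0, 0x13AD, 0x13B3);

sub get_form {
    my $cp = ord shift;
    for my $start (@ROW_STARTS) {
        return $cp - $start + 1 if $cp >= $start && $cp < $start + 6;
    }
    return;
}

sub set_form {
    my ($syllable, $form) = @_;
    my $current = get_form($syllable);
    return $syllable unless defined $current;
    return chr(ord($syllable) - $current + $form);
}

package main;

# Run-time equivalent of "use My::SyllableUtils ':utils';".
My::SyllableUtils->import(':utils');

my $form = get_form("\x{13B5}");       # li is form 3
my $lo   = set_form("\x{13B5}", 4);    # li moved to form 4 (lo)
print "$form\n";
```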
Substitution Utilities
getForm
A utility function to query the "form" of a Cherokee syllable. It will return an integer between 1 and 12 corresponding to the [#\d+#] classes.
print getForm ( "�" ), "\n"; # prints 1
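A minimal sketch of how a getForm-style lookup can work, assuming a simplified table in which each row's six forms (a, e, i, o, u, v) are consecutive code points. Only the vowel, ha and la rows are covered and get_form is a hypothetical name; getForm itself, per the description above, returns values up to 12.

```perl
use strict;
use warnings;

# Hypothetical, simplified table: start code points of the vowel (a), ha
# and la rows, whose six forms a, e, i, o, u, v are consecutive.
my @ROW_STARTS = (0x13A0, 0x13AD, 0x13B3);

# Return the form number (1-6) of a syllable, or undef when the syllable
# is outside the simplified table.
sub get_form {
    my ($syllable) = @_;
    my $cp = ord $syllable;
    for my $start (@ROW_STARTS) {
        return $cp - $start + 1 if $cp >= $start && $cp < $start + 6;
    }
    return;
}

print get_form("\x{13A0}"), "\n";    # a
print get_form("\x{13B5}"), "\n";    # li
```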
setForm
A utility function to set the form number of a syllable. The form number must be an integer between 1 and 12 corresponding to the [#\d+#] classes.
s/(.)/setForm($1, 1)/eg;
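Under the same simplifying assumption (only the vowel, ha and la rows, with forms 1 to 6 on consecutive code points), setting a form amounts to moving within the row. This is a hedged sketch with hypothetical names get_form and set_form, not the module's code; the final line mirrors the s///e usage above.

```perl
use strict;
use warnings;

# Hypothetical, simplified table: vowel (a), ha and la rows, whose six
# forms a, e, i, o, u, v are consecutive code points.
my @ROW_STARTS = (0x13A0, 0x13AD, 0x13B3);

# Form number (1-6) of a syllable, or undef outside the table.
sub get_form {
    my $cp = ord shift;
    for my $start (@ROW_STARTS) {
        return $cp - $start + 1 if $cp >= $start && $cp < $start + 6;
    }
    return;
}

# Move a syllable to form $form within its own row; characters outside
# the table are returned unchanged.
sub set_form {
    my ($syllable, $form) = @_;
    die "form must be between 1 and 6 in this sketch\n"
        unless $form =~ /^[1-6]\z/;
    my $current = get_form($syllable);
    return $syllable unless defined $current;
    return chr(ord($syllable) - $current + $form);
}

# Force every syllable to form 1: v, ho become a, ha.
(my $base = "\x{13A5}\x{13B0}") =~ s/(.)/set_form($1, 1)/eg;
```

Returning unknown characters unchanged keeps a global s///e substitution from damaging text outside the syllabary.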
subForm
A utility function to set the form number of a syllable based on the form of another syllable.
s/(\w+)([#Ꮎ#])/$1.subForm('�', $2)/eg;
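A subForm-style operation can be sketched as "take the form of the matched syllable and apply it to the replacement syllable". The sketch assumes the same kind of simplified table (vowel, ha and la rows only, six consecutive forms each); sub_form and get_form are hypothetical names, and returning the matched text unchanged when it is not in the table is a design choice of this sketch.

```perl
use strict;
use warnings;

# Hypothetical, simplified table: vowel (a), ha and la rows, whose six
# forms a, e, i, o, u, v are consecutive code points.
my @ROW_STARTS = (0x13A0, 0x13AD, 0x13B3);

sub get_form {
    my $cp = ord shift;
    for my $start (@ROW_STARTS) {
        return $cp - $start + 1 if $cp >= $start && $cp < $start + 6;
    }
    return;
}

# Return $base moved into the form that $model has; if either is outside
# the table, return $model so the substitution leaves the text untouched.
sub sub_form {
    my ($base, $model) = @_;
    my $form      = get_form($model);
    my $base_form = get_form($base);
    return $model unless defined $form && defined $base_form;
    return chr(ord($base) - $base_form + $form);
}

# Replace every la-row syllable with the ha-row syllable of the same form.
(my $text = "\x{13B4}\x{13A0}\x{13B7}")          # le, a, lu
    =~ s/([\x{13B3}-\x{13B8}])/sub_form("\x{13AD}", $1)/eg;
```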
formatForms
A utility function somewhat analogous to sprintf for a sequence of syllables:
print formatForms ( "%1%2%3%4", "ᎠᎦᎧᎭ" ), "\n"; # prints ᎠᎨᎯᎶ
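A minimal sketch of a formatForms-like helper, assuming that the n-th "%digits" directive applies to the n-th syllable and using a simplified table of only the vowel, ha and la rows (six consecutive forms each). format_forms and set_form are hypothetical names; the real function's handling of other directives is not covered.

```perl
use strict;
use warnings;

# Hypothetical, simplified table: vowel (a), ha and la rows.
my @ROW_STARTS = (0x13A0, 0x13AD, 0x13B3);

# Move one syllable to form $form within its row; unknown characters pass through.
sub set_form {
    my ($syllable, $form) = @_;
    my $cp = ord $syllable;
    for my $start (@ROW_STARTS) {
        return chr($start + $form - 1) if $cp >= $start && $cp < $start + 6;
    }
    return $syllable;
}

# Apply the n-th "%digits" directive of $format to the n-th syllable.
sub format_forms {
    my ($format, $syllables) = @_;
    my @forms = $format =~ /%(\d+)/g;
    my @chars = split //, $syllables;
    die "need exactly one %n directive per syllable\n" unless @forms == @chars;
    return join '', map { set_form($chars[$_], $forms[$_]) } 0 .. $#chars;
}

# a, ha, la in forms 1, 2, 3: a, he, li
my $out = format_forms('%1%2%3', "\x{13A0}\x{13AD}\x{13B3}");
```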
LIMITATIONS
The overloading mechanism only applies to the constant (literal) parts of an RE. The following would not be handled by the Regexp::Cherokee package as expected:
use Regexp::Cherokee 'overload';
my $x = "Ꭷ";
:
:
if ( /[#$x#]/ ) {
:
:
}
The package never sees the value of $x, so it cannot perform the RE expansion on it. The workaround is to expand the pattern explicitly with getRe:
use Regexp::Cherokee 'overload';
my $x = "Ꭷ";
:
:
my $re = Regexp::Cherokee::getRe ( "[#$x#]" );
if ( /$re/ ) {
:
:
}
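The limitation comes from Perl itself: an overload::constant qr handler, which is presumably what the 'overload' import installs (an assumption), is called only on the literal pieces of a pattern; perlre documents the same behaviour for custom RE escapes. The self-contained sketch below uses a hypothetical <CHR> token and a hypothetical handler, expand_placeholders, to show the literal case, the interpolated case and the explicit workaround.

```perl
use strict;
use warnings;
use overload ();    # load overload.pm for overload::constant, no import

# Hypothetical rewrite rule: the token <CHR> becomes the code point range of
# the 85 original Cherokee letters. Defined before the BEGIN block below, so
# its own regex is compiled without the handler.
sub expand_placeholders {
    my ($re) = @_;
    $re =~ s/<CHR>/\\x{13A0}-\\x{13F4}/g;
    return $re;
}

# Install the handler for regex constants compiled later in this file.
BEGIN { overload::constant(qr => \&expand_placeholders) }

my $word = "\x{13A6}\x{13B3}";    # ga, la

# 1. Literal token: the handler receives "^[<CHR>]+$" and rewrites it.
my $literal = $word =~ /^[<CHR>]+$/ ? 1 : 0;

# 2. Interpolated token: only "^[" and "]+$" reach the handler, so the value
#    of $x is used verbatim and the class means the characters < C H R >.
my $x = '<CHR>';
my $interpolated = $word =~ /^[$x]+$/ ? 1 : 0;

# 3. Workaround: run the expansion on the complete pattern at run time.
my $re = expand_placeholders('^[' . $x . ']+$');
my $explicit = $word =~ /$re/ ? 1 : 0;

print "literal=$literal interpolated=$interpolated explicit=$explicit\n";
```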
This works as expected at the cost of one extra step. The overloading and functional modes of the Regexp::Cherokee package may be used together without conflict.
REQUIRES
Works with Perl 5.8.0. May work with Perl 5.6.x, but has not yet been tested.
BUGS
None presently known.
AUTHOR
Daniel Yacob, dyacob@cpan.org
SEE ALSO
Included with this package:
examples/overload.pl examples/utils.p