NAME
Convert::Moji - Convert between alphabets
SYNOPSIS
# Examples of rot13 transformers:
use Convert::Moji;
# Using a table
my %rot13;
@rot13{('a'..'z')} = ('n'..'z','a'..'m');
my $rot13 = Convert::Moji->new (["table", \%rot13]);
# Using tr
my $rot13_1 = Convert::Moji->new (["tr", "a-z", "n-za-m"]);
# Using a callback
sub rot_13_sub { tr/a-z/n-za-m/; return $_ }
my $rot13_2 = Convert::Moji->new (["code", \&rot_13_sub]);
# Then to do the actual conversion
my $out = $rot13->convert ("secret");
# You also can go backwards with
my $inverted = $rot13->invert ("frperg");
print "$out\n$inverted\n";
produces output
frperg
secret
(This example is included as rot13.pl in the distribution.)
VERSION
This documents Convert::Moji version 0.11 corresponding to git commit 7bf3dfd9543df5abd6e592144cae7075b7c27d3f released on Sat Mar 13 17:13:45 2021 +0900.
DESCRIPTION
Convert::Moji objects convert between different alphabets. For example, a Convert::Moji object can convert between Greek letters and the English alphabet, or convert between phonetic symbols in Unicode and a representation of them in ASCII.
This started as a helper module for Lingua::JA::Moji, where it is used for converting between various Japanese methods of writing. It was split out of that module to be a general-purpose converter for any alphabets.
METHODS
new
my $convert = Convert::Moji->new (["table", $mytable]);
Create the object. The arguments are a list of array references, one for each conversion.
Conversions can be chained together:
my $does_something = Convert::Moji->new (["table", $mytable],
["tr", $left, $right]);
The array references must have one of the following keywords as their first argument.
- table
-
After this comes one more argument, a reference to the hash containing the table. For example
use Convert::Moji; my %crazyhash = ("a" => "apple", "b" => "banana"); my $conv = Convert::Moji->new (["table", \%crazyhash]); my $out = $conv->convert ("a b c"); my $back = $conv->invert ($out); print "$out, $back\n";
produces output
apple banana c, a b c
(This example is included as crazyhash.pl in the distribution.)
The hash keys and values can be any length.
- file
-
After this comes one more argument, the name of a file containing some information to convert into a hash table. The file format is space-separated pairs, no comments or blank lines allowed. If the file does not exist or cannot be opened, the module prints an error message, and returns the undefined value.
- code
-
After this comes one or two references to subroutines. The first subroutine is the conversion and the second one is the inversion routine. If you omit the second routine, it is equivalent to specifying "oneway".
- tr
-
After this come two arguments, the left and right hand sides of a "tr" expression, for example
Convert::Moji->new (["tr", "A-Z", "a-z"])
will convert upper to lower case. A "tr" is performed, and inversely for the invert case.
Conversions, via "convert", will be performed in the order of the arguments to new. Inversions will be performed in reverse order of the arguments, skipping uninvertibles.
Uninvertible operations
If your conversion doesn't actually go backwards, you can tell the module when you create the object using a keyword "oneway":
my $uninvertible = Convert::Moji->new (["oneway", "table", $mytable]);
Then the method $uninvertible->invert
doesn't do anything. You can also selectively choose which operations of a list are invertible and which aren't, so that only the invertible ones do something.
Load from a file
To load a character conversion table from a file, use
Convert::Moji->new (["file", $filename]);
In this case, the file needs to contain a space-separated list of items to be converted one into the other, such as
alpha α
beta β
gamma γ
The file reading cannot handle comments or blank lines in the file. Examples of use of this format are "kana2hw" in Lingua::JA::Moji, "circled2kanji" in Lingua::JA::Moji, and "bracketed2kanji" in Lingua::JA::Moji.
convert
After building the object, it is used to convert text with the "convert" method. The convert method takes one argument, a scalar string to be converted by the rules we specified with "new".
This ignores (passes through) characters which it can't convert.
invert
This inverts the input.
This takes two arguments. The first is the string to be inverted back through the conversion process, and the second is the type of conversion to perform if the inversion is ambiguous. This can take one of the following values
- first
-
If the inversion is ambiguous, it picks the first one it finds.
- random
-
If the inversion is ambiguous, it picks one at random.
- all
-
In this case you get an array reference back containing either strings where the inversion was unambiguous, or array references to arrays containing all possible strings.
- all_joined
-
Like "all", but you get a scalar with all the options in square brackets instead of lots of array references.
The second argument part is only implemented for hash table based conversions, and is very likely to be buggy even then.
FUNCTIONS
These are helper functions for the module.
length_one
# Returns false:
length_one ('x', 'y', 'monkey');
# Returns true:
length_one ('x', 'y', 'm');
Returns true if every element of the array has a length equal to one, and false if any of them does not have length one. The "make_regex" function uses this to decide whether to use a [abc]
or a (a|b|c)
style regex.
make_regex
my $regex = make_regex (qw/a b c de fgh/);
# $regex = "fgh|de|a|b|c";
Given a list of strings, this makes a regular expression which matches any of the strings in the list, longest match first. Each of the elements of the list is quoted using quotemeta
. The regular expression does not contain capturing parentheses.
To convert everything in string $x
from the keys of %foo2bar
to its values,
use Convert::Moji 'make_regex';
my $x = 'mad, bad, and dangerous to know';
my %foo2bar = (mad => 'max', dangerous => 'trombone');
my $regex = make_regex (keys %foo2bar);
$x =~ s/($regex)/$foo2bar{$1}/g;
print "$x\n";
produces output
max, bad, and trombone to know
(This example is included as trombone.pl in the distribution.)
For another example, see the "joke" program at "english" in Data::Kanji::Kanjidic.
unambiguous
my $invertible = unambiguous (\%table));
Returns true if all of the values in %table
are distinct, and false if any two of the values in %table
are the same. This is used by "invert" to decide whether a table can be reversed.
use utf8;
use FindBin '$Bin';
use Convert::Moji 'unambiguous';
my %ambig = (
a => 'b',
c => 'b',
);
my %unambig = (
a => 'b',
c => 'd',
);
for my $thing (\%ambig, \%unambig) {
if (unambiguous ($thing)) {
print "un";
}
print "ambiguous\n";
}
produces output
ambiguous
unambiguous
(This example is included as unambiguous.pl in the distribution.)
SEE ALSO
- Lingua::JA::Moji
-
Uses this module.
- Lingua::KO::Munja
-
Uses this module.
- "list2re" in Data::Munge
-
This is similar to "make_regex" in this module.
- Lingua::Translit
-
Transliterates text between writing systems
- Match a dictionary against a string
-
A list of various other CPAN modules for matching a dictionary of words against strings.
EXPORTS
The functions "make_regex", "length_one" and "unambiguous" are exported on demand. There are no export tags.
DEPENDENCIES
- Carp
-
Functions
carp
andcroak
are used to report errors.
AUTHOR
Ben Bullock, <bkb@cpan.org>
COPYRIGHT & LICENCE
This package and associated files are copyright (C) 2008-2021 Ben Bullock.
You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.