NAME
Lingua::FA::MacFarsi - transcode between Mac OS Farsi encoding and Unicode
SYNOPSIS
(1) using function names exported by default:
use Lingua::FA::MacFarsi;
$wchar = decodeMacFarsi($octet);
$octet = encodeMacFarsi($wchar);
(2) using function names exported on request:
use Lingua::FA::MacFarsi qw(decode encode);
$wchar = decode($octet);
$octet = encode($wchar);
(3) using function names fully qualified:
use Lingua::FA::MacFarsi ();
$wchar = Lingua::FA::MacFarsi::decode($octet);
$octet = Lingua::FA::MacFarsi::encode($wchar);
# $wchar : a string in Perl's Unicode format
# $octet : a legacy byte string (i.e. in MacFarsi)
DESCRIPTION
This module provides decoding from/encoding to Mac OS Farsi encoding (denoted MacFarsi hereafter).
Features
- bidi support
  Functions provided here should cope with Unicode text that is accompanied by some directional formatting codes, i.e. PDF (or U+202C), LRO (or U+202D), and RLO (or U+202E).
- additional mapping
  Extended Arabic-Indic Digits and some related characters in Unicode are encoded in MacFarsi as if they were normal digits (U+0030..U+0039) when they appear in the left-to-right direction.
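As a rough illustration of the input these features concern, the sketch below builds a Perl string of Extended Arabic-Indic digits wrapped in an explicit LRO..PDF pair and lists the code points it contains. The sample string and the `%bidi` table are illustrative assumptions, not part of the module; the call to `encode()` is attempted only if the module is installed, and no particular output bytes are claimed.

```perl
use strict;
use warnings;
use charnames ();

# Directional formatting codes named in the Features list.
my %bidi = (
    PDF => 0x202C,   # POP DIRECTIONAL FORMATTING
    LRO => 0x202D,   # LEFT-TO-RIGHT OVERRIDE
    RLO => 0x202E,   # RIGHT-TO-LEFT OVERRIDE
);

# Hypothetical input: Extended Arabic-Indic digits one, two, three
# (U+06F1..U+06F3) forced left-to-right by LRO ... PDF.
my $wchar = chr($bidi{LRO}) . "\x{6F1}\x{6F2}\x{6F3}" . chr($bidi{PDF});

# List every code point in the string with its Unicode name.
printf("U+%04X %s\n", $_, charnames::viacode($_))
    for map { ord($_) } split(//, $wchar);

# Attempt the conversion only when the module is available.
if (eval { require Lingua::FA::MacFarsi; 1 }) {
    my $octet = Lingua::FA::MacFarsi::encode($wchar);
    printf("encoded to %d byte(s)\n", length $octet);
}
```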
Functions
$wchar = decode($octet)
$wchar = decodeMacFarsi($octet)
  Converts MacFarsi to Unicode.
  decodeMacFarsi() is an alias for decode() exported by default.
$octet = encode($wchar)
$octet = encode($handler, $wchar)
$octet = encodeMacFarsi($wchar)
$octet = encodeMacFarsi($handler, $wchar)
  Converts Unicode to MacFarsi.
  encodeMacFarsi() is an alias for encode() exported by default.
  If the $handler is not specified, any character that is not mapped to MacFarsi is deleted; if the $handler is a code reference, a string returned from that coderef is inserted there; if the $handler is a scalar reference, a string (a PV) in that reference (the referent) is inserted there.
  The 1st argument for the $handler coderef is the Unicode code point (integer) of the unmapped character.
  E.g.
    sub hexNCR { sprintf("&#x%x;", shift) } # hexadecimal NCR
    sub decNCR { sprintf("&#%d;" , shift) } # decimal NCR

    print encodeMacFarsi("ABC\x{100}\x{10000}");
    # "ABC"
    print encodeMacFarsi(\"", "ABC\x{100}\x{10000}");
    # "ABC"
    print encodeMacFarsi(\"?", "ABC\x{100}\x{10000}");
    # "ABC??"
    print encodeMacFarsi(\&hexNCR, "ABC\x{100}\x{10000}");
    # "ABC&#x100;&#x10000;"
    print encodeMacFarsi(\&decNCR, "ABC\x{100}\x{10000}");
    # "ABC&#256;&#65536;"
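A handler coderef may also have side effects. The following is a minimal sketch, assuming one wants to record which code points could not be mapped while substituting "?" for each; the names `$collect` and `@unmapped` are hypothetical and not part of the module. The handler is first exercised on its own, and the call through `encode()` runs only if the module is installed.

```perl
use strict;
use warnings;

# Hypothetical handler: remember each unmapped code point and
# return "?" as the replacement string.
my @unmapped;
my $collect = sub {
    my ($code) = @_;        # Unicode code point (integer)
    push @unmapped, $code;
    return "?";
};

# The handler can be called directly, without the module.
my $replacement = $collect->(0x100);

# With the module installed, pass the coderef as the first argument.
if (eval { require Lingua::FA::MacFarsi; 1 }) {
    @unmapped = ();         # start a fresh record for this call
    my $octet = Lingua::FA::MacFarsi::encode($collect, "ABC\x{100}\x{10000}");
    printf("%d byte(s) produced\n", length $octet);
}

# Report whatever was recorded.
printf("unmapped: U+%04X\n", $_) for @unmapped;
```

Because the handler receives only the code point, anything else it needs (such as the `@unmapped` list here) has to be captured by the closure.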
CAVEAT
Sorry, the author is not working on Mac OS. Please let him know if you find something wrong.
Maybe a bug?: The (default) paragraph direction is not resolved. Does Mac OS always surround the characters whose bidirectional type is to be overridden with LRO..PDF or RLO..PDF?
AUTHOR
SADAHIRO Tomoyuki <SADAHIRO@cpan.org>
http://homepage1.nifty.com/nomenclator/perl/
Copyright(C) 2003-2003, SADAHIRO Tomoyuki. Japan. All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.