NAME

Lingua::FA::MacFarsi - transcode between Mac OS Farsi encoding and Unicode

SYNOPSIS

(1) using function names exported by default:

use Lingua::FA::MacFarsi;
$wchar = decodeMacFarsi($octet);
$octet = encodeMacFarsi($wchar);

(2) using function names exported on request:

use Lingua::FA::MacFarsi qw(decode encode);
$wchar = decode($octet);
$octet = encode($wchar);

(3) using function names fully qualified:

use Lingua::FA::MacFarsi ();
$wchar = Lingua::FA::MacFarsi::decode($octet);
$octet = Lingua::FA::MacFarsi::encode($wchar);

# $wchar : a string in Perl's Unicode format
# $octet : a legacy byte string (i.e. in MacFarsi)

DESCRIPTION

This module decodes from and encodes to the Mac OS Farsi encoding (denoted MacFarsi hereafter).

Features

bidi support

The functions provided here should cope with Unicode text accompanied by the following directional formatting codes: PDF (U+202C), LRO (U+202D), and RLO (U+202E).
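As a small illustration of the three codes named above, the following sketch wraps a string in LRO..PDF or RLO..PDF before it would be handed to encode(). The helper names override_ltr and override_rtl are hypothetical and are not part of this module; which octets encode() then produces depends on the module's mapping and is not shown here.

```perl
use strict;
use warnings;

# Directional formatting codes mentioned above.
use constant PDF => "\x{202C}";   # POP DIRECTIONAL FORMATTING
use constant LRO => "\x{202D}";   # LEFT-TO-RIGHT OVERRIDE
use constant RLO => "\x{202E}";   # RIGHT-TO-LEFT OVERRIDE

# Hypothetical helpers (not provided by Lingua::FA::MacFarsi).
sub override_ltr { my ($text) = @_; return LRO() . $text . PDF() }
sub override_rtl { my ($text) = @_; return RLO() . $text . PDF() }

# Four Arabic-script letters wrapped in RLO..PDF.
my $wchar = override_rtl("\x{0633}\x{0644}\x{0627}\x{0645}");

# Show the resulting code points, one per character.
printf "U+%04X ", ord($_) for split //, $wchar;
print "\n";

# With the module installed, the wrapped string could then be encoded:
#   my $octet = Lingua::FA::MacFarsi::encode($wchar);
```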

additional mapping

Extended Arabic-Indic Digits and some related characters in Unicode are encoded in MacFarsi as if they were normal digits (U+0030..U+0039) when they appear in the left-to-right direction.
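The digit correspondence described above can be pictured with a short stand-alone sketch. It only shows how Extended Arabic-Indic Digits (U+06F0..U+06F9) line up with the normal digits; it is not the module's code, the helper name extended_digits_to_ascii is hypothetical, and the directional context and the "related characters" are deliberately left out.

```perl
use strict;
use warnings;

# Illustration only: replace Extended Arabic-Indic Digits (U+06F0..U+06F9)
# with the corresponding normal digits (U+0030..U+0039).
# Hypothetical helper; direction handling is deliberately omitted.
sub extended_digits_to_ascii {
    my ($text) = @_;
    (my $copy = $text) =~ tr/\x{06F0}-\x{06F9}/0-9/;
    return $copy;
}

print extended_digits_to_ascii("\x{06F1}\x{06F3}\x{06F8}\x{06F2}"), "\n";
# prints "1382"
```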

Functions

$wchar = decode($octet)
$wchar = decodeMacFarsi($octet)

Converts MacFarsi to Unicode.

decodeMacFarsi() is an alias for decode() and is exported by default.

$octet = encode($wchar)
$octet = encode($handler, $wchar)
$octet = encodeMacFarsi($wchar)
$octet = encodeMacFarsi($handler, $wchar)

Converts Unicode to MacFarsi.

encodeMacFarsi() is an alias for encode() and is exported by default.

If $handler is not specified, any character that has no mapping to MacFarsi is deleted. If $handler is a code reference, the string returned from that coderef is inserted in its place. If $handler is a scalar reference, the string (a PV) held in the referent is inserted in its place.

The first argument passed to the $handler coderef is the Unicode code point (an integer) of the unmapped character.

E.g.

sub hexNCR { sprintf("&#x%x;", shift) } # hexadecimal NCR
sub decNCR { sprintf("&#%d;" , shift) } # decimal NCR

print encodeMacFarsi("ABC\x{100}\x{10000}");
# "ABC"

print encodeMacFarsi(\"", "ABC\x{100}\x{10000}");
# "ABC"

print encodeMacFarsi(\"?", "ABC\x{100}\x{10000}");
# "ABC??"

print encodeMacFarsi(\&hexNCR, "ABC\x{100}\x{10000}");
# "ABC&#x100;&#x10000;"

print encodeMacFarsi(\&decNCR, "ABC\x{100}\x{10000}");
# "ABC&#256;&#65536;"
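The three $handler cases can also be read as a small dispatch on the type of the handler. The following stand-alone sketch mimics that rule for a single unmapped character. The function replacement_for is hypothetical and is not how the module is implemented, and rejecting any other handler type is an assumption made only for this sketch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for what happens to ONE unmapped character.
# $handler may be undef, a code reference, or a scalar reference.
sub replacement_for {
    my ($handler, $codepoint) = @_;
    return ''                     unless defined $handler;          # deleted
    return $handler->($codepoint) if ref($handler) eq 'CODE';       # coderef result
    return $$handler              if ref($handler) eq 'SCALAR';     # referent string
    die "unsupported handler type";   # assumption: anything else is rejected
}

sub hexNCR { sprintf("&#x%x;", shift) }

print replacement_for(undef,    0x100), "\n";   # prints an empty line
print replacement_for(\"?",     0x100), "\n";   # prints "?"
print replacement_for(\&hexNCR, 0x100), "\n";   # prints "&#x100;"

# A closure can also record which code points were unmapped.
my @unmapped;
my $logger = sub { push @unmapped, $_[0]; return '' };
replacement_for($logger, $_) for 0x100, 0x10000;
printf "unmapped: %s\n", join ' ', map { sprintf 'U+%04X', $_ } @unmapped;
# prints "unmapped: U+0100 U+10000"
```

Because the coderef receives only the code point, a closure such as $logger above is a convenient way to collect diagnostics while still deleting the unmapped characters.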

CAVEAT

Sorry, the author does not work on Mac OS. Please let him know if you find something wrong.

Possible bug: the (default) paragraph direction is not resolved. Does Mac OS always surround characters whose bidirectional type is to be overridden with LRO..PDF or RLO..PDF?

AUTHOR

SADAHIRO Tomoyuki  SADAHIRO@cpan.org

http://homepage1.nifty.com/nomenclator/perl/

Copyright(C) 2003-2003, SADAHIRO Tomoyuki. Japan. All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

Map (external version) from Mac OS Farsi character set to Unicode

http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/FARSI.TXT

The Bidirectional Algorithm

http://www.unicode.org/unicode/reports/tr9/