package Lingua::AR::MacArabic; require 5.006001; use strict; require Exporter; require DynaLoader; our $VERSION = '0.10'; our @ISA = qw(Exporter DynaLoader); our @EXPORT = qw(decodeMacArabic encodeMacArabic); our @EXPORT_OK = qw(decode encode); bootstrap Lingua::AR::MacArabic $VERSION; 1; __END__ =head1 NAME Lingua::AR::MacArabic - transcoding between Mac OS Arabic encoding and Unicode =head1 SYNOPSIS (1) using function names exported by default: use Lingua::AR::MacArabic; $wchar = decodeMacArabic($octet); $octet = encodeMacArabic($wchar); (2) using function names exported on request: use Lingua::AR::MacArabic qw(decode encode); $wchar = decode($octet); $octet = encode($wchar); (3) using function names fully qualified: use Lingua::AR::MacArabic (); $wchar = Lingua::AR::MacArabic::decode($octet); $octet = Lingua::AR::MacArabic::encode($wchar); # $wchar : a string in Perl's Unicode format # $octet : a string in Mac OS Arabic encoding =head1 DESCRIPTION This module provides decoding from/encoding to Mac OS Arabic encoding (denoted MacArabic hereafter). =head2 Features =over 4 =item bidi support Functions provided here should cope with Unicode accompanied with some directional formatting codes: i.e. C<PDF> (or C<U+202C>), C<LRO> (or C<U+202D>), and C<RLO> (or C<U+202E>). =item additional mapping Arabic-Indic Digits and some related characters in Unicode are encoded in MacArabic as if normal digits (C<U+0030>..C<U+0039>) when they appear in the left-to-right direction. =back =head2 Functions =over 4 =item C<$wchar = decode($octet)> =item C<$wchar = decodeMacArabic($octet)> Converts MacArabic to Unicode. C<decodeMacArabic()> is an alias for C<decode()> exported by default. =item C<$octet = encode($wchar)> =item C<$octet = encode($handler, $wchar)> =item C<$octet = encodeMacArabic($wchar)> =item C<$octet = encodeMacArabic($handler, $wchar)> Converts Unicode to MacArabic. C<encodeMacArabic()> is an alias for C<encode()> exported by default. If the C<$handler> is not specified, any character that is not mapped to MacArabic is deleted; if the C<$handler> is a code reference, a string returned from that coderef is inserted there. if the C<$handler> is a scalar reference, a string (a C<PV>) in that reference (the referent) is inserted there. The 1st argument for the C<$handler> coderef is the Unicode code point (integer) of the unmapped character. E.g. sub hexNCR { sprintf("&#x%x;", shift) } # hexadecimal NCR sub decNCR { sprintf("&#%d;" , shift) } # decimal NCR print encodeMacArabic("ABC\x{100}\x{10000}"); # "ABC" print encodeMacArabic(\"", "ABC\x{100}\x{10000}"); # "ABC" print encodeMacArabic(\"?", "ABC\x{100}\x{10000}"); # "ABC??" print encodeMacArabic(\&hexNCR, "ABC\x{100}\x{10000}"); # "ABCĀ𐀀" print encodeMacArabic(\&decNCR, "ABC\x{100}\x{10000}"); # "ABCĀ𐀀" =back =head1 CAVEAT Sorry, the author is not working on a Mac OS. Please let him know if you find something wrong. B<Maybe bug?>: The (default) paragraph direction is not resolved. Does Mac always surround by C<LRO>..C<PDF> or C<RLO>..C<PDF> the characters with bidirectional type to be overridden? =head1 AUTHOR SADAHIRO Tomoyuki <SADAHIRO@cpan.org> Copyright(C) 2003-2011, SADAHIRO Tomoyuki. Japan. All rights reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. =head1 SEE ALSO =over 4 =item Map (external version) from Mac OS Arabic character set to Unicode 2.1 and later (version: c02 2005-Apr-04) L<http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ARABIC.TXT> =item Registry (external version) of Apple use of Unicode corporate-zone characters (version: c03 2005-Apr-04) L<http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/CORPCHAR.TXT> =item The Bidirectional Algorithm L<http://www.unicode.org/reports/tr9/> =back =cut