NAME
Lingua::ZH::MacChinese::Simplified - transcoding between Mac OS Chinese Simplified encoding and Unicode
SYNOPSIS
(1) using function names exported by default:
use Lingua::ZH::MacChinese::Simplified;
$wchar = decodeMacChineseSimp($octet);
$octet = encodeMacChineseSimp($wchar);
(2) using function names exported on request:
use Lingua::ZH::MacChinese::Simplified qw(decode encode);
$wchar = decode($octet);
$octet = encode($wchar);
(3) using function names fully qualified:
use Lingua::ZH::MacChinese::Simplified ();
$wchar = Lingua::ZH::MacChinese::Simplified::decode($octet);
$octet = Lingua::ZH::MacChinese::Simplified::encode($wchar);
# $wchar : a string in Perl's Unicode format
# $octet : a string in Mac OS Chinese Simplified encoding
DESCRIPTION
This module provides transcoding from/to Mac OS Chinese Simplified encoding (denoted MacChineseSimp hereafter).
To ensure roundtrip mapping, some MacChineseSimp characters are mapped to a sequence of Unicode characters rather than to a single Unicode character, and vice versa. For example, 0xA6D9 (MacChineseSimp) is mapped from/to 0xFF0C+0xF87E (Unicode) for "FULLWIDTH COMMA for vertical text".
This module provides functions to transcode between MacChineseSimp and Unicode, without information loss for every MacChineseSimp character.
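The roundtrip behaviour described above can be checked with a short script. This is a minimal sketch, not part of the module: the codepoints() helper is a hypothetical name introduced here for display only, and the require is guarded so the script does nothing if the module is not installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by the module): list a string's code points.
sub codepoints {
    my ($str) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $str;
}

# MacChineseSimp 0xA6D9, "FULLWIDTH COMMA for vertical text"
my $octet = "\xA6\xD9";

# Attempt the roundtrip only when the module is available.
if (eval { require Lingua::ZH::MacChinese::Simplified; 1 }) {
    my $wchar = Lingua::ZH::MacChinese::Simplified::decode($octet);
    # According to the mapping described above, this should list
    # the two code points 0xFF0C and 0xF87E.
    print codepoints($wchar), "\n";

    my $back = Lingua::ZH::MacChinese::Simplified::encode($wchar);
    print $back eq $octet ? "roundtrip ok\n" : "roundtrip failed\n";
}
```

The fully qualified function names are used so that nothing needs to be imported after the guarded require.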
Functions
$wchar = decode($octet)
$wchar = decode($handler, $octet)
$wchar = decodeMacChineseSimp($octet)
$wchar = decodeMacChineseSimp($handler, $octet)
Converts MacChineseSimp to Unicode.
decodeMacChineseSimp() is an alias for decode() exported by default.
If the $handler is not specified, any MacChineseSimp character that is not mapped to Unicode is deleted; if the $handler is a code reference, a string returned from that coderef is inserted there; if the $handler is a scalar reference, a string (a PV) in that reference (the referent) is inserted there.
The 1st argument for the $handler coderef is a string of the unmapped MacChineseSimp character (e.g. "\xFC\xFE").
$octet = encode($wchar)
$octet = encode($handler, $wchar)
$octet = encodeMacChineseSimp($wchar)
$octet = encodeMacChineseSimp($handler, $wchar)
Converts Unicode to MacChineseSimp.
encodeMacChineseSimp() is an alias for encode() exported by default.
If the $handler is not specified, any Unicode character that is not mapped to MacChineseSimp is deleted; if the $handler is a code reference, a string returned from that coderef is inserted there; if the $handler is a scalar reference, a string (a PV) in that reference (the referent) is inserted there.
The 1st argument for the $handler coderef is the Unicode code point (unsigned integer) of the unmapped character.
E.g.
sub hexNCR { sprintf("&#x%x;", shift) } # hexadecimal NCR
sub decNCR { sprintf("&#%d;" , shift) } # decimal NCR
print encodeMacChineseSimp("ABC\x{100}\x{10000}"); # "ABC"
print encodeMacChineseSimp(\"", "ABC\x{100}\x{10000}"); # "ABC"
print encodeMacChineseSimp(\"?", "ABC\x{100}\x{10000}"); # "ABC??"
print encodeMacChineseSimp(\&hexNCR, "ABC\x{100}\x{10000}"); # "ABC&#x100;&#x10000;"
print encodeMacChineseSimp(\&decNCR, "ABC\x{100}\x{10000}"); # "ABC&#256;&#65536;"
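The three $handler forms work the same way for decode(), except that a coderef receives the unmapped character as a string of octets rather than as a code point. The following is a minimal sketch under two assumptions: hex_octets() is a hypothetical handler written for this illustration, and "\xFC\xFE" is taken to be unmapped because the text above cites it as an example. The require is guarded so the script does nothing if the module is not installed.

```perl
use strict;
use warnings;

# Hypothetical handler: render an unmapped MacChineseSimp character
# (passed as a string of octets, e.g. "\xFC\xFE") as bracketed hex.
sub hex_octets {
    my ($char) = @_;
    return '[' . uc(unpack('H*', $char)) . ']';
}

if (eval { require Lingua::ZH::MacChinese::Simplified; 1 }) {
    my $decode = \&Lingua::ZH::MacChinese::Simplified::decode;

    # Assumption: "\xFC\xFE" has no Unicode mapping (it is the example
    # of an unmapped character given in the description above).
    my $octet = "ABC\xFC\xFE";

    my @results = (
        $decode->($octet),                # no handler: unmapped character is deleted
        $decode->(\"?", $octet),          # scalar ref: the referenced string is inserted
        $decode->(\&hex_octets, $octet),  # coderef: its return value is inserted
    );
    print "$_\n" for @results;
}
```

A coderef handler is the most flexible choice when the unmapped input needs to stay visible in the output, for instance while checking a conversion for lost characters.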
CAVEAT
Sorry, the author does not work on Mac OS. Please let him know if you find something wrong.
AUTHOR
SADAHIRO Tomoyuki <SADAHIRO@cpan.org>
Copyright(C) 2003-2007, SADAHIRO Tomoyuki. Japan. All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SEE ALSO
- Map (external version) from Mac OS Chinese Simplified encoding to Unicode 3.0 and later (version: c02 2005-Apr-04)
  http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/CHINSIMP.TXT
- Registry (external version) of Apple use of Unicode corporate-zone characters (version: c03 2005-Apr-04)
  http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/CORPCHAR.TXT