NAME
ShiftJIS::CP932::MapUTF - Converts between CP-932 (Shift_JIS as supported by Microsoft) and Unicode
SYNOPSIS
use ShiftJIS::CP932::MapUTF;
$utf8_string = cp932_to_utf8($cp932_string);
DESCRIPTION
The Microsoft CodePage 932 (CP-932) table comprises 7915 characters:
JIS X 0201-1976 single-byte characters (191 characters),
JIS X 0208-1990 double-byte characters (6879 characters),
NEC special characters (83 characters from SJIS row 13),
NEC-selected IBM extended characters (374 characters from SJIS row 89 to 92),
and IBM extended characters (388 characters from SJIS row 115 to 119).
The table contains duplicate mappings, so some characters do not round-trip. The duplicates come from the vendor-defined characters added by NEC and IBM.
For example, there are two characters that are mapped to U+2252, namely, 0x81e0 (JIS X 0208) and 0x8790 (NEC special character).
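The effect of such a duplicate can be sketched with a toy two-entry table. This is only an illustration, not the module's implementation: the hashes and the round_trip helper are hypothetical names, and only the two codes cited above are included.

```perl
use strict;
use warnings;

# Toy excerpt of the CP-932 table: both codes map to U+2252.
my %cp932_to_ucs = (
    "\x81\xe0" => 0x2252,    # JIS X 0208
    "\x87\x90" => 0x2252,    # NEC special character
);

# A reverse table can hold only one code per code point.
# Here the JIS X 0208 code is kept (an assumption of this sketch).
my %ucs_to_cp932 = ( 0x2252 => "\x81\xe0" );

# Hypothetical helper: CP-932 -> Unicode -> CP-932 for a single character.
sub round_trip {
    my ($cp932_char) = @_;
    return $ucs_to_cp932{ $cp932_to_ucs{$cp932_char} };
}

# Print each input code and the code that comes back, in hex.
printf "%s -> %s\n", unpack('H*', $_), unpack('H*', round_trip($_))
    for ("\x81\xe0", "\x87\x90");
# prints:
# 81e0 -> 81e0
# 8790 -> 81e0
```

The NEC code 0x8790 comes back as 0x81e0, so it does not survive the round trip.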
This module provides functions that map from CP-932 to Unicode, and vice versa.
cp932_to_utf8(STRING)
cp932_to_utf8(CODEREF, STRING)
Converts CP-932 to UTF-8.
For example, converts \x81\xe0 or \x87\x90 to U+2252 in the UTF-8 encoding.
If CODEREF is not specified, characters that are not mapped to Unicode are deleted. If CODEREF is specified, each such character is passed to CODEREF as a CP-932 character string and replaced with the value that CODEREF returns.
For example, the following converts \x82\xf2 to U+3094, HIRAGANA LETTER VU, in the UTF-8 encoding.
cp932_to_utf8(sub { $_[0] eq "\x82\xf2" ? "\xe3\x82\x94" : "" }, $cp932_string);
cp932_to_utf16(STRING)
cp932_to_utf16(CODEREF, STRING)
Converts CP-932 to UTF-16LE.
For example, converts \x81\xe0 or \x87\x90 to U+2252 in the UTF-16LE encoding.
If CODEREF is not specified, characters that are not mapped to Unicode are deleted. If CODEREF is specified, each such character is passed to CODEREF as a CP-932 character string and replaced with the value that CODEREF returns.
For example, the following converts \x82\xf2 to U+3094, HIRAGANA LETTER VU, in the UTF-16LE encoding.
cp932_to_utf16(sub { $_[0] eq "\x82\xf2" ? "\x94\x30" : "" }, $cp932_string);
utf8_to_cp932(STRING)
utf8_to_cp932(CODEREF, STRING)
Converts UTF-8 to CP-932 (normalized).
For example, U+2252 in the UTF-8 encoding is converted to \x81\xe0, not to \x87\x90.
If CODEREF is not specified, characters that are not mapped to CP-932 are deleted. If CODEREF is specified, each such character is passed to CODEREF as its Unicode code point (an integer) and replaced with the value that CODEREF returns.
For example, the following converts characters that are not mapped to CP-932 to hexadecimal numeric character references for HTML 4.01.
utf8_to_cp932(sub { sprintf "&#x%04x;", shift }, $utf8_string);
utf16_to_cp932(STRING)
utf16_to_cp932(CODEREF, STRING)
Converts UTF-16LE to CP-932 (normalized).
For example, U+2252 in the UTF-16LE encoding is converted to \x81\xe0, not to \x87\x90.
If CODEREF is not specified, characters that are not mapped to CP-932 are deleted. If CODEREF is specified, each such character is passed to CODEREF as its Unicode code point (an integer) and replaced with the value that CODEREF returns.
For example, the following converts characters that are not mapped to CP-932 to hexadecimal numeric character references for HTML 4.01.
utf16_to_cp932(sub { sprintf "&#x%04x;", shift }, $utf16LE_string);
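The CODEREF arguments described above can also be written as named, reusable callbacks. The following is a minimal sketch under stated assumptions: ncr_fallback and make_utf8_fallback are hypothetical helpers, not part of the module, and the module itself is called only if it happens to be installed.

```perl
use strict;
use warnings;

# Hypothetical helper: fallback for utf8_to_cp932 / utf16_to_cp932.
# Receives the Unicode code point of an unmapped character as an integer
# and returns a hexadecimal numeric character reference.
sub ncr_fallback {
    my ($code_point) = @_;
    return sprintf '&#x%04x;', $code_point;
}

# Hypothetical helper: builds a fallback for cp932_to_utf8 from a table of
# extra mappings (CP-932 character string => Unicode code point).
sub make_utf8_fallback {
    my (%extra) = @_;
    return sub {
        my ($cp932_char) = @_;
        return '' unless exists $extra{$cp932_char};    # delete if unknown
        my $utf8 = chr $extra{$cp932_char};
        utf8::encode($utf8);    # character string -> UTF-8 byte string
        return $utf8;
    };
}

my $vu_fallback = make_utf8_fallback("\x82\xf2" => 0x3094);

print ncr_fallback(0x3094), "\n";    # prints "&#x3094;"

# Call the module only when it is available, so the sketch runs either way.
if (eval { require ShiftJIS::CP932::MapUTF; ShiftJIS::CP932::MapUTF->import; 1 }) {
    my $utf8  = cp932_to_utf8($vu_fallback, "\x82\xf2\x81\xe0");
    my $cp932 = utf8_to_cp932(\&ncr_fallback, $utf8);
    print unpack('H*', $cp932), "\n";
}
```

Keeping the extra mappings in a table means further unmapped characters can be handled by adding entries, without rewriting the callback.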
AUTHOR
Tomoyuki SADAHIRO
bqw10602@nifty.com
http://homepage1.nifty.com/nomenclator/perl/
Copyright(C) 2001, SADAHIRO Tomoyuki. Japan. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SEE ALSO
Microsoft PRB: Conversion Problem Between Shift-JIS and Unicode (Article ID: Q170559)
Unicode.org
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT