=head1 NAME

ShiftJIS::CP932::MapUTF - conversion between Microsoft CP-932 and Unicode

=head1 SYNOPSIS

    use ShiftJIS::CP932::MapUTF qw(:all);

    $utf8_string  = cp932_to_utf8($cp932_string);
    $cp932_string = utf8_to_cp932($utf8_string);
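=head1 DESCRIPTION

The table of Microsoft Windows code page 932 (CP-932)
comprises 7915 characters:

    JIS X 0201 single-byte characters (191 characters)
    JIS X 0208 double-byte characters (6879 characters)
    NEC special characters (83 characters, row 13)
    NEC-selected IBM extended characters (374 characters, rows 89 to 92)
    IBM extended characters (388 characters, rows 115 to 119)

This table contains duplicate definitions that cannot be round-tripped.
These duplicates come from the extended characters defined by vendors
(NEC and IBM).

For example, two characters are mapped to C<U+2252> in Unicode:
C<0x81e0> (a JIS X 0208 character) and C<0x8790> (an NEC special character).

In fact, the 7915 characters of CP-932 must be mapped to 7517 characters
of Unicode, so 398 mappings cannot be round-tripped.

This module provides functions that convert properly
from CP-932 to Unicode and from Unicode to CP-932.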
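
The duplicate mapping described above can be observed with a minimal sketch like the following. It assumes the module is installed; the variable names are illustrative only, and the code prints the result of the reverse conversion rather than assuming it.

    use strict;
    use warnings;
    use ShiftJIS::CP932::MapUTF qw(cp932_to_utf8 utf8_to_cp932);

    my $from_jis = cp932_to_utf8("\x81\xe0");   # JIS X 0208 character
    my $from_nec = cp932_to_utf8("\x87\x90");   # NEC special character

    # Both codes are described above as mapping to U+2252
    # (UTF-8 "\xE2\x89\x92").
    print "same Unicode character\n" if $from_jis eq $from_nec;

    # Converting back can return only one of the two CP-932 codes;
    # print it as dot-separated hex bytes.
    printf "%vX\n", utf8_to_cp932($from_nec);
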
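=head2 Conversion from CP-932 to Unicode

If the first argument is a reference, it is used as C<SJIS_CALLBACK>
to handle CP-932 characters that have no mapping to Unicode.
(A reference cannot be given as C<STRING>.)

If C<SJIS_CALLBACK> is given, the second argument is used as C<STRING>.
Otherwise the first argument is C<STRING>.

If C<SJIS_CALLBACK> is not given,
CP-932 characters that have no mapping to Unicode are silently deleted,
and a partial character is skipped by one byte.
The functions behave as if a code reference that always returns
an empty string (C<sub {''}>) had been passed as C<SJIS_CALLBACK>.

Currently, only a code reference can be used as C<SJIS_CALLBACK>.
The return value of the code reference is inserted
in place of the unmapped character.

The code reference C<SJIS_CALLBACK> is called with one or more arguments.
If the unmapped character is a partial double-byte character
(a one-byte string consisting of a lead byte only),
the first argument is the undefined value (C<undef>)
and the second argument is an unsigned integer representing the byte.
If it is not a partial character, the first argument is a string
representing the character.

By default, a partial double-byte character can appear only at the end
of C<STRING>, never at its beginning or in its middle
(see also C<'t'> of C<SJIS_OPTION>).

Example:

    my $sjis_callback = sub {
        my ($char, $byte) = @_;
        # function() is a placeholder for your own handler
        return function($char) if defined $char;
        die sprintf "found partial byte 0x%02x", $byte;
    };

In the example above, C<$char> can be C<"\x80">, C<"\x82\xf2">,
C<"\xfc\xfc">, C<"\xff">, and so on.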
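
As a rough illustration of the rules above, the following sketch compares the default behaviour with a callback that reports what it receives. The sample string, the C<$report> callback and the bracketed output format are assumptions made for this example. The callback returns ASCII only, which is also valid UTF-8.

    use strict;
    use warnings;
    use ShiftJIS::CP932::MapUTF qw(cp932_to_utf8);

    # "\x82\xf2" has no mapping to Unicode; the final "\x81" is
    # a lead byte without a trail byte (a partial character).
    my $cp932 = "A\x82\xf2B\x81";

    # Without SJIS_CALLBACK both are dropped, as if sub {''} were passed.
    my $dropped = cp932_to_utf8($cp932);

    # Hypothetical callback that marks every unmapped item.
    my $report = sub {
        my ($char, $byte) = @_;
        return defined $char
            ? '[' . unpack('H*', $char) . ']'
            : sprintf('[partial 0x%02x]', $byte);
    };
    my $marked = cp932_to_utf8($report, $cp932);

    print "$dropped\n";   # expected from the description: AB
    print "$marked\n";    # expected: A[82f2]B[partial 0x81]
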
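The return value of C<SJIS_CALLBACK> must match the target format.
For example, do not use an C<SJIS_CALLBACK> that returns UTF-8
together with C<cp932_to_utf16be()>.
That is, you need to prepare an C<SJIS_CALLBACK> for each UTF.

C<SJIS_OPTION> can be placed after C<STRING>.
Options can be combined, as in C<'tg'> or C<'gst'>
(the order does not matter).

    'g'  Converts CP-932 gaiji (user-defined characters)
         [0xF040 to 0xF9FC (rows 95 to 114)] to
         the Unicode PUA [0xE000 to 0xE757] (1880 characters).

    's'  Converts the single-byte characters undefined in CP-932
         as follows:
         0x80 => U+0080, 0xA0 => U+F8F0,
         0xFD => U+F8F1, 0xFE => U+F8F2, 0xFF => U+F8F3.

    't'  Checks the range of the trail byte [0x40..0x7E, 0x80..0xFC].
         For example, "\x81\x39" is regarded by default as an undefined
         double-byte character; with 't', it is regarded as the partial
         byte 0x81 followed by the single-byte character "\x39".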
=over 4

=item C<cp932_to_utf8([SJIS_CALLBACK,] STRING [, SJIS_OPTION])>

Converts CP-932 to UTF-8.

=item C<cp932_to_unicode([SJIS_CALLBACK,] STRING [, SJIS_OPTION])>

Converts CP-932 to Unicode
(Perl's internal format with the C<SVf_UTF8> flag; see F<perlunicode>).

B<This function is provided only with Perl 5.6.1 or later and only in the XS version.>

=item C<cp932_to_utf16le([SJIS_CALLBACK,] STRING [, SJIS_OPTION])>

Converts CP-932 to UTF-16LE.

=item C<cp932_to_utf16be([SJIS_CALLBACK,] STRING [, SJIS_OPTION])>

Converts CP-932 to UTF-16BE.

=item C<cp932_to_utf32le([SJIS_CALLBACK,] STRING [, SJIS_OPTION])>

Converts CP-932 to UTF-32LE.

=item C<cp932_to_utf32be([SJIS_CALLBACK,] STRING [, SJIS_OPTION])>

Converts CP-932 to UTF-32BE.

=back
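
Two points from this section, one callback per target format and the effect of the C<'t'> option, might be sketched as follows. The choice of U+FFFD as a substitute, the variable names and the sample strings are assumptions made for illustration. The outputs are printed rather than asserted.

    use strict;
    use warnings;
    use ShiftJIS::CP932::MapUTF qw(cp932_to_utf8 cp932_to_utf16be);

    # U+FFFD REPLACEMENT CHARACTER, encoded once per target format.
    my $subst_utf8    = sub { "\xEF\xBF\xBD" };
    my $subst_utf16be = sub { "\xFF\xFD" };

    my $cp932   = "A\x82\xf2B";   # 0x82F2 has no mapping to Unicode
    my $utf8    = cp932_to_utf8   ($subst_utf8,    $cp932);
    my $utf16be = cp932_to_utf16be($subst_utf16be, $cp932);

    # Record what the callback receives for "\x81\x39",
    # first by default and then with the 't' option.
    my @seen;
    my $record = sub {
        my ($char, $byte) = @_;
        push @seen, defined $char
            ? 'char ' . unpack('H*', $char)
            : sprintf('byte 0x%02x', $byte);
        return '';
    };
    cp932_to_utf8($record, "\x81\x39");        # undefined double-byte character
    cp932_to_utf8($record, "\x81\x39", 't');   # partial byte 0x81, then "9"
    print "$_\n" for @seen;
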
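=head2 Conversion from Unicode to CP-932

All duplicately defined characters are converted
according to Microsoft PRB Q170559.
For example, C<U+2252> is converted to C<"\x81\xE0">, not to C<"\x87\x90">.

If the first argument is a reference, it is used as C<UNICODE_CALLBACK>
to handle Unicode characters that have no mapping to CP-932.
(A reference cannot be given as C<STRING>.)

If C<UNICODE_CALLBACK> is given, the second argument is used as C<STRING>.
Otherwise the first argument is C<STRING>.

If C<UNICODE_CALLBACK> is not given,
Unicode characters that have no mapping to CP-932 are silently deleted,
and a partial character is skipped by one byte.
The functions behave as if a code reference that always returns
an empty string (C<sub {''}>) had been passed as C<UNICODE_CALLBACK>.

Currently, only a code reference can be used as C<UNICODE_CALLBACK>.
The return value of the code reference is inserted
in place of the unmapped character.

The code reference C<UNICODE_CALLBACK> is called
with one or more arguments. If the unmapped character is
a partial character (an illegal byte), the first argument is
the undefined value (C<undef>) and the second argument is
an unsigned integer representing the byte.
If it is not a partial character, the first argument is
an unsigned integer representing the code point of the Unicode character.

For example, the following shows how to convert characters
that have no mapping to CP-932 into HTML 4.01 numeric character references.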
    sub toHexNCR {
        my ($char, $byte) = @_;
        # $char is a code point; $byte is set only for an illegal byte
        return sprintf("&#x%x;", $char) if defined $char;
        die sprintf "illegal byte 0x%02x was found", $byte;
    }

    $cp932 = utf8_to_cp932   (\&toHexNCR, $utf8_string);
    $cp932 = unicode_to_cp932(\&toHexNCR, $unicode_string);
    $cp932 = utf16le_to_cp932(\&toHexNCR, $utf16le_string);

The return value of C<UNICODE_CALLBACK> must be valid as CP-932.
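
Because the callback's return value is inserted as CP-932, a substitute character has to be given as CP-932 bytes. A minimal sketch, assuming the geta mark (U+3013, C<0x81AC> in CP-932) as the substitute; the name C<$geta_callback> and the sample input are illustrative assumptions:

    use strict;
    use warnings;
    use ShiftJIS::CP932::MapUTF qw(utf8_to_cp932);

    # Hypothetical callback: returns CP-932 bytes, never UTF-8.
    my $geta_callback = sub {
        my ($code, $byte) = @_;
        return "\x81\xAC" if defined $code;   # GETA MARK in CP-932
        return '';                            # drop illegal bytes
    };

    # U+20AC EURO SIGN (UTF-8 "\xE2\x82\xAC") has no mapping to CP-932.
    my $cp932 = utf8_to_cp932($geta_callback, "1\xE2\x82\xAC");
    printf "%vX\n", $cp932;   # expected under the rules above: 31.81.AC
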
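C<UNICODE_OPTION> can be placed after C<STRING>.
Options can be combined, as in C<'fg'> or C<'gsf'>
(the order does not matter).

    'g'  Converts to CP-932 gaiji (user-defined characters)
         [0xF040 to 0xF9FC (rows 95 to 114)] from
         the Unicode PUA [0xE000 to 0xE757] (1880 characters).

    's'  Adds mappings for the single-byte characters undefined
         in CP-932:
         U+0080 => 0x80, U+F8F0 => 0xA0,
         U+F8F1 => 0xFD, U+F8F2 => 0xFE, U+F8F3 => 0xFF.

    'f'  Adds some fallback conversions from Unicode to CP-932.
         The characters that gain mappings are some characters in
         the latin-1 area [U+00A0..U+00FF] and HIRAGANA LETTER VU
         [U+3094, which becomes KATAKANA LETTER VU (0x8394)].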
=over 4

=item C<utf8_to_cp932([UNICODE_CALLBACK,] STRING [, UNICODE_OPTION])>

Converts UTF-8 to CP-932.

=item C<unicode_to_cp932([UNICODE_CALLBACK,] STRING [, UNICODE_OPTION])>

Converts Unicode to CP-932.

Here B<Unicode> means Perl's internal format (see F<perlunicode>).
If the string does not have the C<SVf_UTF8> flag, it is upgraded
to Unicode as an ISO 8859-1 (latin1) string.

B<This function is provided only with Perl 5.6.1 or later and only in the XS version.>

=item C<utf16_to_cp932([UNICODE_CALLBACK,] STRING [, UNICODE_OPTION])>

Converts UTF-16 (with or without a C<BOM>) to CP-932.

=item C<utf16le_to_cp932([UNICODE_CALLBACK,] STRING [, UNICODE_OPTION])>

Converts UTF-16LE to CP-932.

=item C<utf16be_to_cp932([UNICODE_CALLBACK,] STRING [, UNICODE_OPTION])>

Converts UTF-16BE to CP-932.

=item C<utf32_to_cp932([UNICODE_CALLBACK,] STRING [, UNICODE_OPTION])>

Converts UTF-32 (with or without a C<BOM>) to CP-932.

=item C<utf32le_to_cp932([UNICODE_CALLBACK,] STRING [, UNICODE_OPTION])>

Converts UTF-32LE to CP-932.

=item C<utf32be_to_cp932([UNICODE_CALLBACK,] STRING [, UNICODE_OPTION])>

Converts UTF-32BE to CP-932.

=back
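
The C<'f'> option and the BOM-aware C<utf16_to_cp932()> listed above could be tried with a sketch like this one. The sample inputs are assumptions, and the comments state what the description in this section leads one to expect rather than verified output.

    use strict;
    use warnings;
    use ShiftJIS::CP932::MapUTF qw(utf8_to_cp932 utf16_to_cp932);

    # 'f': U+3094 HIRAGANA LETTER VU (UTF-8 "\xE3\x82\x94") is
    # described as falling back to KATAKANA LETTER VU (0x8394).
    my $vu = utf8_to_cp932("\xE3\x82\x94", 'f');
    printf "%vX\n", $vu;   # 83.94 if the fallback is applied

    # The same letter "A" as UTF-16 with either byte order mark.
    my $le = utf16_to_cp932("\xFF\xFE" . "A\x00");   # little-endian BOM
    my $be = utf16_to_cp932("\xFE\xFF" . "\x00A");   # big-endian BOM
    print "same result\n" if $le eq $be;

Note that C<utf16_to_cp932()> has to be imported explicitly.
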
=head2 EXPORT

B<By default:>

    cp932_to_utf8     utf8_to_cp932
    cp932_to_utf16le  utf16le_to_cp932
    cp932_to_utf16be  utf16be_to_cp932
    cp932_to_unicode  unicode_to_cp932 (provided only in the XS version)

B<On request:>

    cp932_to_utf32le  utf32le_to_cp932
    cp932_to_utf32be  utf32be_to_cp932
                      utf16_to_cp932 [*]
                      utf32_to_cp932 [*]

[*] Their counterparts C<cp932_to_utf16()> and C<cp932_to_utf32()>
are not implemented yet. The return value of C<SJIS_CALLBACK>
still needs further discussion.
(Concatenating strings would require recognizing the C<BOM>.)
=head1 CAVEAT

The Pure Perl version of this module does not understand
wide characters (see F<perlunicode>). If necessary, use
C<utf8::decode>/C<utf8::encode> of Perl 5.7 or later (see F<utf8>).
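
One way to combine the byte-oriented functions with Perl character strings is sketched below. It assumes Perl 5.8 or later, where C<utf8::decode> and C<utf8::encode> are built in, and the variable names are illustrative.

    use strict;
    use warnings;
    use ShiftJIS::CP932::MapUTF qw(cp932_to_utf8 utf8_to_cp932);

    my $cp932_in = "\x82\xa0";              # HIRAGANA LETTER A in CP-932

    # CP-932 bytes -> UTF-8 bytes -> character string
    my $text = cp932_to_utf8($cp932_in);
    utf8::decode($text);                    # decodes in place

    # character string -> UTF-8 bytes -> CP-932 bytes
    my $bytes = $text;
    utf8::encode($bytes);                   # encodes in place
    my $cp932_out = utf8_to_cp932($bytes);

    print "round trip ok\n" if $cp932_out eq $cp932_in;
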
=head1 AUTHOR

SADAHIRO Tomoyuki <SADAHIRO@cpan.org> (貞廣 知行)
Copyright(C) 2001-2006, SADAHIRO Tomoyuki. Japan. All rights reserved.
This module is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.
=head1 SEE ALSO
=over 4
=item Microsoft PRB, Article ID: Q170559
Conversion Problem Between Shift-JIS and Unicode
=item cp932 to Unicode table
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit932.txt
http://www.microsoft.com/globaldev/reference/dbcs/932.htm
=back
=cut