NAME
ShiftJIS::X0213::MapUTF - conversion between Shift_JISX0213 and Unicode
SYNOPSIS
use ShiftJIS::X0213::MapUTF;
$unicode_string = sjis0213_to_unicode($sjis0213_string);
$sjis0213_string = unicode_to_sjis0213($unicode_string);
DESCRIPTION
This module provides functions that convert strings from Shift_JISX0213 to Unicode, and vice versa.
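Every conversion from Shift_JISX0213 starts by splitting the byte string into characters. The sketch below shows that step using the Shift_JISX0213 byte ranges: single-byte 0x00-0x7F and 0xA1-0xDF, and double-byte characters made of a lead byte 0x81-0x9F or 0xE0-0xFC followed by a trail byte 0x40-0x7E or 0x80-0xFC. The helper `split_sjis0213` is hypothetical and is not part of this module; it only illustrates the input format.

```perl
use strict;
use warnings;

# HYPOTHETICAL helper (not part of ShiftJIS::X0213::MapUTF):
# split a Shift_JISX0213 byte string into one- or two-byte characters.
sub split_sjis0213 {
    my ($bytes) = @_;
    return $bytes =~ /\G(
        [\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC]   # double-byte character
      | [\x00-\x7F\xA1-\xDF]                       # single-byte character
      | [\x00-\xFF]                                # anything else: malformed byte
    )/gx;
}

# "A", HIRAGANA LETTER A (0x82A0), a halfwidth katakana (0xB1), and 0xFC5A.
my @chars = split_sjis0213("A\x82\xA0\xB1\xFC\x5A");
print join(' ', map { unpack 'H*', $_ } @chars), "\n";   # 41 82a0 b1 fc5a
```

A lead byte followed by an invalid trail byte is returned as a one-byte malformed unit, so the scan never stalls.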
sjis0213_to_unicode(STRING)
sjis0213_to_unicode(CODEREF, STRING)
Converts Shift_JISX0213 to Unicode, i.e. a string in the UTF-8 (or, on EBCDIC platforms, UTF-EBCDIC) form that a Unicode-aware perl uses internally.
Characters not mapped to Unicode are deleted if CODEREF is not specified; otherwise, they are converted using the CODEREF, which receives the Shift_JISX0213 character string.
sjis0213_to_utf16be(STRING)
sjis0213_to_utf16be(CODEREF, STRING)
Converts Shift_JISX0213 to UTF-16BE.
sjis0213_to_utf16le(STRING)
sjis0213_to_utf16le(CODEREF, STRING)
Converts Shift_JISX0213 to UTF-16LE.
Characters not mapped to Unicode are deleted if CODEREF is not specified; otherwise, they are converted using the CODEREF, which receives the Shift_JISX0213 character string.
unicode_to_sjis0213(STRING)
unicode_to_sjis0213(CODEREF, STRING)
Converts Unicode, i.e. a string in the UTF-8 (or, on EBCDIC platforms, UTF-EBCDIC) form that a Unicode-aware perl uses internally, to Shift_JISX0213.
Characters not mapped to Shift_JISX0213 are deleted if CODEREF is not specified; otherwise, they are converted using the CODEREF, which receives the character's Unicode code point (an integer). For example, the following converts characters not mapped to Shift_JISX0213 into numeric character references for HTML 4.01:
unicode_to_sjis0213(sub {sprintf "&#x%04x;", shift}, $unicode_string);
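To make the CODEREF contract concrete without depending on the module, here is a toy sketch. `toy_unicode_to_sjis` and its three-entry `%toy_map` are hypothetical stand-ins for the real conversion table; the sketch only mirrors the documented behavior (mapped characters are converted, unmapped ones are deleted or handed to the CODEREF as an integer code point) and is not the module's implementation.

```perl
use strict;
use warnings;

# HYPOTHETICAL three-entry table (Unicode code point => Shift_JISX0213 bytes);
# it stands in for the module's full mapping and is for illustration only.
my %toy_map = (
    0x0041 => "A",          # LATIN CAPITAL LETTER A
    0x3042 => "\x82\xA0",   # HIRAGANA LETTER A
    0x9B1D => "\xFC\x5A",   # the Kanji discussed under BUGS
);

# Mirrors the documented contract: a mapped character is converted; an
# unmapped one is deleted (no CODEREF) or replaced by the CODEREF's return
# value, the CODEREF being called with the code point as an integer.
sub toy_unicode_to_sjis {
    my $coderef = ref($_[0]) eq 'CODE' ? shift : undef;
    my ($string) = @_;
    my $out = '';
    for my $ch (split //, $string) {
        my $cp = ord $ch;
        if (exists $toy_map{$cp}) {
            $out .= $toy_map{$cp};
        }
        elsif ($coderef) {
            $out .= $coderef->($cp);
        }
        # else: no CODEREF, so the unmapped character is dropped
    }
    return $out;
}

my $ncr = sub { sprintf "&#x%04x;", shift };   # same callback as in the example
my $s   = "A\x{2603}\x{3042}";                 # U+2603 is not in the toy table

print toy_unicode_to_sjis($ncr, $s) eq "A&#x2603;\x82\xA0" ? "ok\n" : "not ok\n";
print length(toy_unicode_to_sjis($s)), "\n";   # 3: "A" plus the two bytes of U+3042
```

Because the callback sees only an integer, the same `sub` works unchanged for every Unicode-to-Shift_JISX0213 function that accepts a CODEREF.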
utf16be_to_sjis0213(STRING)
utf16be_to_sjis0213(CODEREF, STRING)
Converts UTF-16BE to Shift_JISX0213.
utf16le_to_sjis0213(STRING)
utf16le_to_sjis0213(CODEREF, STRING)
Converts UTF-16LE to Shift_JISX0213.
Characters not mapped to Shift_JISX0213 are deleted if CODEREF is not specified; otherwise, they are converted using the CODEREF, which receives the character's Unicode code point (an integer). For example, the following converts characters not mapped to Shift_JISX0213 into numeric character references for HTML 4.01:
utf16le_to_sjis0213(sub {sprintf "&#x%04x;", shift}, $utf16LE_string);
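For UTF-16 input, the integer passed to the CODEREF is a Unicode code point rather than a raw 16-bit code unit. The sketch below shows how a UTF-16LE string decodes into code points, assuming that a surrogate pair is combined into a single supplementary code point, as the phrase "Unicode codepoint" suggests. `utf16le_code_points` is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# HYPOTHETICAL helper: decode UTF-16LE bytes into Unicode code points,
# combining a high surrogate (D800-DBFF) and a following low surrogate
# (DC00-DFFF) into one supplementary code point.
sub utf16le_code_points {
    my ($bytes) = @_;
    my @units = unpack 'v*', $bytes;   # 'v': 16-bit little-endian; use 'n' for UTF-16BE
    my @cps;
    while (@units) {
        my $u = shift @units;
        if ($u >= 0xD800 && $u <= 0xDBFF
            && @units && $units[0] >= 0xDC00 && $units[0] <= 0xDFFF) {
            my $low = shift @units;
            push @cps, 0x10000 + (($u - 0xD800) << 10) + ($low - 0xDC00);
        }
        else {
            push @cps, $u;   # BMP character (a lone surrogate is passed through as-is)
        }
    }
    return @cps;
}

# "A" followed by U+1F600, encoded as the surrogate pair D83D DE00.
my $utf16le = "\x41\x00\x3D\xD8\x00\xDE";
my @cps = utf16le_code_points($utf16le);
print join(' ', map { sprintf 'U+%04X', $_ } @cps), "\n";   # U+0041 U+1F600
print sprintf('&#x%04x;', $cps[1]), "\n";                     # &#x1f600;
```

Under that assumption, the HTML callback produces one reference per character, not one per surrogate.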
BUGS
Note the following about the mapping between Shift_JISX0213 and Unicode used by this module:
If an authoritative mapping is published, the mapping used by this module will be corrected to conform to it.
0xFC5A in Shift_JISX0213 is mapped to U+9B1D according to JIS X 0213:2000, while Unicode's Unihan.txt maps it to U+9B1C.
0x81D4 and 0x81D5 in Shift_JISX0213 are mapped into the Halfwidth and Fullwidth Forms block, not into the Miscellaneous Mathematical Symbols-B block, according to Shibano's JIS KANJI JITEN, published in June 2002.
The following 25 JIS non-Kanji characters are not included in Unicode 3.2.0, so each of them is mapped to a sequence of two Unicode characters. These mappings round-trip for *one Shift_JISX0213 character*, but round-tripping is broken for a Shift_JISX0213 *string*. (For example, Shift_JISX0213 <0x8663> and <0x857B, 0x867B> are both mapped to <U+00E6, U+0300>, while <U+00E6, U+0300> is mapped back only to <0x8663>.)
SJIS0213  Unicode 3.2.0      # Name by JIS X 0213:2000
0x82F5    <U+304B, U+309A>   # [HIRAGANA LETTER BIDAKUON NGA]
0x82F6    <U+304D, U+309A>   # [HIRAGANA LETTER BIDAKUON NGI]
0x82F7    <U+304F, U+309A>   # [HIRAGANA LETTER BIDAKUON NGU]
0x82F8    <U+3051, U+309A>   # [HIRAGANA LETTER BIDAKUON NGE]
0x82F9    <U+3053, U+309A>   # [HIRAGANA LETTER BIDAKUON NGO]
0x8397    <U+30AB, U+309A>   # [KATAKANA LETTER BIDAKUON NGA]
0x8398    <U+30AD, U+309A>   # [KATAKANA LETTER BIDAKUON NGI]
0x8399    <U+30AF, U+309A>   # [KATAKANA LETTER BIDAKUON NGU]
0x839A    <U+30B1, U+309A>   # [KATAKANA LETTER BIDAKUON NGE]
0x839B    <U+30B3, U+309A>   # [KATAKANA LETTER BIDAKUON NGO]
0x839C    <U+30BB, U+309A>   # [KATAKANA LETTER AINU CE]
0x839D    <U+30C4, U+309A>   # [KATAKANA LETTER AINU TU(TU)]
0x839E    <U+30C8, U+309A>   # [KATAKANA LETTER AINU TO(TU)]
0x83F6    <U+31F7, U+309A>   # [KATAKANA LETTER AINU P]
0x8663    <U+00E6, U+0300>   # [LATIN SMALL LETTER AE WITH GRAVE]
0x8667    <U+0254, U+0300>   # [LATIN SMALL LETTER OPEN O WITH GRAVE]
0x8668    <U+0254, U+0301>   # [LATIN SMALL LETTER OPEN O WITH ACUTE]
0x8669    <U+028C, U+0300>   # [LATIN SMALL LETTER TURNED V WITH GRAVE]
0x866A    <U+028C, U+0301>   # [LATIN SMALL LETTER TURNED V WITH ACUTE]
0x866B    <U+0259, U+0300>   # [LATIN SMALL LETTER SCHWA WITH GRAVE]
0x866C    <U+0259, U+0301>   # [LATIN SMALL LETTER SCHWA WITH ACUTE]
0x866D    <U+025A, U+0300>   # [LATIN SMALL LETTER HOOKED SCHWA WITH GRAVE]
0x866E    <U+025A, U+0301>   # [LATIN SMALL LETTER HOOKED SCHWA WITH ACUTE]
0x8685    <U+02E9, U+02E5>   # [RISING SYMBOL]
0x8686    <U+02E5, U+02E9>   # [FALLING SYMBOL]
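The round-trip problem for strings can be reproduced with a toy mapping that contains only the three entries named in the example (0x8663, 0x857B, 0x867B). The tables, the function names `to_unicode` and `to_sjis`, and the longest-match rule used for the reverse direction are assumptions made for illustration; the module's actual implementation may differ.

```perl
use strict;
use warnings;

# HYPOTHETICAL excerpt: only the three entries named in the example.
my %sjis_to_uni = (
    "\x86\x63" => [0x00E6, 0x0300],   # LATIN SMALL LETTER AE WITH GRAVE
    "\x85\x7B" => [0x00E6],
    "\x86\x7B" => [0x0300],
);

# Reverse table keyed by comma-joined code points, e.g. "230,768".
my %uni_to_sjis;
while (my ($sjis, $cps) = each %sjis_to_uni) {
    $uni_to_sjis{ join ',', @$cps } = $sjis;
}

# Shift_JISX0213 bytes -> code points (double-byte characters only in this toy).
sub to_unicode {
    my ($bytes) = @_;
    my @cps;
    for my $ch ($bytes =~ /([\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC])/g) {
        push @cps, @{ $sjis_to_uni{$ch} // [] };   # unmapped: deleted
    }
    return @cps;
}

# Code points -> Shift_JISX0213 bytes, trying a two-code-point sequence
# before a single code point (longest match first).
sub to_sjis {
    my @cps = @_;
    my $out = '';
    while (@cps) {
        my $pair = @cps >= 2 ? "$cps[0],$cps[1]" : undef;
        if (defined $pair && exists $uni_to_sjis{$pair}) {
            $out .= $uni_to_sjis{$pair};
            splice @cps, 0, 2;
        }
        else {
            my $cp = shift @cps;
            $out .= $uni_to_sjis{$cp} // '';   # unmapped: deleted
        }
    }
    return $out;
}

my $one = "\x86\x63";            # one character: 0x8663
my $two = "\x85\x7B\x86\x7B";    # two characters: 0x857B, 0x867B
print to_sjis(to_unicode($one)) eq $one ? "round-trips\n" : "breaks\n";   # round-trips
print to_sjis(to_unicode($two)) eq $two ? "round-trips\n" : "breaks\n";   # breaks
```

Both inputs produce the same Unicode sequence <U+00E6, U+0300>, so no reverse mapping can recover both; preferring the two-code-point entry keeps single characters round-trippable at the cost of the two-character string.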
AUTHOR
Tomoyuki SADAHIRO
bqw10602@nifty.com
http://homepage1.nifty.com/nomenclator/perl/
Copyright(C) 2002-2002, SADAHIRO Tomoyuki. Japan. All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SEE ALSO
- JIS X 0213:2000
  7-bit and 8-bit double byte coded extended KANJI sets for information interchange (by JIS Committee)
- JIS KANJI JITEN, the revised edition
  edited by Shibano, published by Japanese Standards Association (JSA), 2002, Tokyo [ISBN4-542-20129-5]
- http://www.jsa.or.jp/
  Japanese Standards Association (access to JIS)
- http://www.unicode.org/Public/UNIDATA/Unihan.txt
  Unihan database (Unicode version: 3.2.0) by Unicode (c).
- http://homepage1.nifty.com/nomenclator/unicode/sjis0213.zip
  A mapping table between Shift_JISX0213 and Unicode 3.2.0.
  (This table was prepared by the author and has no official authority, but it shows what this module does.)
- ShiftJIS::CP932::MapUTF
  Conversion between Microsoft Windows CP-932 and Unicode.
  (The CP932-Unicode mapping differs from the Shift_JISX0213-Unicode mapping, but the former may be what you actually want.)