NAME
ShiftJIS::CP932::Correct - Corrects a string in the CP-932 encoding (Shift_JIS supported by MS).
SYNOPSIS
use ShiftJIS::CP932::Correct;
$corrected_cp932 = correct_cp932($cp932_string);
DESCRIPTION
The Microsoft Code Page 932 (CP-932) table comprises 7915 characters:
JIS X 0201-1976 single-byte characters (191 characters),
JIS X 0208-1990 double-byte characters (6879 characters),
NEC special characters (83 characters from SJIS row 13),
NEC-selected IBM extended characters (374 characters from SJIS rows 89 to 92),
and IBM extended characters (388 characters from SJIS rows 115 to 119).
It contains duplicates that cannot be round-trip mapped to and from Unicode. These duplicates come from the vendor-defined characters of NEC and IBM.
For example, there are two characters mapped to U+2252, namely, 0x81e0 (JIS X 0208) and 0x8790 (NEC special character).
As a result, some programs converting Unicode to CP-932 may carelessly convert U+2252 to 0x8790 rather than to 0x81e0.
Such behavior is undesirable because NEC special characters (and other vendor-defined characters) are less compatible.
This module corrects (or normalizes) such a legal but 'wrong' CP-932 string.
This module uses a map provided in Microsoft PRB: Conversion Problem Between Shift-JIS and Unicode (Article ID: Q170559).
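The duplicate mapping of U+2252 described above can be observed with Perl's core Encode module. This is only an illustrative sketch: it assumes that the Encode build in use ships a "cp932" encoding whose table follows the Microsoft mapping, and it is not part of this module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Both byte sequences are taken from the example in the text.
my $jis_form = "\x81\xe0";   # JIS X 0208 code
my $nec_form = "\x87\x90";   # NEC special character (the duplicate)

# Decode each form to Unicode and report the resulting code point.
for my $bytes ($jis_form, $nec_form) {
    my $unicode = decode('cp932', $bytes);
    printf "%s -> U+%04X\n", unpack('H*', $bytes), ord $unicode;
}

# Encode U+2252 back to CP-932 to see which form this encoder chooses.
my $back = encode('cp932', "\x{2252}");
printf "U+2252 -> %s\n", unpack('H*', $back);
```

If both decodes report the same code point, only one of the two byte sequences can survive a round trip through Unicode, which is the situation this module addresses.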
correct_cp932(STRING)
Corrects a CP-932 string; namely, converts the less preferred codepoints of duplicates (doubly-defined characters) to the preferred ones.
Does not affect characters that can be round trip mapped to Unicode. Any undefined characters are deleted.
For example, converts \x87\x90 to \x81\xe0.
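The kind of replacement described above can be sketched in plain Perl. This is not the module's implementation: sketch_correct and %PREFERRED are hypothetical names, the map holds only the single pair named in this document (the real map from Q170559 is larger), and unlike correct_cp932 the sketch does not delete undefined characters.

```perl
use strict;
use warnings;

# Hypothetical, deliberately incomplete map: only the pair named in the text.
my %PREFERRED = (
    "\x87\x90" => "\x81\xe0",
);

# Hypothetical helper, not the module's code.
sub sketch_correct {
    my ($str) = @_;
    my $out = '';
    # Walk the string one CP-932 character at a time: either a double-byte
    # character (lead byte 0x81-0x9F or 0xE0-0xFC followed by a trail byte)
    # or a single byte.
    while ($str =~ /\G([\x81-\x9f\xe0-\xfc][\x40-\x7e\x80-\xfc]|[\x00-\xff])/gs) {
        my $char = $1;
        $out .= exists $PREFERRED{$char} ? $PREFERRED{$char} : $char;
    }
    return $out;
}

my $fixed = sketch_correct("A\x87\x90B");
print unpack('H*', $fixed), "\n";   # prints 4181e042
```

Scanning character by character, rather than substituting on the raw bytes, avoids false matches where \x87\x90 straddles two adjacent double-byte characters, as in \x81\x87\x90\x40.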
AUTHOR
Tomoyuki SADAHIRO
bqw10602@nifty.com
http://homepage1.nifty.com/nomenclator/perl/
Copyright(C) 2001, SADAHIRO Tomoyuki. Japan. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SEE ALSO
Microsoft PRB: Conversion Problem Between Shift-JIS and Unicode (Article ID: Q170559)