NAME
Encode::JP::Mobile - Shift_JIS (CP932) variants of Japanese cellphone pictograms
SYNOPSIS
use Encode;
use Encode::JP::Mobile;
my $bytes = "\x82\xb1\xf9\x5d\xf8\xa0\x82\xb1"; # Shift_JIS bytes containing NTT DoCoMo pictograms
my $chars = decode("x-sjis-imode", $bytes); # \x{3053}\x{e6b9}\x{e63f}\x{3053}
use Encode::JP::Mobile ':props';
if ($chars =~ /\p{InDoCoMoPictograms}/) {
    warn "It has DoCoMo pictogram characters!";
}
DESCRIPTION
Encode::JP::Mobile is an Encode module that supports the Shift_JIS (CP932) extended characters mapped into the Unicode Private Use Area.
This module is EXPERIMENTAL. That means the API and implementation may sometimes change in backward-incompatible ways.
ENCODINGS
This module currently supports the following encodings.
- x-sjis-imode
Mapping for NTT DoCoMo i-mode handsets. Pictograms are mapped in the Shift_JIS private area and the Unicode Private Use Area. The conversion rule is equivalent to that of cp932.
For example, U+E63E is the Fine character (or The Sun) and is encoded as \xF8\x9F in this encoding.
This encoding is a subset of the cp932 encoding, but it also has a reverse mapping from KDDI/AU Unicode private area characters to DoCoMo pictogram encodings. For example,
    my $kddi  = "\xf6\x59";                      # [!] in KDDI/AU
    my $char  = decode("x-sjis-kddi", $kddi);    # \x{E481}
    my $imode = encode("x-sjis-imode", $char);   # \xf9\xdc -- [!] in DoCoMo
x-sjis-docomo is an alias.
- x-sjis-softbank
Escape sequence based Shift_JIS encoding for Softbank pictograms. The decoding algorithm is implemented in Perl code rather than generated from a ucm file.
For example, U+E001 is the A Boy character and is encoded as \x1b$G!\x0f in this encoding (\x1b$G is the beginning of the escape sequence and \x0f is the end).
x-sjis-vodafone is an alias.
- x-sjis-softbank-auto
Maps Unicode private area characters to Shift_JIS private area (Gaiji) characters. This encoding is used by 3GC phones when you input pictogram characters in a web form on a Shift_JIS page and submit it. Handsets can also decode this encoding and display the pictogram characters.
The private area mapping seems similar to CP932, but with a small offset.
For example, U+E001 is the A Boy character (same as in x-sjis-softbank) and is encoded as \xF9\x41.
x-sjis-vodafone-auto is an alias.
- x-sjis-kddi
Mapping for KDDI/AU pictograms. It appears to be based on cp932, but it has more private characters that are not included in CP932.TXT.
For example, U+E481 is the ! (exclamation) character and is encoded as \xF6\x59 (same as cp932). U+EB88 is the Angry character and is encoded as \xF4\x8D, while cp932 has no mapping for it.
x-sjis-ezweb is an alias.
- x-sjis-kddi-auto
Mapping for KDDI/AU pictograms, based on the handset's internal Shift_JIS to UTF-8 translation and vice versa. When you input pictogram characters in a web form on a UTF-8 page and submit them, this mapping is used (instead of the CP932 based x-sjis-kddi) to represent the pictogram characters.
x-sjis-kddi-auto and x-sjis-kddi share their Unicode-to-bytes mappings with each other, so they are round-trip safe, which means:
    my $bytes = "\xf6\x59";                    # [!] in KDDI/AU
    decode("x-sjis-kddi", $bytes);             # \x{E481}
    decode("x-sjis-kddi-auto", $bytes);        # \x{EF59}
    encode("x-sjis-kddi", "\x{EF59}");         # same as $bytes
    encode("x-sjis-kddi-auto", "\x{E481}");    # same as $bytes
x-sjis-ezweb-auto is an alias.
- x-iso-2022-jp-kddi
Encoding used to encode KDDI/AU pictogram characters in email. It's based on iso-2022-jp, which is still the de facto standard encoding for sending email.
In practice most KDDI/AU cellphones can receive email encoded in Shift_JIS, so you can just use x-sjis-kddi to encode the pictogram characters. This encoding might still be needed to decode incoming email sent from KDDI/AU phones that contains pictogram characters.
x-iso-2022-jp-ezweb is an alias.
- x-sjis-airedge
Mapping for AirEDGE pictograms. It's a complete subset of cp932.
x-sjis-airh is an alias.
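The escape-sequence form described for x-sjis-softbank can be illustrated with a small sketch in plain Perl. This is not the module's own decoder: the helper name decode_softbank_g_sequence is hypothetical, and both the linear offset (byte minus 0x20, added to U+E000) and the accepted byte range 0x21-0x7a are assumptions extrapolated from the single documented example (\x1b$G!\x0f for U+E001). It handles only sequences that start with \x1b$G.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Encode::JP::Mobile.
# Takes one complete escape sequence (ESC $ G <payload> SI) and returns
# the corresponding private-area characters, or nothing if the input
# does not look like such a sequence.
# Assumption: each payload byte maps to U+E000 + (byte - 0x20).
sub decode_softbank_g_sequence {
    my ($seq) = @_;
    my ($payload) = $seq =~ /\A\x1b\$G([\x21-\x7a]+)\x0f\z/
        or return;
    return join '', map { chr(0xE000 + ord($_) - 0x20) } split //, $payload;
}

# Note the escaped "$": in a double-quoted Perl string, "$G" would
# otherwise be interpolated as a variable.
my $chars = decode_softbank_g_sequence("\x1b\$G!\x0f");
printf "U+%04X\n", ord $chars;
```

For real input, use decode("x-sjis-softbank", $bytes) instead; the sketch only shows how the start marker, payload and end marker fit together.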
UNICODE PROPERTIES
By importing this module with the ':props' flag, you get the following Unicode properties.
- InDoCoMoPictograms
- InKDDIPictograms
- InSoftBankPictograms
- InAirEdgePictograms
Note that these properties match decoded Unicode characters, not bytes. If the input is in one of the x-sjis-* variants, you first need to know which encoding the bytes are in and decode them to Unicode before you can test whether the string contains these pictogram characters. In practice, the properties are therefore mostly handy when the input is UTF-8.
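For the UTF-8 case, a minimal sketch might look like the following. It uses only decode from Encode and the ':props' import shown in the SYNOPSIS; the helper name has_docomo_pictogram is hypothetical, and the sample bytes are the UTF-8 form of U+E63F, one of the characters from the SYNOPSIS.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Encode::JP::Mobile ':props';

# Hypothetical helper: takes UTF-8 encoded bytes, decodes them to
# characters, and reports whether any character matches the
# InDoCoMoPictograms property.
sub has_docomo_pictogram {
    my ($utf8_bytes) = @_;
    my $chars = decode('UTF-8', $utf8_bytes);
    return $chars =~ /\p{InDoCoMoPictograms}/ ? 1 : 0;
}

# "\xEE\x98\xBF" is U+E63F encoded as UTF-8.
warn "It has DoCoMo pictogram characters!"
    if has_docomo_pictogram("\xEE\x98\xBF");
```

The same shape should work for the other three properties by swapping the property name.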
BACKWARD COMPATIBILITY
As of 0.07, this module uses x-sjis-* as its encoding names. It still supports the old shift_jis-* aliases, though I'm planning to deprecate them in a future release.
NOTES
Pictogram characters are defined to be round-trip safe. However, they live in the Unicode Private Use Area, which means you'll have interoperability issues between carriers that this module doesn't yet try to solve completely. There is partial support for round-trip (automatic conversion) between x-sjis-imode and x-sjis-kddi.
As of version 0.04, this module tries to auto-convert between KDDI/AU and NTT DoCoMo pictogram characters. Support for Softbank characters is still TODO.
TODO
Implement an all-merged x-sjis-mobile-jp encoding.
AUTHORS
Tatsuhiko Miyagawa <miyagawa@bulknews.net>
This library is free software, licensed under the same terms as Perl.
SEE ALSO
Encode, HTML::Entities::ImodePictogram, Unicode::Japanese
http://www.nttdocomo.co.jp/service/imode/make/content/pictograph/basic/
http://www.nttdocomo.co.jp/service/imode/make/content/pictograph/extention/
http://www.au.kddi.com/ezfactory/tec/spec/3.html
http://developers.softbankmobile.co.jp/dp/tool_dl/web/picword_top.php
http://www.willcom-inc.com/ja/service/contents_service/club_air_edge/for_phone/homepage/index.html
http://www.nttdocomo.co.jp/service/mail/imode_mail/emoji_convert/