NAME

Encode::JP::Mobile - Shift_JIS (CP932) variants of Japanese cellphone pictograms

SYNOPSIS

use Encode;
use Encode::JP::Mobile;

my $bytes = "\x82\xb1\xf9\x5d\xf8\xa0\x82\xb1"; # Shift_JIS bytes containing NTT DoCoMo pictograms
my $chars = decode("x-sjis-imode", $bytes);     # \x{3053}\x{e6b9}\x{e63f}\x{3053}

use Encode::JP::Mobile ':props';
if ($chars =~ /\p{InDoCoMoPictograms}/) {
    warn "It has DoCoMo pictogram characters!";
}

DESCRIPTION

Encode::JP::Mobile is an Encode module to support Shift_JIS (CP932) extended characters mapped in the Unicode Private Area.

This module is EXPERIMENTAL. That means the API and implementation may sometimes change in backward-incompatible ways.

ENCODINGS

This module currently supports the following encodings.

x-sjis-imode

Mapping for NTT DoCoMo i-mode handsets. Pictograms are mapped in the Shift_JIS private area and the Unicode private area. The conversion rule is equivalent to that of cp932.

For example, U+E63E is the Fine character (or The Sun) and is encoded as \xF8\x9F in this encoding.
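The cp932 conversion rule for private-area characters can be written as plain arithmetic. The following is a minimal sketch, not code from this module: it assumes the usual CP932 user-defined layout, in which U+E000..U+E757 fill lead bytes 0xF0..0xF9 with 188 trail bytes each (0x40..0x7E and 0x80..0xFC). The helper name cp932_private_bytes is hypothetical. It reproduces the values shown in the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Encode::JP::Mobile): map a code point in
# U+E000..U+E757 to the two Shift_JIS bytes of the CP932 user-defined area.
sub cp932_private_bytes {
    my ($cp) = @_;
    my $index = $cp - 0xE000;
    die sprintf("U+%04X is outside U+E000..U+E757\n", $cp)
        if $index < 0 || $index > 0x757;
    my $lead  = 0xF0 + int($index / 188);      # 188 trail bytes per lead byte
    my $pos   = $index % 188;
    my $trail = $pos < 63 ? 0x40 + $pos        # trail bytes 0x40..0x7E
                          : 0x80 + $pos - 63;  # trail bytes 0x80..0xFC (0x7F is skipped)
    return pack("C2", $lead, $trail);
}

for my $cp (0xE6B9, 0xE63F) {
    printf "U+%04X => %s\n", $cp,
        join " ", map { sprintf "%02X", ord } split //, cp932_private_bytes($cp);
}
# U+E6B9 => F9 5D
# U+E63F => F8 A0
```

This only covers the private-area range. Ordinary Japanese characters still go through the regular Shift_JIS tables.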

This encoding is a subset of cp932 encoding, but has a reverse mapping from KDDI/AU Unicode private area characters to DoCoMo pictogram encodings. For example,

my $kddi  = "\xf6\x59"; # [!] in KDDI/AU
my $char  = decode("x-sjis-kddi", $kddi);  # \x{E481}
my $imode = encode("x-sjis-imode", $char); # \xf9\xdc -- [!] in DoCoMo

x-sjis-docomo is an alias.

x-sjis-softbank

Escape-sequence-based Shift_JIS encoding for Softbank pictograms. The decoding algorithm is not based on a ucm file but is implemented in Perl code.

x-sjis-vodafone is an alias.

For example, U+E001 is the A Boy character and is encoded as \x1b$G!\x0f in this encoding (\x1b$G begins the escape sequence and \x0f ends it).
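To make the shape of the escape sequence concrete, here is a rough sketch that only locates such sequences in a byte string. It is not the module's decoder. It assumes that the byte after \x1b$ is a single ASCII letter selecting a pictogram page (only G is shown above) and that one or more printable ASCII bytes may follow before \x0f. The helper name find_pictogram_runs is hypothetical, and no mapping to Unicode code points is attempted.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Encode::JP::Mobile): return each
# escape-sequence run found in $bytes as [page letter, payload bytes].
sub find_pictogram_runs {
    my ($bytes) = @_;
    my @runs;
    # \x1b$ + page letter (assumed) + payload + \x0f
    while ($bytes =~ /\x1b\$([A-Za-z])([\x21-\x7e]+)\x0f/g) {
        push @runs, [$1, $2];
    }
    return @runs;
}

# Shift_JIS "ko", the A Boy sequence from the text, then "ko" again.
my @runs = find_pictogram_runs("\x82\xb1\x1b\$G!\x0f\x82\xb1");
printf "page=%s payload=%s\n", @$_ for @runs;   # page=G payload=!
```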

x-sjis-softbank-auto

Maps Unicode private area characters to Shift_JIS private area (Gaiji) characters. This encoding is used in 3GC phones when you input pictogram characters in a web form on a Shift_JIS page and submit it. Handsets can also decode this encoding and display the pictogram characters.

x-sjis-vodafone-auto is an alias.

The private area mapping seems similar to CP932 but with a bit of offset.

For example, U+E001 is the A Boy character (same as x-sjis-softbank) and is encoded as \xF9\x41.

x-sjis-kddi

Mapping for KDDI/AU pictograms. It's based on cp932 (I guess) but there are more private characters that are not included in CP932.TXT.

For example, U+E481 is the ! (exclamation) character and is encoded as \xF6\x59 (same as cp932). U+EB88 is the Angry character and is encoded as \xF4\x8D, while cp932 doesn't have a mapping for it.

x-sjis-ezweb is an alias.

x-sjis-kddi-auto

Mapping for KDDI/AU pictograms, based on the handset's internal Shift_JIS to UTF-8 translation and vice versa. When you input pictogram characters in a web form on a UTF-8 page and submit them, this mapping is used (instead of the CP932-based x-sjis-kddi) to represent the pictogram characters.

x-sjis-kddi-auto and x-sjis-kddi each accept the other's Unicode code points when encoding, so they are round-trip safe with each other, which means:

my $bytes = "\xf6\x59";                 # [!] in KDDI/AU
decode("x-sjis-kddi", $bytes);          # \x{E481}
decode("x-sjis-kddi-auto", $bytes);     # \x{EF59}
encode("x-sjis-kddi", "\x{EF59}");      # same as $bytes
encode("x-sjis-kddi-auto", "\x{E481}"); # same as $bytes

x-sjis-ezweb-auto is an alias.

x-iso-2022-jp-kddi

Encoding used to encode KDDI/AU pictogram characters in email. It's based on iso-2022-jp, which is still the de facto standard encoding for sending email.

Actually, most KDDI/AU cellphones can receive emails encoded in Shift_JIS, so you can just use x-sjis-kddi to encode the pictogram characters. This encoding might still be needed to decode incoming emails sent from KDDI/AU phones that contain pictogram characters.

x-iso-2022-jp-ezweb is an alias.

x-sjis-airedge

Mapping for AirEDGE pictograms. It's a complete subset of cp932.

x-sjis-airh is an alias.

UNICODE PROPERTIES

By importing this module with the ':props' flag, you get the following Unicode properties.

InDoCoMoPictograms
InKDDIPictograms
InSoftBankPictograms
InAirEdgePictograms

Note that if the input is in one of the x-sjis-* variants, you first need to know which encoding the bytes are in and decode them to Unicode before you can tell whether the string contains these pictogram characters. So these properties are probably only handy when the input is actually UTF-8.
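When the carrier is unknown, or the properties above are not imported, a much coarser check is possible on an already decoded string. The sketch below uses Perl's user-defined property mechanism (described in perlunicode) to define a hypothetical InBmpPrivateUse property covering the whole BMP private use range U+E000..U+F8FF. It is an illustration only: it is not one of this module's properties, it cannot tell carriers apart, and whether the module defines its own properties this way is not stated here.

```perl
use strict;
use warnings;

# Hypothetical user-defined property (not provided by Encode::JP::Mobile):
# the name must start with "In" or "Is", and the sub returns hex ranges.
sub InBmpPrivateUse {
    return "E000\tF8FF\n";
}

# The decoded characters from the SYNOPSIS.
my $chars = "\x{3053}\x{e6b9}\x{e63f}\x{3053}";
my $count = () = $chars =~ /\p{InBmpPrivateUse}/g;
print "private-use characters: $count\n";   # private-use characters: 2
```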

BACKWARD COMPATIBILITY

As of 0.07, this module uses x-sjis-* as its encoding names. It still supports the old shift_jis-* aliases, though. I'm planning to deprecate them in a future release.

NOTES

Pictogram characters are defined to be round-trip safe. However, they use the Unicode Private Area, which means you'll have interoperability issues that this module doesn't yet try to solve completely. There is partial support for round-trip (automatic conversion) between x-sjis-imode and x-sjis-kddi.

As of version 0.04, this module tries to auto-convert KDDI/AU and NTT DoCoMo pictogram characters. Support for Softbank characters is still TODO.

TODO

Implement an all-merged x-sjis-mobile-jp encoding.

AUTHORS

Tatsuhiko Miyagawa <miyagawa@bulknews.net>

This library is free software, licensed under the same terms as Perl.
