NAME

HTML::Entities::ImodePictogram - encode / decode i-mode pictogram

SYNOPSIS

use HTML::Entities::ImodePictogram;

$html      = encode_pictogram($rawtext);
$rawtext   = decode_pictogram($html);
$cleantext = remove_pictogram($rawtext);

use HTML::Entities::ImodePictogram qw(find_pictogram);

$num_found = find_pictogram($rawtext, \&callback);

DESCRIPTION

HTML::Entities::ImodePictogram handles HTML entities for i-mode pictogram (emoji), which are assigned in Shift_JIS private area.

See http://www.nttdocomo.co.jp/i/tag/emoji/index.html for details about i-mode pictogram.

FUNCTIONS

In all functions in this module, input/output strings are asssumed as encoded in Shift_JIS. See Jcode for conversion between Shift_JIS and other encodings like EUC-JP or UTF-8.

This module exports following functions by default.

encode_pictogram
$html = encode_pictogram($rawtext);

Encodes pictogram characters in raw-text into HTML entities.

decode_pictogram
$rawtext = decode_pictogram($html);

Decodes HTML entities for pictogram into raw-text.

remove_pictogram
$cleantext = remove_pictogram($rawtext);

Removes pictogram characters in raw-text.

This module also exports following functions on demand.

find_pictogram
$num_found = find_pictorgram($rawtext, \&callback);

Finds pictogram characters in raw-text and executes callback when found. It returns the total numbers of charcters found in text.

The callback is given two arguments. The first is a found pictogram character itself, and the second is a decimal number which represents codepoint of the character. Whatever the callback returns will replace the original text.

Here is an implementation of encode_pictogram(), which will be the good example for the usage of find_pictogram().

  sub encode_pictogram {
      my $text = shift;
      find_pictogram($text, sub {
			 my($char, $number) = @_;
			 return '&#' . $number . ';';
		     });
      return $text;
  }

CAVEAT

This module works so slow, because regex used here matches ANY characters in the text. This is due to the difficulty of extracting character boundaries of Shift_JIS encoding.

AUTHOR

Tatsuhiko Miyagawa <miyagawa@bulknews.net>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

HTML::Entities, http://www.nttdocomo.co.jp/i/tag/emoji/index.html