NAME

Unicode::Transform - conversion among Unicode Transformation Formats (UTFs)

SYNOPSIS

use Unicode::Transform;

$unicode_string = utf16be_to_unicode($utf16be_string);
$utf16le_string = unicode_to_utf16le($unicode_string);

DESCRIPTION

This module provides functions that convert a string between Perl's internal Unicode format and several Unicode Transformation Formats (UTFs).

Conversion from a UTF to Perl's internal Unicode format

STRING is the source string.

If CODEREF is omitted, any partial octets (octets that are not part of a well-formed sequence) are deleted.

If CODEREF is specified, it is called for each partial octet with one argument, the value of that octet as an integer, and its return value is inserted into the result.

(Call die or croak in CODEREF if you want to trap an ill-formed source.)
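The following is a minimal sketch of trapping an ill-formed source with CODEREF. The helper name decode_utf16be_strict is hypothetical, and the sketch assumes that a trailing lone octet in a UTF-16BE source counts as a partial octet.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Unicode::Transform;

# Hypothetical helper: decode UTF-16BE and die on the first partial octet.
sub decode_utf16be_strict {
    my ($octets) = @_;
    return utf16be_to_unicode(
        # CODEREF receives the value of the offending octet as an integer.
        sub { croak sprintf("ill-formed UTF-16BE: stray octet 0x%02X", $_[0]) },
        $octets,
    );
}

my $good = "\x00\x41\x00\x42";   # "AB" in UTF-16BE
my $bad  = "\x00\x41\x00";       # "A" followed by half a code unit (assumed partial)

print decode_utf16be_strict($good), "\n";

# eval turns the croak into an ordinary error value in $@.
my $result = eval { decode_utf16be_strict($bad) };
print "rejected: $@" unless defined $result;
```

Omitting CODEREF would instead drop the stray octet silently, which may be preferable when the source is only slightly damaged.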

utf16le_to_unicode([CODEREF,] STRING)

Converts UTF-16LE to Unicode (Perl's internal Unicode format).

utf16be_to_unicode([CODEREF,] STRING)

Converts UTF-16BE to Unicode.

utf32le_to_unicode([CODEREF,] STRING)

Converts UTF-32LE to Unicode.

utf32be_to_unicode([CODEREF,] STRING)

Converts UTF-32BE to Unicode.

utf8_to_unicode([CODEREF,] STRING)

Converts UTF-8 to Unicode.

utf8mod_to_unicode([CODEREF,] STRING)

Converts UTF-8-Mod to Unicode.

utfcp1047_to_unicode([CODEREF,] STRING)

Converts UTF-EBCDIC (for CP1047) to Unicode.

Conversion from Perl's internal Unicode format to a UTF

STRING is the source string.

If CODEREF is omitted, any UTF-illegal characters (high and low surrogates, and code points above 0x10FFFF) are deleted.

If CODEREF is specified, it is called for each UTF-illegal character with one argument, the Unicode code point of that character as an integer, and its return value is inserted into the result.
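As an illustration of CODEREF in this direction, the sketch below records every UTF-illegal code point while still dropping it from the output. The helper name encode_utf16le_logged is hypothetical. The handler returns an empty string, so the sketch does not depend on how a non-empty return value would be encoded.

```perl
use strict;
use warnings;
use Unicode::Transform;

# Hypothetical helper: encode to UTF-16LE and report the code points
# that were rejected as UTF-illegal.
sub encode_utf16le_logged {
    my ($string) = @_;
    my @rejected;
    my $octets = unicode_to_utf16le(
        # CODEREF receives the code point as an integer; returning ''
        # inserts nothing, which matches the default deletion.
        sub { push @rejected, $_[0]; return '' },
        $string,
    );
    return ($octets, \@rejected);
}

# "A", a lone high surrogate (UTF-illegal), then U+10000.
my ($octets, $rejected) = encode_utf16le_logged("A" . chr(0xD800) . "\x{10000}");

print join(' ', map { sprintf '%02X', $_ } unpack 'C*', $octets), "\n";
printf "rejected U+%04X\n", $_ for @$rejected;
```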

unicode_to_utf16le([CODEREF,] STRING)

Converts Unicode to UTF-16LE.

unicode_to_utf16be([CODEREF,] STRING)

Converts Unicode to UTF-16BE.

unicode_to_utf32le([CODEREF,] STRING)

Converts Unicode to UTF-32LE.

unicode_to_utf32be([CODEREF,] STRING)

Converts Unicode to UTF-32BE.

unicode_to_utf8([CODEREF,] STRING)

Converts Unicode to UTF-8.

unicode_to_utf8mod([CODEREF,] STRING)

Converts Unicode to UTF-8-Mod.

unicode_to_utfcp1047([CODEREF,] STRING)

Converts Unicode to UTF-EBCDIC (for CP1047).
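Because every function converts to or from Perl's internal Unicode format, one UTF can be turned into another by chaining two calls. A minimal sketch follows. The helper name transcode_utf16be_to_utf16le is hypothetical and is not part of this module.

```perl
use strict;
use warnings;
use Unicode::Transform;

# Hypothetical helper: transcode UTF-16BE octets to UTF-16LE octets
# by way of Perl's internal Unicode format.
sub transcode_utf16be_to_utf16le {
    my ($utf16be_octets) = @_;
    # No CODEREF is given, so partial octets are deleted here ...
    my $unicode = utf16be_to_unicode($utf16be_octets);
    # ... and UTF-illegal characters are deleted here.
    return unicode_to_utf16le($unicode);
}

my $utf16le = transcode_utf16be_to_utf16le("\x30\x42");   # U+3042 in UTF-16BE
print join(' ', map { sprintf '%02X', $_ } unpack 'C*', $utf16le), "\n";
```

The same pattern should apply to any other pair of formats listed above, with a CODEREF added to either call when silent deletion is not acceptable.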

AUTHOR

SADAHIRO Tomoyuki, <SADAHIRO@cpan.org>

http://homepage1.nifty.com/nomenclator/perl/

Copyright(C) 2002-2003, SADAHIRO Tomoyuki. Japan. All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

perlunicode
http://www.unicode.org/reports/tr16

UTF-EBCDIC and UTF-8-Mod