NAME

Unicode::Map8 - Mapping table between 8-bit chars and Unicode

SYNOPSIS

require Unicode::Map8;
my $no_map = Unicode::Map8->new("ISO646-NO") || die;
my $l1_map = Unicode::Map8->new("latin1")    || die;

my $ustr = $no_map->to16("V}re norske tegn b|r {res");
my $lstr = $l1_map->to8($ustr);
print "$lstr\n";

DESCRIPTION

The Unicode::Map8 class implements efficient mapping tables between 8-bit character sets and 16-bit character sets like Unicode. The 16-bit strings are assumed to use network byte order.
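To illustrate what a 16-bit string in network byte order looks like, here is a small sketch that uses only Perl's built-in pack() and unpack(). It does not call Unicode::Map8; the three code points are arbitrary examples chosen for this illustration.

```perl
use strict;
use warnings;

# Pack three code points as 16-bit big-endian (network order) units:
# U+0048 "H", U+00E5 "a with ring", U+20AC "euro sign".
my $ustr = pack("n*", 0x0048, 0x00E5, 0x20AC);

# Every character occupies exactly two bytes, high byte first.
my $hex = join(" ", map { sprintf("%02X", ord($_)) } split(//, $ustr));
printf "%d bytes: %s\n", length($ustr), $hex;

# unpack() with the same template recovers the code points.
my @codes = unpack("n*", $ustr);
printf "U+%04X\n", $_ for @codes;
```

A string built this way has the same layout as the strings that to16() is described as returning and to8() as accepting.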

The following methods are available:

$m = Unicode::Map8->new( [$charset] )

The object constructor creates new instances of the Unicode::Map8 class. It takes an optional argument that specifies the name of an 8-bit character set to initialize from. The argument can also be the name of a mapping file. If the charset/file cannot be located, then the constructor returns undef.

If you omit the argument, then an empty mapping table is constructed. You must then add mapping pairs to it using the addpair() method described below.

$m->addpair( $u8, $u16 );

Adds a new mapping pair to the mapping object. It takes two arguments. The first is the code value in the 8-bit character set and the second is the corresponding code value in the 16-bit character set. The same codes can be used multiple times (but not the same pair). The first definition for a code is the one that is used.

Consider the following example:

$m->addpair(0x20, 0x0020);
$m->addpair(0x20, 0x00A0);
$m->addpair(0xA0, 0x00A0);

It means that the characters 0x20 and 0xA0 in the 8-bit charset map to themselves in the 16-bit set, but in the 16-bit character set 0x00A0 maps to 0x20.
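The "first definition wins" rule can be modelled with two plain hashes, one per direction. This is only a sketch of the documented behaviour, not the module's actual implementation; addpair_model(), %to16 and %to8 are hypothetical names introduced for this illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two lookup directions.
my (%to16, %to8);

# Record a pair, keeping any definition that already exists
# for a code in either direction.
sub addpair_model {
    my ($u8, $u16) = @_;
    $to16{$u8} = $u16 unless exists $to16{$u8};
    $to8{$u16} = $u8  unless exists $to8{$u16};
}

# The same three pairs as in the addpair() example above.
addpair_model(0x20, 0x0020);
addpair_model(0x20, 0x00A0);
addpair_model(0xA0, 0x00A0);

printf "8-bit  0x%02X -> 0x%04X\n", $_, $to16{$_} for sort { $a <=> $b } keys %to16;
printf "16-bit 0x%04X -> 0x%02X\n", $_, $to8{$_}  for sort { $a <=> $b } keys %to8;
```

In this model the second call defines the 16-bit to 8-bit direction for 0x00A0, so the third call only adds the 8-bit to 16-bit direction for 0xA0.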

$m->default_to8( $u8 )

Sets the code of the default character to use when mapping from 16-bit to 8-bit strings. If there is no mapping pair defined for a character, then this default is used by to8() and recode8().

$m->default_to16( $u16 )

Sets the code of the default character to use when mapping from 8-bit to 16-bit strings. If there is no mapping pair defined for a character, then this default is used by to16(), tou() and recode8().

$m->nostrict;

All undefined mappings are replaced with the identity mapping. Undefined characters are normally just dropped when converting between character sets.

$m->to8( $ustr );

Converts a 16-bit character string to the corresponding string in the 8-bit character set.

$m->to16( $str );

Converts an 8-bit character string to the corresponding string in the 16-bit character set.

$m->tou( $str );

Same as to16() but returns a Unicode::String object instead of a plain UCS2 string.

$m->recode8($m2, $str);

Map the string $str from one 8-bit character set ($m) to another one ($m2). Since we know the mappings towards the common 16-bit encoding we can use this to convert between any of the 8-bit character sets we know about.
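The idea of recoding through a common 16-bit encoding can be sketched in plain Perl. This is a simplified model, not the module's code: recode8_model() and the two tables are hypothetical, the tables cover only a few codes, and the model ignores default_to8(), default_to16() and nostrict by simply dropping unmapped characters.

```perl
use strict;
use warnings;

# Hypothetical tables, 8-bit code => 16-bit code.
# Partial ISO646-NO: space, a-z, and the three national letters.
my %no = (0x7B => 0x00E6, 0x7C => 0x00F8, 0x7D => 0x00E5);
$no{$_} = $_ for (0x20, 0x61 .. 0x7A);

# Latin-1 maps every 8-bit code to the same 16-bit code.
my %l1;
$l1{$_} = $_ for (0x00 .. 0xFF);

sub recode8_model {
    my ($from, $to, $str) = @_;

    # Invert the target table (16-bit => 8-bit), first definition wins.
    my %to8;
    for my $u8 (sort { $a <=> $b } keys %$to) {
        my $u16 = $to->{$u8};
        $to8{$u16} = $u8 unless exists $to8{$u16};
    }

    my $out = "";
    for my $c (unpack("C*", $str)) {
        my $u16 = $from->{$c};     # source 8-bit code to 16-bit pivot
        next unless defined $u16;  # unmapped in source: dropped
        my $u8 = $to8{$u16};       # 16-bit pivot to target 8-bit code
        next unless defined $u8;   # unmapped in target: dropped
        $out .= chr($u8);
    }
    return $out;
}

my $out = recode8_model(\%no, \%l1, "b|r {res");
printf "%vd\n", $out;   # prints the resulting byte values, dot-separated
```

Because both tables share the 16-bit pivot, no direct table between the two 8-bit sets is needed.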

$m->to_char16( $u8 )

Maps an 8-bit character code to a 16-bit code.

$m->to_char8( $u16 )

Maps a 16-bit character code to an 8-bit code.

$m->fprint( FILE );

If the extension is compiled with the -DDEBUGGING option, then this method is available. It prints a summary of the content of the mapping table on the specified file handle.

BUGS

Does not handle Unicode surrogate pairs as a single character.

COPYRIGHT

Copyright 1998 Gisle Aas.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.