NAME

SPVM::Unicode - SPVM Unicode Utilities.

SYNOPSIS

use Unicode;

# Read Unicode code points from a UTF-8 string one at a time. Each call advances the byte offset to the next UTF-8 character.
my $string = "あいうえお";
my $pos = 0;
while ((my $uchar = Unicode->uchar($string, \$pos)) >= 0) {
  # ...
}

DESCRIPTION

Unicode is a set of SPVM Unicode utilities. This module provides methods to convert UTF-8 bytes to and from Unicode code points.

CLASS METHODS

ERROR_INVALID_UTF8

static method ERROR_INVALID_UTF8 : int ();

Return -2. This is the value that the uchar method returns when it finds invalid UTF-8.

is_unicode_scalar_value

static method is_unicode_scalar_value : int ($code_point : int);

Check whether the given value is a Unicode scalar value.

The Unicode scalar values are the Unicode code points (0 to 0x10FFFF) excluding the surrogate code points (0xD800 to 0xDFFF).
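The range rule above can be sketched in a few lines. This is a Python illustration of the documented range only, not the module's implementation; the function name is reused purely for readability.

```python
def is_unicode_scalar_value(code_point: int) -> bool:
    """Return True for code points in 0..0x10FFFF that are not surrogates."""
    if code_point < 0 or code_point > 0x10FFFF:
        return False  # outside the Unicode code point range
    # Surrogate code points are reserved for UTF-16 and are not scalar values.
    return not (0xD800 <= code_point <= 0xDFFF)
```

For instance, 0x3042 (あ) passes the check, while 0xD800 and 0x110000 do not.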

uchar

static method uchar : int ($string : string, $offset_ref : int*);

Get a Unicode code point from a UTF-8 string at the given byte offset, and advance the offset to the position of the next UTF-8 character.

If the offset is past the end of the string, this method returns -1.

If an invalid UTF-8 character is found, this method returns -2. This is the same value that the ERROR_INVALID_UTF8 method returns.
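The return-value contract can be modelled as follows. This Python sketch is an illustration, not the module's code, and rests on two assumptions that the text does not state: an offset equal to the string length counts as past the end (the SYNOPSIS loop needs this to terminate), and the offset is left unchanged on error. Because Python has no reference arguments, the new offset is returned alongside the code point.

```python
from typing import Tuple

ERROR_INVALID_UTF8 = -2  # mirrors the documented return value


def uchar(data: bytes, offset: int) -> Tuple[int, int]:
    """Decode one code point at `offset`; return (code_point, next_offset)."""
    if offset < 0 or offset >= len(data):
        return -1, offset  # assumption: offset == len(data) is "past the end"
    first = data[offset]
    # The lead byte determines how many bytes the character occupies.
    if first < 0x80:
        length = 1
    elif 0xC2 <= first <= 0xDF:
        length = 2
    elif 0xE0 <= first <= 0xEF:
        length = 3
    elif 0xF0 <= first <= 0xF4:
        length = 4
    else:
        return ERROR_INVALID_UTF8, offset
    try:
        # Strict decoding rejects truncated, overlong, and surrogate sequences.
        text = data[offset:offset + length].decode("utf-8")
    except UnicodeDecodeError:
        return ERROR_INVALID_UTF8, offset  # assumption: offset not advanced
    return ord(text), offset + length


# Same loop shape as the SYNOPSIS: read until a negative value comes back.
data = "あいうえお".encode("utf-8")
pos = 0
code_points = []
while True:
    cp, pos = uchar(data, pos)
    if cp < 0:
        break
    code_points.append(cp)
```

A caller that needs to tell the two negative results apart can compare the value against -1 and -2 after the loop ends.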

uchar_to_utf8

static method uchar_to_utf8 : string ($unicode_code_point : int);

Convert a Unicode code point to a UTF-8 character.

If the argument is not a valid Unicode code point, this method returns undef.
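The bit layout behind this conversion can be sketched in Python. This illustrates standard UTF-8 encoding and is not the module's implementation. It assumes that "invalid" covers values outside 0 to 0x10FFFF as well as surrogates, and it uses None where the method returns undef.

```python
def uchar_to_utf8(code_point: int):
    """Encode one code point as UTF-8 bytes, or return None if it is invalid."""
    # Assumption: surrogates and out-of-range values are treated as invalid.
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        return None
    if code_point < 0x80:  # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6), 0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (code_point >> 12),
            0x80 | ((code_point >> 6) & 0x3F),
            0x80 | (code_point & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (code_point >> 18),
        0x80 | ((code_point >> 12) & 0x3F),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])
```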

utf8_to_utf16

static method utf8_to_utf16 : short[] ($utf8_string : string);

Convert a UTF-8 string to a UTF-16 string.

utf16_to_utf8

static method utf16_to_utf8 : string ($utf16_string : short[]);

Convert a UTF-16 string to a UTF-8 string.
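What these two conversions produce can be illustrated with Python's built-in codecs. This is a sketch of the expected data shapes, not the module's code. It represents the UTF-16 string as a list of unsigned 16-bit code units; how the short[] array stores units above 0x7FFF is not covered by the text and is left out here.

```python
import struct


def utf8_to_utf16(utf8_bytes: bytes) -> list:
    """Return the UTF-16 code units for a UTF-8 byte string."""
    raw = utf8_bytes.decode("utf-8").encode("utf-16-le")
    # Each code unit is two bytes; unpack them as unsigned 16-bit integers.
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def utf16_to_utf8(units: list) -> bytes:
    """Return the UTF-8 byte string for a list of UTF-16 code units."""
    raw = struct.pack("<%dH" % len(units), *units)
    return raw.decode("utf-16-le").encode("utf-8")


units = utf8_to_utf16("あ😀".encode("utf-8"))
round_trip = utf16_to_utf8(units)
```

Here あ maps to a single code unit, while 😀 lies outside the Basic Multilingual Plane and needs two.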

utf32_to_utf16

static method utf32_to_utf16 : short[] ($utf32_string : int[]);

Convert a UTF-32 string to a UTF-16 string.

utf16_to_utf32

static method utf16_to_utf32 : int[] ($utf16_string : short[]);

Convert a UTF-16 string to a UTF-32 string.
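The arithmetic behind the UTF-32 and UTF-16 conversions is the standard surrogate-pair mapping, sketched below in Python. This is an illustration, not the module's implementation. It works on lists of unsigned integers, and passing an unpaired surrogate through unchanged is an assumption, since the text does not describe how invalid input is handled.

```python
def utf32_to_utf16(code_points: list) -> list:
    """Map code points to UTF-16 code units, splitting values >= 0x10000."""
    units = []
    for cp in code_points:
        if cp < 0x10000:
            units.append(cp)
        else:
            v = cp - 0x10000                  # 20-bit value
            units.append(0xD800 | (v >> 10))  # high surrogate: top 10 bits
            units.append(0xDC00 | (v & 0x3FF))  # low surrogate: low 10 bits
    return units


def utf16_to_utf32(units: list) -> list:
    """Map UTF-16 code units to code points, joining surrogate pairs."""
    code_points = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            code_points.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            code_points.append(u)  # assumption: unpaired surrogates pass through
            i += 1
    return code_points
```

For U+1F600, subtracting 0x10000 leaves 0xF600, whose top ten bits give the high surrogate 0xD83D and whose low ten bits give the low surrogate 0xDE00.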
