NAME

Unicode::ICU::Collator - wrapper around ICU collation services

SYNOPSIS

use Unicode::ICU::Collator;
my $coll = Unicode::ICU::Collator->new($locale);

# name of the locale actually selected
print $coll->getLocale;

# sort according to locale
my @sorted = $coll->sort(@unsorted);

# comparisons
my @sorted = sort {
  $coll->cmp($a->name, $b->name)
} @unsorted;

# build sort keys
my @sorted = map $_->[1],
  sort { $a->[0] cmp $b->[0] }
    map [ $coll->getSortKey($_->name), $_ ], @unsorted;

# get the display name of a collation locale
print Unicode::ICU::Collator->getDisplayName("de__phonebook", "en");
# German (PHONEBOOK)
print Unicode::ICU::Collator->getDisplayName("de__phonebook", "de");
# Deutsch (PHONEBOOK)

DESCRIPTION

Unicode::ICU::Collator is a thin (and currently incomplete) wrapper around ICU's collation functions.

CLASS METHODS

new($locale)

Create a new collation object for the specified locale.

my $coll = Unicode::ICU::Collator->new("en");
my $coll_de = Unicode::ICU::Collator->new("de__phonebook");

available()

Return a list of the available collation locale names.

my @locales = Unicode::ICU::Collator->available;

getDisplayName($locale, $display_locale)

Return a descriptive name of the locale $locale for display in locale $display_locale.

# probably "English"
my $en_en = Unicode::ICU::Collator->getDisplayName("en", "en");
# "German"
my $de_en = Unicode::ICU::Collator->getDisplayName("de", "en");
# "Deutsch"
my $de_de = Unicode::ICU::Collator->getDisplayName("de", "de");
# "Deutsch (PHONEBOOK)"
my $deph_de = Unicode::ICU::Collator->getDisplayName("de__phonebook", "de");

INSTANCE METHODS

cmp($str1, $str2)

Compare two strings according to the selected collation, returning -1, 0, or 1 in the same way as Perl's cmp operator.

my $cmp = $coll->cmp($str1, $str2);
my @sorted = sort { $coll->cmp($a, $b) } @unsorted;

eq($str1, $str2)
ne($str1, $str2)
lt($str1, $str2)
gt($str1, $str2)
le($str1, $str2)
ge($str1, $str2)

Compare the strings lexically within the collation, returning true or false.
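
As a minimal sketch of these predicates, assuming an "en" collator with ICU's default attributes (the sample strings are illustrative only), the following contrasts the collation-aware methods with Perl's built-in operators:

```perl
use strict;
use warnings;
use Unicode::ICU::Collator;

my $coll = Unicode::ICU::Collator->new("en");

# Perl's built-in lt compares code points, so "B" (0x42) sorts before "a" (0x61).
print "builtin:  B lt a\n" if "B" lt "a";

# The collator compares letters first and uses case only as a tie-breaker,
# so "a" sorts before "B".
print "collator: a lt B\n" if $coll->lt("a", "B");

# Identical strings are equal under any collation.
print "collator: a eq a\n" if $coll->eq("a", "a");

# With the default attributes, a case difference still makes strings distinct.
print "collator: a ne A\n" if $coll->ne("a", "A");
```

These methods are convenient in conditionals; for sorting, cmp() or getSortKey() is the better fit.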

getSortKey($str)

Return a binary sort key for $str. Comparing two sort keys with Perl's built-in string comparison operators, such as cmp, gives the same ordering as comparing the source strings with the collator.

my @sorted = map $_->[1],
  sort { $a->[0] cmp $b->[0] }
    map [ $coll->getSortKey($_->name), $_ ], @unsorted;

sort(@list)

Return the contents of @list (which can be any list, not just an array) sorted per the collation.

Currently this is simply a Perl wrapper around getSortKey(), but that may change.

my @sorted = $coll->sort(@unsorted);

getLocale()
getLocale($type)

Return the locale used as the source of the collation, the most specific collation name known, or the collation name supplied to new(), depending on $type.

$type is one of the following constants, as exported by the :locale export tag:

  • ULOC_ACTUAL_LOCALE - the actual locale being used, e.g. if you supply "en_US" to new(), this will probably return "en". This is the default if $type is not provided.

  • ULOC_VALID_LOCALE - the most specific locale supported by ICU.

  • ULOC_REQUESTED_LOCALE - the locale name supplied to new().

# default: the actual locale
my $name = $coll->getLocale();

# explicit type, using a constant from the :locale tag
use Unicode::ICU::Collator ':locale';
my $valid = $coll->getLocale(ULOC_VALID_LOCALE());

setAttribute($attr, $value)

Set an attribute for the collation.

Constants for $attr and $value are exported by the :attributes tag.

See the description of the UColAttribute type in the ICU documentation for details.

$coll->setAttribute(UCOL_NUMERIC_COLLATION(), UCOL_ON());

getAttribute($attr)

Return the value of a collation attribute.

my $value = $coll->getAttribute(UCOL_NUMERIC_COLLATION());
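
To illustrate what an attribute changes in practice, here is a small sketch that sorts the same list before and after enabling numeric collation, then reads the attribute back. It assumes an "en" collator whose numeric collation is off by default; the file names are made up for the example.

```perl
use strict;
use warnings;
use Unicode::ICU::Collator ':attributes';

my $coll  = Unicode::ICU::Collator->new("en");
my @names = ("file10", "file9", "file1");

# Default: digits are compared character by character,
# so "file10" sorts before "file9".
my @plain = $coll->sort(@names);
print "plain:   @plain\n";

# Numeric collation: runs of digits are compared by numeric value.
$coll->setAttribute(UCOL_NUMERIC_COLLATION(), UCOL_ON());
my @numeric = $coll->sort(@names);
print "numeric: @numeric\n";

# Read the attribute back to confirm the setting took effect.
print "numeric collation is on\n"
  if $coll->getAttribute(UCOL_NUMERIC_COLLATION()) == UCOL_ON();
```

Attributes are set per collator object, so a second collator created with new() is unaffected by this change.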

getRules()
getRules($type)

Retrieve the collation rules used by this collator.

Note: this is typically a long string for UCOL_FULL_RULES, and probably isn't very useful.

Values for $type are:

  • UCOL_FULL_RULES - the full set of rules for the collation. This is the default.

  • UCOL_TAILORING_ONLY - only the rule tailoring.
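
A minimal sketch of retrieving the rules with the default $type follows. The "de__phonebook" locale is borrowed from the examples above; the content and length of the returned string depend on the installed ICU version and its data, so no particular output is assumed.

```perl
use strict;
use warnings;
use Unicode::ICU::Collator;

my $coll = Unicode::ICU::Collator->new("de__phonebook");

# With no argument, getRules() uses the default type (UCOL_FULL_RULES).
my $rules = $coll->getRules();

# The rule string can be long, so report its size rather than printing it.
printf "rule string length: %d characters\n", length $rules;
```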

getVersion()

Return version information for the collator as a dotted decimal string.

getUCAVersion()

Return the UCA version information for a collator.
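
Both version methods can be called on any collator object. The sketch below simply prints them for an "en" collator; the values depend on the ICU library the module was built against.

```perl
use strict;
use warnings;
use Unicode::ICU::Collator;

my $coll = Unicode::ICU::Collator->new("en");

# Version of this collator's data, as a dotted decimal string.
print "collator version: ", $coll->getVersion, "\n";

# Version of the Unicode Collation Algorithm the collator is based on.
print "UCA version:      ", $coll->getUCAVersion, "\n";
```

Recording these values alongside stored sort keys may help detect when an ICU upgrade makes previously generated keys stale.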

LICENSE

Unicode::ICU::Collator is licensed under the same terms as Perl itself.

SEE ALSO

http://site.icu-project.org/

http://userguide.icu-project.org/collation

http://icu-project.org/apiref/icu4c/ucol_8h.html

Unicode::Collate

AUTHOR

Tony Cook <tonyc@cpan.org>