NAME

Unicode::Collate::Locale - Linguistic tailoring for DUCET via Unicode::Collate

SYNOPSIS

use Unicode::Collate::Locale;

$Collator = Unicode::Collate::Locale->
    new(locale => $locale_name, %tailoring);

@sorted = $Collator->sort(@not_sorted);

DESCRIPTION

This module provides linguistic tailoring of DUCET-based collation, building on Unicode::Collate.

Constructor

The new method returns a collator object.

The parameter list for the constructor is a hash, which can include the special key 'locale'. Its value (case-insensitive) is a two-letter or three-letter language code (ISO-639), such as 'en' for English. For example, Unicode::Collate::Locale->new(locale => 'FR') returns a collator tailored for French.

$locale_name may be suffixed with a territory (country) code or a variant code, each separated by '_'. E.g. en_US for English in USA, es_ES_traditional for Spanish in Spain (Traditional).

If no tailoring is available for $locale_name as given, a fallback is selected in the following order:

1. language_territory_variant
2. language_territory
3. language__variant
4. language
5. default
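The fallback order above can be observed with getlocale. The following is a minimal sketch, not a definitive test: the requested names are only illustrative. Going by the fallback order and the locale list in this document, one would expect 'FR' to resolve to fr, 'es_ES_traditional' to es__traditional, 'es_MX' to es, and 'en_US' to default.

```perl
use strict;
use warnings;
use Unicode::Collate::Locale;

# Report which tailoring is actually selected for each requested name.
# The names below are illustrative inputs, not an exhaustive list.
for my $name (qw(FR es_ES_traditional es_MX en_US)) {
    my $collator = Unicode::Collate::Locale->new(locale => $name);
    printf "%-20s => %s\n", $name, $collator->getlocale;
}
```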

Tailoring tags provided by Unicode::Collate are allowed as long as they are not used for 'locale' support, that is, as long as the selected locale does not itself rely on them. In particular, the table tag can never be overridden, since it is reserved for DUCET.

For example, the following creates a collator for French that ignores diacritics and case differences (i.e. level 1), with reversed case ordering and no normalization.

$Collator = Unicode::Collate::Locale->new(
    level => 1,
    locale => 'fr',
    upper_before_lower => 1,
    normalization => undef
);
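As a rough illustration of what level 1 means in practice, the sketch below compares two spellings that differ only in case and accents, using the eq method inherited from Unicode::Collate. The sample words are arbitrary, and the accented letters are written as \x{...} escapes so the snippet does not depend on the source file encoding. Since only base letters are compared at level 1, the two strings should be reported as equal.

```perl
use strict;
use warnings;
use Unicode::Collate::Locale;

my $collator = Unicode::Collate::Locale->new(
    locale        => 'fr',
    level         => 1,        # compare base letters only
    normalization => undef,    # input is taken as-is
);

# "C\x{D4}T\x{C9}" is "COTE" with a circumflex on O and an acute on E,
# written with precomposed characters.
my $accented = "C\x{D4}T\x{C9}";

print $collator->eq('cote', $accented) ? "equal\n" : "different\n";
```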

Methods

Unicode::Collate::Locale is a subclass of Unicode::Collate and methods other than new are inherited from Unicode::Collate.
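A short sketch of the inherited sort and cmp methods, contrasting two of the Spanish tailorings listed later in this document. The word list is made up for illustration. Because es__traditional treats 'ch' as a single grapheme, the expectation is that 'chico' sorts before 'coche' under es but after 'cuatro' under es__traditional.

```perl
use strict;
use warnings;
use Unicode::Collate::Locale;

my $modern      = Unicode::Collate::Locale->new(locale => 'es');
my $traditional = Unicode::Collate::Locale->new(locale => 'es__traditional');

my @words = qw(cuatro chico coche);

# sort and cmp are inherited from Unicode::Collate.
my @modern_order      = $modern->sort(@words);
my @traditional_order = $traditional->sort(@words);

print "es:              @modern_order\n";
print "es__traditional: @traditional_order\n";

# cmp returns -1, 0 or 1, like Perl's built-in cmp operator.
print "cmp: ", $traditional->cmp('chico', 'cuatro'), "\n";
```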

Here is a list of additional methods:

$Collator->getlocale

Returns the language code that was accepted and is actually used for collation. If linguistic tailoring is not provided for the language code you passed (intentionally for some languages, or because the implementation is incomplete), this method returns the string 'default', meaning no special tailoring.

A list of tailorable locales

  locale name       description
----------------------------------------------------------
  af                Afrikaans
  az                Azerbaijani (Azeri)
  ca                Catalan
  cs                Czech
  cy                Welsh
  da                Danish
  de__phonebook     German (umlaut as 'ae', 'oe', 'ue')
  eo                Esperanto
  es                Spanish
  es__traditional   Spanish ('ch' and 'll' as a grapheme)
  et                Estonian
  fi                Finnish
  fil               Filipino
  fo                Faroese
  fr                French
  ha                Hausa
  haw               Hawaiian
  is                Icelandic
  kl                Kalaallisut
  lt                Lithuanian
  lv                Latvian
  mt                Maltese
  nb                Norwegian Bokmal
  nn                Norwegian Nynorsk
  nso               Northern Sotho
  om                Oromo
  pl                Polish
  ro                Romanian
  sk                Slovak
  sl                Slovenian
  sv                Swedish
  sw                Swahili
  tn                Tswana
  tr                Turkish
  vi                Vietnamese
  wo                Wolof
  yo                Yoruba

INSTALL

Installation of Unicode::Collate::Locale requires Collate/Locale.pm, Collate/Locale/*.pm and Collate/allkeys.txt. Building Unicode::Collate::Locale does not require data/*.txt or mklocale. Tests for Unicode::Collate::Locale are named t/loc_*.t.

CAVEAT

tailoring is not maximal

If a certain letter is tailored, its equivalents are not always tailored along with it. For example, even though W is tailored, fullwidth W (U+FF37), W with acute (U+1E82), etc. are not tailored. The result may therefore depend on whether the source strings are normalized.
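The effect of normalization on such equivalents can be sketched as follows. This is an illustration under stated assumptions, not a recommendation: the locale 'sv' is chosen arbitrarily, and the normalization tag is the one provided by Unicode::Collate (it needs Unicode::Normalize to be installed). With the default normalization, fullwidth W remains a distinct character; with NFKD it is decomposed to an ordinary W before collation, so the two compare equal, at the cost of losing any distinction between them.

```perl
use strict;
use warnings;
use Unicode::Collate::Locale;

my $fullwidth_w = "\x{FF37}";   # FULLWIDTH LATIN CAPITAL LETTER W

# 'sv' is only an illustrative choice of locale.
my $plain  = Unicode::Collate::Locale->new(locale => 'sv');
my $compat = Unicode::Collate::Locale->new(
    locale        => 'sv',
    normalization => 'NFKD',    # compatibility decomposition before collation
);

for my $case ([ 'default normalization', $plain ], [ 'NFKD', $compat ]) {
    my ($label, $collator) = @$case;
    printf "%-22s W %s U+FF37\n", $label,
        $collator->eq('W', $fullwidth_w) ? 'eq' : 'ne';
}
```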

AUTHOR

The Unicode::Collate::Locale module for perl was written by SADAHIRO Tomoyuki, <SADAHIRO@cpan.org>. This module is Copyright(C) 2004-2010, SADAHIRO Tomoyuki. Japan. All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

Unicode Collation Algorithm - UTS #10

http://www.unicode.org/reports/tr10/

The Default Unicode Collation Element Table (DUCET)

http://www.unicode.org/Public/UCA/latest/allkeys.txt

CLDR - Unicode Common Locale Data Repository

http://cldr.unicode.org/

Unicode::Collate
Unicode::Normalize