NAME
Locale::Unicode - Unicode Locale Identifier compliant with BCP47 and CLDR
SYNOPSIS
use Locale::Unicode;
my $locale = Locale::Unicode->new( 'ja-Kana-t-it' ) ||
die( Locale::Unicode->error, "\n" );
say $locale; # ja-Kana-t-it
# Some undefined locale in Cyrillic script
my $locale = Locale::Unicode->new( 'und-Cyrl' );
$locale->transform( 'und-latn' );
$locale->mechanism( 'ungegn-2007' );
say $locale; # und-Cyrl-t-und-latn-m0-ungegn-2007
# A locale in Cyrillic, transformed from Latin, according to a UNGEGN specification dated 2007.
VERSION
v0.1.0
DESCRIPTION
This module implements the Unicode LDML (Locale Data Markup Language) extensions.
It does not enforce the standard, and is merely an API to construct, access and modify locales. It is your responsibility to set the right values.
For your convenience, a summary of the key elements of the standard can be found in this documentation.
It is lightweight and fast, with no dependencies outside of Scalar::Util and Want. It requires perl v5.10 minimum to operate.
Objects stringify, and once the string value is computed, it is cached and reused until the object is changed. Thus, repeated calls to as_string, or repeated stringification, do not incur any speed penalty by recomputing what has not changed.
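The caching behaviour described above can be pictured with a minimal sketch. The class My::CachedTag, its fields and its counter are hypothetical and exist only for illustration; this is not the module's actual implementation, merely one way such a cache could work under those assumptions.

```perl
use strict;
use warnings;

# Hypothetical class for illustration only; not part of Locale::Unicode.
{
    package My::CachedTag;
    use overload '""' => sub { $_[0]->as_string }, fallback => 1;

    our $COMPUTED = 0;    # counts how often the string is rebuilt

    sub new
    {
        my( $class, %parts ) = @_;
        return( bless( { parts => {%parts}, cache => undef }, $class ) );
    }

    # Setter/getter: changing a component invalidates the cached string.
    sub region
    {
        my $self = shift;
        if( @_ )
        {
            $self->{parts}->{region} = shift;
            $self->{cache} = undef;
            return( $self );
        }
        return( $self->{parts}->{region} );
    }

    # Rebuilds the string only when no cached value is available.
    sub as_string
    {
        my $self = shift;
        return( $self->{cache} ) if( defined( $self->{cache} ) );
        $COMPUTED++;
        my $p = $self->{parts};
        $self->{cache} = join( '-', grep { defined( $_ ) } @{$p}{qw( language script region )} );
        return( $self->{cache} );
    }
}

my $tag    = My::CachedTag->new( language => 'en', region => 'GB' );
my $first  = "$tag";    # computed
my $second = "$tag";    # served from the cache
$tag->region( 'AU' );   # invalidates the cache
my $third  = "$tag";    # computed again
print( "$first $second $third ($My::CachedTag::COMPUTED computations)\n" );
# prints: en-GB en-GB en-AU (2 computations)
```

The point of the design is that every setter clears the cache, so a stale string can never be returned.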
CONSTRUCTOR
new
my $locale = Locale::Unicode->new( 'en' );
my $locale = Locale::Unicode->new( 'en-GB' );
my $locale = Locale::Unicode->new( 'en-Latn-AU' );
my $locale = Locale::Unicode->new( 'he-IL-u-ca-hebrew-tz-jeruslm' );
my $locale = Locale::Unicode->new( 'ja-Kana-t-it' );
my $locale = Locale::Unicode->new( 'und-Latn-t-und-cyrl' );
my $locale = Locale::Unicode->new( 'und-Cyrl-t-und-latn-m0-ungegn-2007' );
my $locale = Locale::Unicode->new( 'de-u-co-phonebk-ka-shifted' );
# Machine translated from German to Japanese using an undefined vendor
my $locale = Locale::Unicode->new( 'ja-t-de-t0-und' );
$locale->script( 'Kana' );
$locale->country_code( 'JP' );
# Now: ja-Kana-JP-t-de-t0-und
This takes a locale compliant with the BCP47 standard, and an optional hash or hash reference of options, and returns a new object.
The locale provided is parsed, and its components can be accessed and modified using the methods of this class.
If a hash or hash reference of options is provided, it is used to set or modify the components of the locale provided.
If an error occurs, an exception object is set and undef is returned in scalar context, or an empty list in list context. The exception object can then be retrieved using error, such as:
my $locale = Locale::Unicode->new( $something_bad ) ||
die( Locale::Unicode->error );
METHODS
All the methods below are context sensitive.
If they are called in an object context, they return the current Locale::Unicode object for chaining; otherwise, they return the current value. If that value is undef, they return undef in scalar context, or an empty list in list context.
Also, if an error occurs, they set an exception object and return undef in scalar context, or an empty list in list context.
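The error convention described above (undef in scalar context, an empty list in list context) can be sketched as follows. The class My::Thing and its fail method are hypothetical names used for illustration; only the return convention mirrors this documentation.

```perl
use strict;
use warnings;

# Hypothetical class for illustration only; not part of Locale::Unicode.
{
    package My::Thing;
    my $ERROR;    # last error, class-wide

    sub error { return( $ERROR ); }

    # Records the error; a bare return yields undef in scalar context
    # and an empty list in list context.
    sub fail
    {
        my( $class, $msg ) = @_;
        $ERROR = $msg;
        return;
    }

    sub new
    {
        my( $class, $value ) = @_;
        return( $class->fail( "No value provided" ) ) if( !defined( $value ) || !length( $value ) );
        return( bless( { value => $value }, $class ) );
    }
}

my $obj  = My::Thing->new( '' );    # scalar context: undef
my @list = My::Thing->new( '' );    # list context: empty list
my $ok   = My::Thing->new( 'en' ) || die( My::Thing->error, "\n" );
printf( "scalar: %s, list size: %d, error: %s\n",
    defined( $obj ) ? 'defined' : 'undef', scalar( @list ), My::Thing->error );
# prints: scalar: undef, list size: 0, error: No value provided
```

Returning an empty list in list context avoids the classic trap where a list containing a single undef evaluates as true.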
apply
my $hash_reference = Locale::Unicode->parse( 'ja-Kana-t-it' );
$locale->apply( $hash_reference );
Provided with a hash reference of key-value pairs, this calls each corresponding method with the associated value.
If a property provided has no corresponding method, it emits a warning if warnings are enabled.
It returns the current object upon success, or sets an error object upon error and returns undef in scalar context, or an empty list in list context.
as_string
Returns the Locale object as a string, based on its latest attributes.
The string value is computed only once, and further calls to as_string return the cached value unless changes were made to the Locale attributes.
break_exclusion
my $locale = Locale::Unicode->new( 'ja' );
$locale->break_exclusion( 'hani-hira-kata' );
# Now: ja-u-dx-hani-hira-kata
This is a Unicode Dictionary Break Exclusion Identifier that specifies scripts to be excluded from dictionary-based text break (for words and lines).
Sets or gets the Unicode extension dx
See also dx
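As noted in the section on "Unicode extensions", the order of script subtags in a dx value is not significant, but the canonical order is alphabetical. Since this module does not enforce the standard, a caller may want to normalise the value first. The helper canonical_dx below is hypothetical and not part of this module; it is only a sketch of that normalisation.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercases, de-duplicates and alphabetically sorts
# the script subtags, then prefixes the dx key.
sub canonical_dx
{
    my $value = shift;
    my %seen;
    my @scripts = grep { !$seen{ $_ }++ } map { lc( $_ ) } split( /-/, $value );
    @scripts = sort( @scripts );
    return( join( '-', 'dx', @scripts ) );
}

print( canonical_dx( 'thai-hani' ), "\n" );         # dx-hani-thai
print( canonical_dx( 'kata-Hira-hani' ), "\n" );    # dx-hani-hira-kata
```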
ca
This is an alias for "calendar"
calendar
my $locale = Locale::Unicode->new( 'th' );
$locale->calendar( 'buddhist' );
# or:
# $locale->ca( 'buddhist' );
# Now: th-u-ca-buddhist
# which is Thai with the Buddhist calendar
Sets or gets the Unicode extension ca, which is a calendar identifier.
See the section on "BCP47 EXTENSIONS" for the proper values.
cf
This is an alias for "cu_format"
co
my $locale = Locale::Unicode->new( 'de' );
$locale->collation( 'phonebk' );
$locale->ka( 'shifted' );
# Now: de-u-co-phonebk-ka-shifted
This is a Unicode collation identifier that specifies a type of collation (sort order).
This is an alias for "collation"
colAlternate
my $locale = Locale::Unicode->new( 'de' );
$locale->collation( 'phonebk' );
$locale->ka( 'shifted' );
# Now: de-u-co-phonebk-ka-shifted
$locale->colAlternate( 'noignore' );
# or similarly:
$locale->colAlternate( 'non-ignorable' );
Sets alternate handling for variable weights.
Sets or gets the Unicode extension ka
See "Collation Options" for more information.
colBackwards
$locale->colBackwards(1); # true
# Now: kb-true
$locale->colBackwards(0); # false
# Now: kb-false
Sets collation boolean value for backward collation weight.
Sets or gets the Unicode extension kb
See "Collation Options" for more information.
colCaseFirst
Sets or gets the Unicode extension kf
colCaseLevel
$locale->colCaseLevel(1); # true
# Now: kc-true
$locale->colCaseLevel(0); # false
# Now: kc-false
Sets collation boolean value for case level.
Sets or gets the Unicode extension kc
See "Collation Options" for more information.
colHiraganaQuaternary
$locale->colHiraganaQuaternary(1); # true
# Now: kh-true
$locale->colHiraganaQuaternary(0); # false
# Now: kh-false
Sets collation parameter key for special Hiragana handling.
Sets or gets the Unicode extension kh
See "Collation Options" for more information.
collation
my $locale = Locale::Unicode->new( 'fr' );
$locale->collation( 'emoji' );
# Now: fr-u-co-emoji
my $locale = Locale::Unicode->new( 'de' );
$locale->collation( 'phonebk' );
# Now: de-u-co-phonebk
# which is: German using Phonebook sorting
Sets or gets the Unicode extension co
This specifies a type of collation (sort order).
See "Unicode extensions" for possible values and more information on the standard.
See also "Collation Options" for more on collation options.
colNormalisation
This is an alias for colNormalization
colNormalization
$locale->colNormalization(1); # true
# Now: kk-true
$locale->colNormalization(0); # false
# Now: kk-false
Sets collation parameter key for normalisation.
Sets or gets the Unicode extension kk
See "Collation Options" for more information.
colNumeric
$locale->colNumeric(1); # true
# Now: kn-true
$locale->colNumeric(0); # false
# Now: kn-false
Sets collation parameter key for numeric handling.
Sets or gets the Unicode extension kn
See "Collation Options" for more information.
colReorder
my $locale = Locale::Unicode->new( 'en' );
$locale->colReorder( 'latn-digit' );
# Now: en-u-kr-latn-digit
# Reorder digits after Latin characters.
my $locale = Locale::Unicode->new( 'en' );
$locale->colReorder( 'arab-cyrl-others-symbol' );
# Now: en-u-kr-arab-cyrl-others-symbol
# Reorder Arabic characters first, then Cyrillic, and put
# symbols at the end—after all other characters.
Sets collation reorder codes.
Sets or gets the Unicode extension kr
See "Collation Options" for more information.
shiftedGroup
This is an alias for "colValue"
colStrength
$locale->colStrength( 'level1' );
# Now: ks-level1
# or, equivalent:
$locale->colStrength( 'primary' );
$locale->colStrength( 'level2' );
# or, equivalent:
$locale->colStrength( 'secondary' );
$locale->colStrength( 'level3' );
# or, equivalent:
$locale->colStrength( 'tertiary' );
$locale->colStrength( 'level4' );
# or, equivalent:
$locale->colStrength( 'quaternary' );
$locale->colStrength( 'quarternary' );
$locale->colStrength( 'identic' );
# or, equivalent:
$locale->colStrength( 'identical' );
Sets the collation parameter key for collation strength used for comparison.
Sets or gets the Unicode extension ks
See "Collation Options" for more information.
colValue
$locale->colValue( 'currency' );
$locale->colValue( 'punct' );
$locale->colValue( 'space' );
$locale->colValue( 'symbol' );
Sets the collation value for the last reordering group to be affected by ka-shifted.
Sets or gets the Unicode extension kv
See "Collation Options" for more information.
colVariableTop
Sets the string value for the variable top.
Sets or gets the Unicode extension vt
See "Collation Options" for more information.
country_code
my $locale = Locale::Unicode->new( 'en' );
$locale->country_code( 'US' );
# Now: en-US
$locale->country_code( 'GB' );
# Now: en-GB
Sets or gets the country code part of the locale.
A country code should be a 2-letter ISO 3166 code, but keep in mind that the LDML (Locale Data Markup Language) accepts old data to ensure stability.
cu
my $locale = Locale::Unicode->new( 'ja' );
$locale->cu( 'jpy' );
# Now: ja-u-cu-jpy
# which is the Japanese yen
This is a Unicode currency identifier that specifies a type of currency (ISO 4217 code).
This is an alias for "currency"
cu_format
# Using minus sign symbol for negative numbers
$locale->cf( 'standard' );
# Using parentheses for negative numbers
$locale->cf( 'account' );
This is a currency format identifier, such as standard or account.
Sets or gets the Unicode extension cf
See the section on "BCP47 EXTENSIONS" for the proper values.
currency
my $locale = Locale::Unicode->new( 'ja' );
$locale->currency( 'jpy' );
# or
# $locale->cu( 'jpy' );
# Now: ja-u-cu-jpy
# which is the Japanese yen
Sets or gets the Unicode extension cu
This specifies a currency as an ISO 4217 code.
d0
This is an alias for "destination"
dest
This is an alias for "destination"
destination
Sets or gets the Transformation extension d0 for destination.
See the section on "Transform extensions" for more information.
dx
This is an alias for "break_exclusion"
em
This is an alias for "emoji"
emoji
This is a Unicode Emoji Presentation Style Identifier that specifies a request for the preferred emoji presentation style.
Sets or gets the Unicode extension em.
false
This is read-only and returns a Locale::Unicode::Boolean object representing a false value.
fw
This is an alias for "first_day"
first_day
This is a Unicode First Day Identifier that specifies the preferred first day of the week for calendar display.
Sets or gets the Unicode extension fw.
Its values are sun, mon, tue, wed, thu, fri and sat.
h0
This is an alias for "hybrid"
hc
This is an alias for "hour_cycle"
hour_cycle
This is a Unicode Hour Cycle Identifier that specifies the preferred time cycle.
Sets or gets the Unicode extension hc.
hybrid
my $locale = Locale::Unicode->new( 'ru' );
$locale->transform( 'en' );
$locale->hybrid(1); # true
# or
# $locale->hybrid( 'hybrid' );
# or
# $locale->h0( 'hybrid' );
# Now: ru-t-en-h0-hybrid
# Hybrid Cyrillic - Runglish
my $locale = Locale::Unicode->new( 'en' );
$locale->transform( 'zh-hant' );
$locale->hybrid( 'hybrid' );
# Now: en-t-zh-hant-h0-hybrid
# which is Hybrid Latin - Chinglish
Those are Hybrid Locale Identifiers indicating that the t value is a language that is mixed into the main language tag to form a hybrid.
Sets or gets the Transformation extension h0.
See the section on "Transform extensions" for more information.
i0
This is an alias for "input"
k0
This is an alias for "keyboard"
input
my $locale = Locale::Unicode->new( 'zh' );
$locale->input( 'pinyin' );
# Now: zh-t-i0-pinyin
This is an Input Method Engine transformation.
Sets or gets the Transformation extension i0.
See the section on "Transform extensions" for more information.
ka
This is an alias for "colAlternate"
kb
This is an alias for "colBackwards"
kc
This is an alias for "colCaseLevel"
keyboard
my $locale = Locale::Unicode->new( 'en' );
$locale->keyboard( 'dvorak' );
# Now: en-t-k0-dvorak
This is a keyboard transformation, such as used by client-side virtual keyboards.
Sets or gets the Transformation extension k0.
See the section on "Transform extensions" for more information.
kf
This is an alias for "colCaseFirst"
kh
This is an alias for "colHiraganaQuaternary"
kk
This is an alias for "colNormalization"
kn
This is an alias for "colNumeric"
kr
This is an alias for "colReorder"
ks
This is an alias for "colStrength"
kv
This is an alias for "colValue"
lang
# current value: fr-FR
$obj->lang( 'de' );
# Now: de-FR
Sets or gets the locale part of this Locale object.
See also "locale"
lb
This is an alias for "line_break"
line_break
This is a Unicode Line Break Style Identifier that specifies a preferred line break style corresponding to the CSS level 3 line-break option.
Sets or gets the Unicode extension lb.
line_break_word
This is a Unicode Line Break Word Identifier that specifies a preferred line break word handling behavior corresponding to the CSS level 3 word-break option.
Sets or gets the Unicode extension lw.
locale
This is an alias for "lang"
locale3
my $locale = Locale::Unicode->new( 'jpn' );
$locale->script( 'Kana' );
# Now: jpn-Kana
Sets or gets the 3-letter ISO 639-2 code. Keep in mind, however, that to ensure stability, the LDML (Locale Data Markup Language) also uses old data.
lw
This is an alias for "line_break_word"
m0
This is an alias for "mechanism"
machine
my $locale = Locale::Unicode->new( 'ja' );
$locale->transform( 'de' );
$locale->machine( 'und' );
# Now: ja-t-de-t0-und
# Japanese translated from German by an undefined vendor
This is used to indicate content that has been machine translated, or a request for a particular type of machine translation of content.
Sets or gets the Transformation extension t0.
See the section on "Transform extensions" for more information.
measurement
This is a Unicode Measurement System Identifier that specifies a preferred measurement system.
Sets or gets the Unicode extension ms.
mechanism
my $locale = Locale::Unicode->new( 'und-Latn' );
$locale->transform( 'ru' );
$locale->mechanism( 'ungegn-2007' );
# Now: und-Latn-t-ru-m0-ungegn-2007
# representing a transformation according to a specification of the United Nations
# Group of Experts on Geographical Names (UNGEGN) dated 2007
This is a transformation mechanism referencing an authority or rules for a type of transformation.
Sets or gets the Transformation extension m0.
See the section on "Transform extensions" for more information.
ms
This is an alias for "measurement"
mu
This is an alias for "unit"
nu
This is an alias for "number"
number
This is a Unicode Number System Identifier that specifies a type of number system.
Sets or gets the Unicode extension nu.
private
my $locale = Locale::Unicode->new( 'ja-JP' );
$locale->private( 'something-else' );
# Now: ja-JP-x-something-else
This serves to set or get the value for a private subtag.
region
# current value: fr-FR
$locale->region( 'DE' );
# Now: fr-DE
Sets or gets the region part of a Unicode locale.
This is normally an ISO3166-1 country code.
region_override
my $locale = Locale::Unicode->new( 'en-GB' );
$locale->region_override( 'uszzzz' );
# Now: en-GB-u-rg-uszzzz
# which is a locale for British English but with region-specific defaults set to US.
This is a Unicode Region Override that specifies an alternate region to use for obtaining certain region-specific default values.
Sets or gets the Unicode extension rg.
reset
When provided with any argument, this resets the cached value computed by "as_string".
rg
This is an alias for "region_override"
s0
This is an alias for "source"
script
# current value: zh-Hans
$locale->script( 'Hant' );
# Now: zh-Hant
Sets or gets the script part of the Locale identifier.
sd
This is an alias for "subdivision"
sentence_break
This is a Unicode Sentence Break Suppressions Identifier that specifies a set of data to be used for suppressing certain sentence breaks.
Sets or gets the Unicode extension ss.
source
This is a transformation source for non-languages or scripts, such as fullwidth-halfwidth conversion.
Sets or gets the Transformation extension s0.
See the section on "Transform extensions" for more information.
ss
This is an alias for "sentence_break"
subdivision
my $locale = Locale::Unicode->new( 'gsw' );
$locale->subdivision( 'chzh' );
# or
# $locale->sd( 'chzh' );
# Now: gsw-u-sd-chzh
my $locale = Locale::Unicode->new( 'en-US' );
$locale->sd( 'usca' );
# Now: en-US-u-sd-usca
This is a Unicode Subdivision Identifier that specifies a regional subdivision used for locales. These are typically states in the U.S., prefectures in France or Japan, or provinces in Canada.
Sets or gets the Unicode extension sd.
Be careful of the rule in the standard. For example, en-CA-u-sd-gbsct would be invalid because gb in gbsct does not match the region subtag CA.
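Because this module does not enforce the standard, the rule above is left to the caller. A minimal sketch of such a check follows. The function subdivision_matches_region is hypothetical and not part of this module, and it assumes an alphabetic 2-letter region code followed by a suffix of 1 to 4 alphanumeric characters.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when the region prefix embedded in the
# subdivision ID matches the locale's region subtag, 0 otherwise.
# Assumption: 2-letter alphabetic region codes only; numeric regions are not handled.
sub subdivision_matches_region
{
    my( $region, $sd ) = @_;
    return(0) unless( defined( $region ) && defined( $sd ) );
    return(0) unless( $sd =~ /\A([a-z]{2})[a-z0-9]{1,4}\z/i );
    return( lc( $1 ) eq lc( $region ) ? 1 : 0 );
}

foreach my $pair ( [ 'US', 'usca' ], [ 'CA', 'gbsct' ], [ 'US', 'uszzzz' ] )
{
    my( $region, $sd ) = @$pair;
    printf( "%s + %s: %s\n", $region, $sd,
        subdivision_matches_region( $region, $sd ) ? 'consistent' : 'mismatch' );
}
# prints:
# US + usca: consistent
# CA + gbsct: mismatch
# US + uszzzz: consistent
```

The check only applies when the locale carries a region subtag; a locale such as gsw-u-sd-chzh has none to compare against.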
t0
This is an alias for "machine"
t_private
my $locale = Locale::Unicode->new( 'ja' );
$locale->transform( 'de' );
$locale->machine( 'und' );
$locale->t_private( 'medical' );
# Now: ja-t-de-t0-und-x0-medical
This is a private transformation subtag.
Sets or gets the Transformation private subtag x0.
t_x0
This is an alias for "t_private"
time_zone
This is a Unicode Timezone Identifier that specifies a time zone.
Sets or gets the Unicode extension tz.
timezone
This is an alias for "time_zone"
transform
my $locale = Locale::Unicode->new( 'ja' );
$locale->transform( 'it' );
# Now: ja-t-it
# which is Japanese, transformed from Italian
my $locale = Locale::Unicode->new( 'ja-Kana' );
$locale->transform( 'it' );
# Now: ja-Kana-t-it
# which is Japanese Katakana, transformed from Italian
# 'und' is undefined and is perfectly valid
my $locale = Locale::Unicode->new( 'und-Latn' );
$locale->transform( 'und-cyrl' );
# Now: und-Latn-t-und-cyrl
# which is Latin script, transformed from the Cyrillic script
Sets or gets the Transformation extension t.
transform_locale
my $locale = Locale::Unicode->new( 'ja' );
my $locale2 = Locale::Unicode->new( 'it' );
$locale->transform_locale( $locale2 );
# Now: ja-t-it
my $object = $locale->transform_locale;
Sets or gets a Locale::Unicode object used to indicate the original locale subject to transformation.
This will trigger an exception if a value, other than Locale::Unicode or an inheriting class object, is set.
See the section on "Transform extensions" for more information.
translation
Sets or gets the Transformation extension t0.
true
This is read-only and returns a Locale::Unicode::Boolean object representing a true value.
tz
This is an alias for "time_zone"
unit
This is a Measurement Unit Preference Override that specifies an override for measurement unit preference.
Sets or gets the Unicode extension mu.
va
This is an alias for "variant"
variant
This is a Unicode Variant Identifier that specifies a special variant used for locales.
Sets or gets the Unicode extension va.
vt
This is an alias for "colVariableTop"
CLASS FUNCTIONS
matches
Provided with a BCP47 locale, this returns a hash reference of its components if it matches the BCP47 regular expression, which can be accessed as the global class variable $LOCALE_RE.
If nothing matches, it returns an empty string in scalar context, or an empty list in list context.
If an error occurs, it sets an error object and returns undef in scalar context, or an empty list in list context.
parse
my $hash_ref = Locale::Unicode->parse( 'ja-Kana-t-it' );
# Transcription in Japanese Katakana of an Italian word:
# {
# ext_transform => "t-it",
# ext_transform_subtag => "it",
# locale => "ja",
# script => "Kana",
# }
my $hash_ref = Locale::Unicode->parse( 'he-IL-u-ca-hebrew-tz-jeruslm' );
# Represents Hebrew as spoken in Israel, using the traditional Hebrew calendar,
# and in the "Asia/Jerusalem" time zone
# {
# country_code => "IL",
# ext_unicode => "u-ca-hebrew-tz-jeruslm",
# ext_unicode_subtag => "ca-hebrew-tz-jeruslm",
# locale => "he",
# }
Provided with a BCP47 locale, and an optional hash reference like the one returned by matches, this returns a hash reference with a detailed breakdown of the information embedded in the locale, as per the Unicode BCP47 standard.
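To give an intuition of how a tag breaks down into the keys shown above, here is a deliberately simplified sketch. The function split_tag is hypothetical; it is not the module's $LOCALE_RE nor its parse implementation, and it only handles a language, an optional script, an optional region, and the t and u extensions.

```perl
use strict;
use warnings;

# Hypothetical, simplified splitter for illustration only.
# Variants, private subtags and other singletons are ignored.
sub split_tag
{
    my $tag = shift;
    my @subtags = split( /-/, $tag );
    my %info = ( locale => shift( @subtags ) );
    # A 4-letter subtag right after the language is a script.
    $info{script} = shift( @subtags ) if( @subtags && $subtags[0] =~ /\A[A-Za-z]{4}\z/ );
    # A 2-letter or 3-digit subtag is a region.
    $info{country_code} = shift( @subtags ) if( @subtags && $subtags[0] =~ /\A(?:[A-Za-z]{2}|[0-9]{3})\z/ );
    my %names = ( t => 'ext_transform', u => 'ext_unicode' );
    my $current;
    foreach my $subtag ( @subtags )
    {
        if( length( $subtag ) == 1 )
        {
            # A singleton opens a new extension; unknown singletons are skipped.
            $current = $names{ lc( $subtag ) };
            $info{ $current } = $subtag if( defined( $current ) );
        }
        elsif( defined( $current ) )
        {
            $info{ $current } .= "-$subtag";
        }
    }
    # Derive the *_subtag keys by stripping the leading singleton.
    foreach my $key ( values( %names ) )
    {
        next unless( exists( $info{ $key } ) );
        ( $info{ "${key}_subtag" } = $info{ $key } ) =~ s/\A[a-z](?:-|\z)//i;
    }
    return( \%info );
}

my $ref = split_tag( 'he-IL-u-ca-hebrew-tz-jeruslm' );
print( join( ', ', map { "$_ => $ref->{$_}" } sort( keys( %$ref ) ) ), "\n" );
# prints: country_code => IL, ext_unicode => u-ca-hebrew-tz-jeruslm, ext_unicode_subtag => ca-hebrew-tz-jeruslm, locale => he
```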
tz_id2name
Provided with a CLDR timezone ID, such as jptyo, this returns the equivalent IANA Olson name, which, in this case, would be Asia/Tokyo.
If an error occurs, it sets an error object and returns undef in scalar context, or an empty list in list context.
tz_id2names
my $ref = Locale::Unicode->tz_id2names( 'unknown' );
# yields an empty array object
my $ref = Locale::Unicode->tz_id2names( 'jptyo' );
# Asia/Tokyo
Provided with a CLDR timezone ID, such as ausyd, which stands primarily for Australia/Sydney, this returns an array object of IANA Olson timezone names, which, in this case, would yield: ['Australia/Sydney', 'Australia/ACT', 'Australia/Canberra', 'Australia/NSW']
The order is set by the BCP47 timezone data.
If an error occurs, it sets an error object and returns undef in scalar context, or an empty list in list context.
tz_info
my $def = Locale::Unicode->tz_info( 'jptyo' );
# yields the following hash reference:
# {
# alias => [qw( Asia/Tokyo Japan )],
# desc => "Tokyo, Japan",
# tz => "Asia/Tokyo",
# }
my $def = Locale::Unicode->tz_info( 'unknown' );
# yields an empty string (not undef)
Provided with a CLDR timezone ID, such as jptyo, this returns a hash reference representing the dictionary entry for that ID.
If no information exists for the given timezone ID, an empty string is returned. undef is returned only for errors.
If an error occurs, it sets an error object and returns undef in scalar context, or an empty list in list context.
tz_name2id
my $id = Locale::Unicode->tz_name2id( 'Asia/Tokyo' );
# jptyo
my $id = Locale::Unicode->tz_name2id( 'Australia/Canberra' );
# ausyd
Provided with an IANA Olson timezone name, such as Asia/Tokyo, this returns its CLDR equivalent, which, in this case, would be jptyo.
If none exists, an empty string is returned.
If an error occurs, it sets an error object and returns undef in scalar context, or an empty list in list context.
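The relationship between the timezone functions above can be sketched with a forward table and a reverse index. The table below contains only the two entries quoted in this section, and the functions id2names, id2name and name2id are hypothetical stand-ins, not the module's own code or data.

```perl
use strict;
use warnings;

# Tiny illustrative table built only from the examples quoted in this
# documentation; the first name of each list is the primary one.
my %ID2NAMES = (
    jptyo => [qw( Asia/Tokyo Japan )],
    ausyd => [qw( Australia/Sydney Australia/ACT Australia/Canberra Australia/NSW )],
);

# Reverse index: every IANA name, primary or alias, points back to its CLDR ID.
my %NAME2ID;
foreach my $id ( keys( %ID2NAMES ) )
{
    $NAME2ID{ $_ } = $id for( @{$ID2NAMES{ $id }} );
}

# Hypothetical stand-ins for tz_id2names, tz_id2name and tz_name2id.
sub id2names { return( [ @{ $ID2NAMES{ lc( $_[0] ) } || [] } ] ); }
sub id2name  { return( id2names( $_[0] )->[0] // '' ); }
sub name2id  { return( $NAME2ID{ $_[0] } // '' ); }

print( id2name( 'jptyo' ), "\n" );                    # Asia/Tokyo
print( name2id( 'Australia/Canberra' ), "\n" );       # ausyd
print( scalar( @{id2names( 'unknown' )} ), "\n" );    # 0
```

Building the reverse index once makes a lookup by alias, such as Australia/Canberra, as cheap as a lookup by primary name.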
OVERLOADING
Any object from this class is overloaded and stringifies to its locale representation.
For example:
my $locale = Locale::Unicode->new( 'ja-Kana-t-it' );
say $locale; # ja-Kana-t-it
$locale->transform( 'de' );
say $locale; # ja-Kana-t-de
BCP47 EXTENSIONS
Unicode extensions
Example:
gsw-u-sd-chzh
Known BCP47 language extensions as defined in RFC6067 are as follows:
- ca
  A Unicode calendar identifier that specifies a type of calendar used for formatting and parsing, such as date/time symbols and patterns; it also selects supplemental calendarData used for calendrical calculations. The value can affect the computation of the first day of the week.
  For example:
  - ja-u-ca-japanese: Japanese Imperial calendar
  - th-u-ca-buddhist: Thai with Buddhist calendar
  Possible values are:
  - buddhist: Thai Buddhist calendar
  - chinese: Traditional Chinese calendar
  - coptic: Coptic calendar
  - dangi: Traditional Korean calendar
  - ethioaa: Ethiopic calendar, Amete Alem (epoch approx. 5493 B.C.E)
  - ethiopic: Ethiopic calendar, Amete Mihret (epoch approx. 8 C.E.)
  - gregory: Gregorian calendar
  - hebrew: Traditional Hebrew calendar
  - indian: Indian calendar
  - islamic: Hijri calendar
  - islamic-civil: Hijri calendar, tabular (intercalary years [2,5,7,10,13,16,18,21,24,26,29] - civil epoch)
  - islamic-rgsa: Hijri calendar, Saudi Arabia sighting
  - islamic-tbla: Hijri calendar, tabular (intercalary years [2,5,7,10,13,16,18,21,24,26,29] - astronomical epoch)
  - islamic-umalqura: Hijri calendar, Umm al-Qura
  - islamicc: Civil (algorithmic) Arabic calendar
  - iso8601: ISO calendar (Gregorian calendar using the ISO 8601 calendar week rules)
  - japanese: Japanese Imperial calendar
  - persian: Persian calendar
  - roc: Republic of China calendar
- cf
  A Unicode currency format identifier
  Typical values are:
  - standard: Default value. Negative numbers use the minusSign symbol.
  - account: Negative numbers use parentheses or equivalent.
- co
  A Unicode collation identifier that specifies a type of collation (sort order).
  Possible values are:
  - big5han: Pinyin ordering for Latin, big5 charset ordering for CJK characters (used in Chinese)
  - compat: A previous version of the ordering, for compatibility
  - dict: Dictionary style ordering (such as in Sinhala)
  - direct: Binary code point order (used in Hindi)
  - ducet: The default Unicode collation element table order
  - emoji: Recommended ordering for emoji characters
  - eor: European ordering rules
  - gb2312: Pinyin ordering for Latin, gb2312han charset ordering for CJK characters (used in Chinese)
  - phonebk: Phonebook style ordering (such as in German)
  - phonetic: Phonetic ordering (sorting based on pronunciation)
  - pinyin: Pinyin ordering for Latin and for CJK characters (used in Chinese)
  - reformed: Reformed ordering (such as in Swedish)
  - search: Special collation type for string search
  - searchjl: Special collation type for Korean initial consonant search
  - standard: Default ordering for each language
  - stroke: Pinyin ordering for Latin, stroke order for CJK characters (used in Chinese)
  - trad: Traditional style ordering (such as in Spanish)
  - unihan: Pinyin ordering for Latin, Unihan radical-stroke ordering for CJK characters (used in Chinese)
  - zhuyin: Pinyin ordering for Latin, zhuyin order for Bopomofo and CJK characters (used in Chinese)
  For example:
  de-u-co-phonebk-ka-shifted (German using Phonebook sorting, ignore punct.)
- cu
  A Unicode Currency Identifier that specifies a type of currency (ISO 4217 code) consisting of 3 ASCII letters that are or have been valid in ISO 4217, plus certain additional codes that are or have been in common use.
  For example:
  ja-u-cu-jpy (Japanese yen)
- dx
  A Unicode Dictionary Break Exclusion Identifier specifies scripts to be excluded from dictionary-based text break (for words and lines).
  A proper value is one or more Unicode script subtags separated by hyphens. Their order is not important, but the canonical order is alphabetical, such as dx-hani-thai
  For example:
  dx-hani-hira-kata
  dx-thai-hani
- em
  A Unicode Emoji Presentation Style Identifier specifies a request for the preferred emoji presentation style.
  Possible values are:
  - emoji: Use an emoji presentation for emoji characters if possible.
  - text: Use a text presentation for emoji characters if possible.
  - default: Use the default presentation for emoji characters as specified in UTR #51
- fw
  A Unicode First Day Identifier defines the preferred first day of the week for calendar display.
  Possible values are:
  - sun: Sunday
  - mon: Monday
  - tue: Tuesday
  - wed: Wednesday
  - thu: Thursday
  - fri: Friday
  - sat: Saturday
- hc
  A Unicode Hour Cycle Identifier defines the preferred time cycle.
  Possible values are:
  - h12: Hour system using 1–12; corresponds to h in patterns
  - h23: Hour system using 0–23; corresponds to H in patterns
  - h11: Hour system using 0–11; corresponds to K in patterns
  - h24: Hour system using 1–24; corresponds to k in pattern
- lb
  A Unicode Line Break Style Identifier defines a preferred line break style corresponding to the CSS level 3 line-break option.
  Possible values are:
  - strict: CSS level 3 line-break=strict, e.g. treat CJ as NS
  - normal: CSS level 3 line-break=normal, e.g. treat CJ as ID, break before hyphens for ja,zh
  - loose: CSS level 3 line-break=loose
- lw
  A Unicode Line Break Word Identifier defines preferred line break word handling behavior corresponding to the CSS level 3 word-break option.
  Possible values are:
  - normal: CSS level 3 word-break=normal, normal script/language behavior for midword breaks
  - breakall: CSS level 3 word-break=break-all, allow midword breaks unless forbidden by lb setting
  - keepall: CSS level 3 word-break=keep-all, prohibit midword breaks except for dictionary breaks
  - phrase: Prioritise keeping natural phrases (of multiple words) together when breaking, used in short text like title and headline
- ms
  A Unicode Measurement System Identifier defines a preferred measurement system. Specifying "ms" in a locale identifier overrides the default value specified by supplemental measurement system data for the region.
  Possible values are:
  - metric: Metric System
  - ussystem: US System of measurement: feet, pints, etc.; pints are 16oz
  - uksystem: UK System of measurement: feet, pints, etc.; pints are 20oz
- mu
  A Measurement Unit Preference Override defines an override for measurement unit preference.
  Possible values are:
  - celsius: Celsius as temperature unit
  - kelvin: Kelvin as temperature unit
  - fahrenhe: Fahrenheit as temperature unit
- nu
  A Unicode Number System Identifier defines a type of number system.
  For example:
  ar-u-nu-native (Arabic with native digits such as "٠١٢٣٤"), or ar-u-nu-latn (Arabic with Western digits such as "01234")
  Possible values are:
  - 4-letter Unicode script subtag
  - arabext: Extended Arabic-Indic digits ("arab" means the base Arabic-Indic digits)
  - armnlow: Armenian lowercase numerals
  - finance: Financial numerals
  - fullwide: Full width digits
  - greklow: Greek lower case numerals
  - hanidays: Han-character day-of-month numbering for lunar/other traditional calendars
  - hanidec: Positional decimal system using Chinese number ideographs as digits
  - hansfin: Simplified Chinese financial numerals
  - hantfin: Traditional Chinese financial numerals
  - jpanfin: Japanese financial numerals
  - jpanyear: Japanese first-year Gannen numbering for Japanese calendar
  - lanatham: Tai Tham Tham (ecclesiastical) digits
  - mathbold: Mathematical bold digits
  - mathdbl: Mathematical double-struck digits
  - mathmono: Mathematical monospace digits
  - mathsanb: Mathematical sans-serif bold digits
  - mathsans: Mathematical sans-serif digits
  - mymrepka: Myanmar Eastern Pwo Karen digits
  - mymrpao: Myanmar Pao digits
  - mymrshan: Myanmar Shan digits
  - mymrtlng: Myanmar Tai Laing digits
  - native: Native digits
  - outlined: Legacy computing outlined digits
  - roman: Roman numerals
  - romanlow: Roman lowercase numerals
  - segment: Legacy computing segmented digits
  - tamldec: Modern Tamil decimal digits
  - traditio: Traditional numerals
- rg
  A Region Override specifies an alternate region to use for obtaining certain region-specific default values.
  For example:
  en-GB-u-rg-uszzzz representing a locale for British English but with region-specific defaults set to US.
- sd
  A Unicode Subdivision Identifier defines a regional subdivision used for locales.
  They go by various names, such as a state in the United States, a prefecture in Japan or France, or a province in Canada.
  For example:
  - en-u-sd-uszzzz: Subdivision codes for unknown values are the region code plus zzzz, such as here with uszzzz for an unknown subdivision of the US.
  - en-US-u-sd-usca: English as used in California, USA
  en-CA-u-sd-gbsct would be invalid because gb in gbsct does not match the region subtag CA.
- ss
  A Unicode Sentence Break Suppressions Identifier defines a set of data to be used for suppressing certain sentence breaks.
  Possible values are:
  - none (default): Do not use sentence break suppressions data
  - standard: Use sentence break suppressions data of type standard
- tz
  A Unicode Timezone Identifier defines a timezone.
  To access those values, check the class functions "tz_id2name", "tz_id2names", "tz_info" and "tz_name2id"
  Possible values are:
  - adalv
    Name: Andorra
    Time zone: Europe/Andorra
  - aedxb
    Name: Dubai, United Arab Emirates
    Time zone: Asia/Dubai
  - afkbl
    Name: Kabul, Afghanistan
    Time zone: Asia/Kabul
  - aganu
    Name: Antigua
    Time zone: America/Antigua
  - aiaxa
    Name: Anguilla
    Time zone: America/Anguilla
  - altia
    Name: Tirane, Albania
    Time zone: Europe/Tirane
  - amevn
    Name: Yerevan, Armenia
    Time zone: Asia/Yerevan
  - ancur
    Name: Curaçao
    Time zone: America/Curacao
  - aolad
    Name: Luanda, Angola
    Time zone: Africa/Luanda
  - aqams
    Amundsen-Scott Station, South Pole
    Deprecated. See instead nzakl
  - aqcas
    Name: Casey Station, Bailey Peninsula
    Time zone: Antarctica/Casey
  - aqdav
    Name: Davis Station, Vestfold Hills
    Time zone: Antarctica/Davis
  - aqddu
    Name: Dumont d'Urville Station, Terre Adélie
    Time zone: Antarctica/DumontDUrville
  - aqmaw
    Name: Mawson Station, Holme Bay
    Time zone: Antarctica/Mawson
  - aqmcm
    Name: McMurdo Station, Ross Island
    Time zone: Antarctica/McMurdo
  - aqplm
    Name: Palmer Station, Anvers Island
    Time zone: Antarctica/Palmer
  - aqrot
    Name: Rothera Station, Adelaide Island
    Time zone: Antarctica/Rothera
  - aqsyw
    Name: Syowa Station, East Ongul Island
    Time zone: Antarctica/Syowa
  - aqtrl
    Name: Troll Station, Queen Maud Land
    Time zone: Antarctica/Troll
  - aqvos
    Name: Vostok Station, Lake Vostok
    Time zone: Antarctica/Vostok
  - arbue
    Name: Buenos Aires, Argentina
    Time zone: America/Buenos_Aires, America/Argentina/Buenos_Aires
  - arcor
    Name: Córdoba, Argentina
    Time zone: America/Cordoba, America/Argentina/Cordoba, America/Rosario
  - arctc
    Name: Catamarca, Argentina
    Time zone: America/Catamarca, America/Argentina/Catamarca, America/Argentina/ComodRivadavia
  - arirj
    Name: La Rioja, Argentina
    Time zone: America/Argentina/La_Rioja
  - arjuj
    Name: Jujuy, Argentina
    Time zone: America/Jujuy, America/Argentina/Jujuy
  - arluq
    Name: San Luis, Argentina
    Time zone: America/Argentina/San_Luis
  - armdz
    Name: Mendoza, Argentina
    Time zone: America/Mendoza, America/Argentina/Mendoza
  - arrgl
    Name: Río Gallegos, Argentina
    Time zone: America/Argentina/Rio_Gallegos
  - arsla
    Name: Salta, Argentina
    Time zone: America/Argentina/Salta
  - artuc
    Name: Tucumán, Argentina
    Time zone: America/Argentina/Tucuman
  - aruaq
    Name: San Juan, Argentina
    Time zone: America/Argentina/San_Juan
  - arush
    Name: Ushuaia, Argentina
    Time zone: America/Argentina/Ushuaia
  - asppg
    Name: Pago Pago, American Samoa
    Time zone: Pacific/Pago_Pago, Pacific/Samoa, US/Samoa
  - atvie
    Name: Vienna, Austria
    Time zone: Europe/Vienna
  - auadl
    Name: Adelaide, Australia
    Time zone: Australia/Adelaide, Australia/South
  - aubhq
    Name: Broken Hill, Australia
    Time zone: Australia/Broken_Hill, Australia/Yancowinna
  - aubne
    Name: Brisbane, Australia
    Time zone: Australia/Brisbane, Australia/Queensland
  - audrw
    Name: Darwin, Australia
    Time zone: Australia/Darwin, Australia/North
  - aueuc
    Name: Eucla, Australia
    Time zone: Australia/Eucla
  - auhba
    Name: Hobart, Australia
    Time zone: Australia/Hobart, Australia/Tasmania, Australia/Currie
  - aukns
    Currie, Australia
    Deprecated. See instead auhba
  - auldc
    Name: Lindeman Island, Australia
Time zone:
Australia/Lindeman -
auldhName: Lord Howe Island, Australia
Time zone:
Australia/Lord_Howe,Australia/LHI -
aumelName: Melbourne, Australia
Time zone:
Australia/Melbourne,Australia/Victoria -
aumqiName: Macquarie Island Station, Macquarie Island
Time zone:
Antarctica/Macquarie -
auperName: Perth, Australia
Time zone:
Australia/Perth,Australia/West -
ausydName: Sydney, Australia
Time zone:
Australia/Sydney,Australia/ACT,Australia/Canberra,Australia/NSW -
awauaName: Aruba
Time zone:
America/Aruba -
azbakName: Baku, Azerbaijan
Time zone:
Asia/Baku -
basjjName: Sarajevo, Bosnia and Herzegovina
Time zone:
Europe/Sarajevo -
bbbgiName: Barbados
Time zone:
America/Barbados -
bddacName: Dhaka, Bangladesh
Time zone:
Asia/Dhaka,Asia/Dacca -
bebruName: Brussels, Belgium
Time zone:
Europe/Brussels -
bfouaName: Ouagadougou, Burkina Faso
Time zone:
Africa/Ouagadougou -
bgsofName: Sofia, Bulgaria
Time zone:
Europe/Sofia -
bhbahName: Bahrain
Time zone:
Asia/Bahrain -
bibjmName: Bujumbura, Burundi
Time zone:
Africa/Bujumbura -
bjptnName: Porto-Novo, Benin
Time zone:
Africa/Porto-Novo -
bmbdaName: Bermuda
Time zone:
Atlantic/Bermuda -
bnbwnName: Brunei
Time zone:
Asia/Brunei -
bolpbName: La Paz, Bolivia
Time zone:
America/La_Paz -
bqkraName: Bonaire, Sint Eustatius and Saba
Time zone:
America/Kralendijk -
brauxName: Araguaína, Brazil
Time zone:
America/Araguaina -
brbelName: Belém, Brazil
Time zone:
America/Belem -
brbvbName: Boa Vista, Brazil
Time zone:
America/Boa_Vista -
brcgbName: Cuiabá, Brazil
Time zone:
America/Cuiaba -
brcgrName: Campo Grande, Brazil
Time zone:
America/Campo_Grande -
brernName: Eirunepé, Brazil
Time zone:
America/Eirunepe -
brfenName: Fernando de Noronha, Brazil
Time zone:
America/Noronha,Brazil/DeNoronha -
brforName: Fortaleza, Brazil
Time zone:
America/Fortaleza -
brmaoName: Manaus, Brazil
Time zone:
America/Manaus,Brazil/West -
brmczName: Maceió, Brazil
Time zone:
America/Maceio -
brpvhName: Porto Velho, Brazil
Time zone:
America/Porto_Velho -
brrbrName: Rio Branco, Brazil
Time zone:
America/Rio_Branco,America/Porto_Acre,Brazil/Acre -
brrecName: Recife, Brazil
Time zone:
America/Recife -
brsaoName: São Paulo, Brazil
Time zone:
America/Sao_Paulo,Brazil/East -
brssaName: Bahia, Brazil
Time zone:
America/Bahia -
brstmName: Santarém, Brazil
Time zone:
America/Santarem -
bsnasName: Nassau, Bahamas
Time zone:
America/Nassau -
btthiName: Thimphu, Bhutan
Time zone:
Asia/Thimphu,Asia/Thimbu -
bwgbeName: Gaborone, Botswana
Time zone:
Africa/Gaborone -
bymsqName: Minsk, Belarus
Time zone:
Europe/Minsk -
bzbzeName: Belize
Time zone:
America/Belize -
cacfqName: Creston, Canada
Time zone:
America/Creston -
caedmName: Edmonton, Canada
Time zone:
America/Edmonton,Canada/Mountain,America/Yellowknife -
caffsRainy River, Canada
Deprecated. See instead
cawnp -
cafneName: Fort Nelson, Canada
Time zone:
America/Fort_Nelson -
caglbName: Glace Bay, Canada
Time zone:
America/Glace_Bay -
cagooName: Goose Bay, Canada
Time zone:
America/Goose_Bay -
cahalName: Halifax, Canada
Time zone:
America/Halifax,Canada/Atlantic -
caiqlName: Iqaluit, Canada
Time zone:
America/Iqaluit,America/Pangnirtung -
camonName: Moncton, Canada
Time zone:
America/Moncton -
camtrMontreal, Canada
Deprecated. See instead
cator -
capntPangnirtung, Canada
Deprecated. See instead
caiql -
carebName: Resolute, Canada
Time zone:
America/Resolute -
caregName: Regina, Canada
Time zone:
America/Regina,Canada/East-Saskatchewan,Canada/Saskatchewan -
casjfName: St. John's, Canada
Time zone:
America/St_Johns,Canada/Newfoundland -
canpgNipigon, Canada
Deprecated. See instead
cator -
cathuThunder Bay, Canada
Deprecated. See instead
cator -
catorName: Toronto, Canada
Time zone:
America/Toronto,America/Montreal,Canada/Eastern,America/Nipigon,America/Thunder_Bay -
cavanName: Vancouver, Canada
Time zone:
America/Vancouver,Canada/Pacific -
cawnpName: Winnipeg, Canada
Time zone:
America/Winnipeg,Canada/Central,America/Rainy_River -
caybxName: Blanc-Sablon, Canada
Time zone:
America/Blanc-Sablon -
caycbName: Cambridge Bay, Canada
Time zone:
America/Cambridge_Bay -
caydaName: Dawson, Canada
Time zone:
America/Dawson -
caydqName: Dawson Creek, Canada
Time zone:
America/Dawson_Creek -
cayekName: Rankin Inlet, Canada
Time zone:
America/Rankin_Inlet -
cayevName: Inuvik, Canada
Time zone:
America/Inuvik -
cayxyName: Whitehorse, Canada
Time zone:
America/Whitehorse,Canada/Yukon -
cayynName: Swift Current, Canada
Time zone:
America/Swift_Current -
cayzfYellowknife, Canada
Deprecated. See instead
caedm -
cayzsName: Atikokan, Canada
Time zone:
America/Coral_Harbour,America/Atikokan -
cccckName: Cocos (Keeling) Islands
Time zone:
Indian/Cocos -
cdfbmName: Lubumbashi, Democratic Republic of the Congo
Time zone:
Africa/Lubumbashi -
cdfihName: Kinshasa, Democratic Republic of the Congo
Time zone:
Africa/Kinshasa -
cfbgfName: Bangui, Central African Republic
Time zone:
Africa/Bangui -
cgbzvName: Brazzaville, Republic of the Congo
Time zone:
Africa/Brazzaville -
chzrhName: Zurich, Switzerland
Time zone:
Europe/Zurich -
ciabjName: Abidjan, Côte d'Ivoire
Time zone:
Africa/Abidjan -
ckrarName: Rarotonga, Cook Islands
Time zone:
Pacific/Rarotonga -
clipcName: Easter Island, Chile
Time zone:
Pacific/Easter,Chile/EasterIsland -
clpuqName: Punta Arenas, Chile
Time zone:
America/Punta_Arenas -
clsclName: Santiago, Chile
Time zone:
America/Santiago,Chile/Continental -
cmdlaName: Douala, Cameroon
Time zone:
Africa/Douala -
cnckgChongqing, China
Deprecated. See instead
cnsha -
cnhrbHarbin, China
Deprecated. See instead
cnsha -
cnkhgKashgar, China
Deprecated. See instead
cnurc -
cnshaName: Shanghai, China
Time zone:
Asia/Shanghai,Asia/Chongqing,Asia/Chungking,Asia/Harbin,PRC -
cnurcName: Ürümqi, China
Time zone:
Asia/Urumqi,Asia/Kashgar -
cobogName: Bogotá, Colombia
Time zone:
America/Bogota -
crsjoName: Costa Rica
Time zone:
America/Costa_Rica -
cst6cdtName: POSIX style time zone for US Central Time
Time zone:
CST6CDT -
cuhavName: Havana, Cuba
Time zone:
America/Havana,Cuba -
cvraiName: Cape Verde
Time zone:
Atlantic/Cape_Verde -
cxxchName: Christmas Island
Time zone:
Indian/Christmas -
cyfmgName: Famagusta, Cyprus
Time zone:
Asia/Famagusta -
cynicName: Nicosia, Cyprus
Time zone:
Asia/Nicosia,Europe/Nicosia -
czprgName: Prague, Czech Republic
Time zone:
Europe/Prague -
deberName: Berlin, Germany
Time zone:
Europe/Berlin -
debsngnName: Busingen, Germany
Time zone:
Europe/Busingen -
djjibName: Djibouti
Time zone:
Africa/Djibouti -
dkcphName: Copenhagen, Denmark
Time zone:
Europe/Copenhagen -
dmdomName: Dominica
Time zone:
America/Dominica -
dosdqName: Santo Domingo, Dominican Republic
Time zone:
America/Santo_Domingo -
dzalgName: Algiers, Algeria
Time zone:
Africa/Algiers -
ecgpsName: Galápagos Islands, Ecuador
Time zone:
Pacific/Galapagos -
ecgyeName: Guayaquil, Ecuador
Time zone:
America/Guayaquil -
eetllName: Tallinn, Estonia
Time zone:
Europe/Tallinn -
egcaiName: Cairo, Egypt
Time zone:
Africa/Cairo,Egypt -
eheaiName: El Aaiún, Western Sahara
Time zone:
Africa/El_Aaiun -
erasmName: Asmara, Eritrea
Time zone:
Africa/Asmera,Africa/Asmara -
esceuName: Ceuta, Spain
Time zone:
Africa/Ceuta -
eslpaName: Canary Islands, Spain
Time zone:
Atlantic/Canary -
esmadName: Madrid, Spain
Time zone:
Europe/Madrid -
est5edtName: POSIX style time zone for US Eastern Time
Time zone:
EST5EDT -
etaddName: Addis Ababa, Ethiopia
Time zone:
Africa/Addis_Ababa -
fihelName: Helsinki, Finland
Time zone:
Europe/Helsinki -
fimhqName: Mariehamn, Åland, Finland
Time zone:
Europe/Mariehamn -
fjsuvName: Fiji
Time zone:
Pacific/Fiji -
fkpsyName: Stanley, Falkland Islands
Time zone:
Atlantic/Stanley -
fmksaName: Kosrae, Micronesia
Time zone:
Pacific/Kosrae -
fmpniName: Pohnpei, Micronesia
Time zone:
Pacific/Ponape,Pacific/Pohnpei -
fmtkkName: Chuuk, Micronesia
Time zone:
Pacific/Truk,Pacific/Chuuk,Pacific/Yap -
fothoName: Faroe Islands
Time zone:
Atlantic/Faeroe,Atlantic/Faroe -
frparName: Paris, France
Time zone:
Europe/Paris -
galbvName: Libreville, Gabon
Time zone:
Africa/Libreville -
gazaGaza Strip, Palestinian Territories
Deprecated. See instead
gazastrp -
gazastrpName: Gaza Strip, Palestinian Territories
Time zone:
Asia/Gaza -
gblonName: London, United Kingdom
Time zone:
Europe/London,Europe/Belfast,GB,GB-Eire -
gdgndName: Grenada
Time zone:
America/Grenada -
getbsName: Tbilisi, Georgia
Time zone:
Asia/Tbilisi -
gfcayName: Cayenne, French Guiana
Time zone:
America/Cayenne -
gggciName: Guernsey
Time zone:
Europe/Guernsey -
ghaccName: Accra, Ghana
Time zone:
Africa/Accra -
gigibName: Gibraltar
Time zone:
Europe/Gibraltar -
gldkshvnName: Danmarkshavn, Greenland
Time zone:
America/Danmarkshavn -
glgohName: Nuuk (Godthåb), Greenland
Time zone:
America/Godthab,America/Nuuk -
globyName: Ittoqqortoormiit (Scoresbysund), Greenland
Time zone:
America/Scoresbysund -
glthuName: Qaanaaq (Thule), Greenland
Time zone:
America/Thule -
gmbjlName: Banjul, Gambia
Time zone:
Africa/Banjul -
gmtName: Greenwich Mean Time
Time zone:
Etc/GMT,Etc/GMT+0,Etc/GMT-0,Etc/GMT0,Etc/Greenwich,GMT,GMT+0,GMT-0,GMT0,Greenwich -
gnckyName: Conakry, Guinea
Time zone:
Africa/Conakry -
gpbbrName: Guadeloupe
Time zone:
America/Guadeloupe -
gpmsbName: Marigot, Saint Martin
Time zone:
America/Marigot -
gpsbhName: Saint Barthélemy
Time zone:
America/St_Barthelemy -
gqssgName: Malabo, Equatorial Guinea
Time zone:
Africa/Malabo -
grathName: Athens, Greece
Time zone:
Europe/Athens -
gsgrvName: South Georgia and the South Sandwich Islands
Time zone:
Atlantic/South_Georgia -
gtguaName: Guatemala
Time zone:
America/Guatemala -
gugumName: Guam
Time zone:
Pacific/Guam -
gwoxbName: Bissau, Guinea-Bissau
Time zone:
Africa/Bissau -
gygeoName: Guyana
Time zone:
America/Guyana -
hebronName: West Bank, Palestinian Territories
Time zone:
Asia/Hebron -
hkhkgName: Hong Kong SAR China
Time zone:
Asia/Hong_Kong,Hongkong -
hntguName: Tegucigalpa, Honduras
Time zone:
America/Tegucigalpa -
hrzagName: Zagreb, Croatia
Time zone:
Europe/Zagreb -
htpapName: Port-au-Prince, Haiti
Time zone:
America/Port-au-Prince -
hubudName: Budapest, Hungary
Time zone:
Europe/Budapest -
iddjjName: Jayapura, Indonesia
Time zone:
Asia/Jayapura -
idjktName: Jakarta, Indonesia
Time zone:
Asia/Jakarta -
idmakName: Makassar, Indonesia
Time zone:
Asia/Makassar,Asia/Ujung_Pandang -
idpnkName: Pontianak, Indonesia
Time zone:
Asia/Pontianak -
iedubName: Dublin, Ireland
Time zone:
Europe/Dublin,Eire -
imdgsName: Isle of Man
Time zone:
Europe/Isle_of_Man -
inccuName: Kolkata, India
Time zone:
Asia/Calcutta,Asia/Kolkata -
iodgaName: Chagos Archipelago
Time zone:
Indian/Chagos -
iqbgwName: Baghdad, Iraq
Time zone:
Asia/Baghdad -
irthrName: Tehran, Iran
Time zone:
Asia/Tehran,Iran -
isreyName: Reykjavik, Iceland
Time zone:
Atlantic/Reykjavik,Iceland -
itromName: Rome, Italy
Time zone:
Europe/Rome -
jeruslmName: Jerusalem
Time zone:
Asia/Jerusalem,Asia/Tel_Aviv,Israel -
jesthName: Jersey
Time zone:
Europe/Jersey -
jmkinName: Jamaica
Time zone:
America/Jamaica,Jamaica -
joammName: Amman, Jordan
Time zone:
Asia/Amman -
jptyoName: Tokyo, Japan
Time zone:
Asia/Tokyo,Japan -
kenboName: Nairobi, Kenya
Time zone:
Africa/Nairobi -
kgfruName: Bishkek, Kyrgyzstan
Time zone:
Asia/Bishkek -
khpnhName: Phnom Penh, Cambodia
Time zone:
Asia/Phnom_Penh -
kicxiName: Kiritimati, Kiribati
Time zone:
Pacific/Kiritimati -
kiphoName: Enderbury Island, Kiribati
Time zone:
Pacific/Enderbury,Pacific/Kanton -
kitrwName: Tarawa, Kiribati
Time zone:
Pacific/Tarawa -
kmyvaName: Comoros
Time zone:
Indian/Comoro -
knbasName: Saint Kitts
Time zone:
America/St_Kitts -
kpfnjName: Pyongyang, North Korea
Time zone:
Asia/Pyongyang -
krselName: Seoul, South Korea
Time zone:
Asia/Seoul,ROK -
kwkwiName: Kuwait
Time zone:
Asia/Kuwait -
kygecName: Cayman Islands
Time zone:
America/Cayman -
kzaauName: Aqtau, Kazakhstan
Time zone:
Asia/Aqtau -
kzakxName: Aqtobe, Kazakhstan
Time zone:
Asia/Aqtobe -
kzalaName: Almaty, Kazakhstan
Time zone:
Asia/Almaty -
kzguwName: Atyrau (Guryev), Kazakhstan
Time zone:
Asia/Atyrau -
kzksnName: Qostanay (Kostanay), Kazakhstan
Time zone:
Asia/Qostanay -
kzkzoName: Kyzylorda, Kazakhstan
Time zone:
Asia/Qyzylorda -
kzuraName: Oral, Kazakhstan
Time zone:
Asia/Oral -
lavteName: Vientiane, Laos
Time zone:
Asia/Vientiane -
lbbeyName: Beirut, Lebanon
Time zone:
Asia/Beirut -
lccasName: Saint Lucia
Time zone:
America/St_Lucia -
livdzName: Vaduz, Liechtenstein
Time zone:
Europe/Vaduz -
lkcmbName: Colombo, Sri Lanka
Time zone:
Asia/Colombo -
lrmlwName: Monrovia, Liberia
Time zone:
Africa/Monrovia -
lsmsuName: Maseru, Lesotho
Time zone:
Africa/Maseru -
ltvnoName: Vilnius, Lithuania
Time zone:
Europe/Vilnius -
luluxName: Luxembourg
Time zone:
Europe/Luxembourg -
lvrixName: Riga, Latvia
Time zone:
Europe/Riga -
lytipName: Tripoli, Libya
Time zone:
Africa/Tripoli,Libya -
macasName: Casablanca, Morocco
Time zone:
Africa/Casablanca -
mcmonName: Monaco
Time zone:
Europe/Monaco -
mdkivName: Chişinău, Moldova
Time zone:
Europe/Chisinau,Europe/Tiraspol -
metgdName: Podgorica, Montenegro
Time zone:
Europe/Podgorica -
mgtnrName: Antananarivo, Madagascar
Time zone:
Indian/Antananarivo -
mhkwaName: Kwajalein, Marshall Islands
Time zone:
Pacific/Kwajalein,Kwajalein -
mhmajName: Majuro, Marshall Islands
Time zone:
Pacific/Majuro -
mkskpName: Skopje, Macedonia
Time zone:
Europe/Skopje -
mlbkoName: Bamako, Mali
Time zone:
Africa/Bamako,Africa/Timbuktu -
mmrgnName: Yangon (Rangoon), Burma
Time zone:
Asia/Rangoon,Asia/Yangon -
mncoqName: Choibalsan, Mongolia
Time zone:
Asia/Choibalsan -
mnhvdName: Khovd (Hovd), Mongolia
Time zone:
Asia/Hovd -
mnulnName: Ulaanbaatar (Ulan Bator), Mongolia
Time zone:
Asia/Ulaanbaatar,Asia/Ulan_Bator -
momfmName: Macau SAR China
Time zone:
Asia/Macau,Asia/Macao -
mpspnName: Saipan, Northern Mariana Islands
Time zone:
Pacific/Saipan -
mqfdfName: Martinique
Time zone:
America/Martinique -
mrnkcName: Nouakchott, Mauritania
Time zone:
Africa/Nouakchott -
msmniName: Montserrat
Time zone:
America/Montserrat -
mst7mdtName: POSIX style time zone for US Mountain Time
Time zone:
MST7MDT -
mtmlaName: Malta
Time zone:
Europe/Malta -
mupluName: Mauritius
Time zone:
Indian/Mauritius -
mvmleName: Maldives
Time zone:
Indian/Maldives -
mwblzName: Blantyre, Malawi
Time zone:
Africa/Blantyre -
mxchiName: Chihuahua, Mexico
Time zone:
America/Chihuahua -
mxcunName: Cancún, Mexico
Time zone:
America/Cancun -
mxcjsName: Ciudad Juárez, Mexico
Time zone:
America/Ciudad_Juarez -
mxhmoName: Hermosillo, Mexico
Time zone:
America/Hermosillo -
mxmamName: Matamoros, Mexico
Time zone:
America/Matamoros -
mxmexName: Mexico City, Mexico
Time zone:
America/Mexico_City,Mexico/General -
mxmidName: Mérida, Mexico
Time zone:
America/Merida -
mxmtyName: Monterrey, Mexico
Time zone:
America/Monterrey -
mxmztName: Mazatlán, Mexico
Time zone:
America/Mazatlan,Mexico/BajaSur -
mxojiName: Ojinaga, Mexico
Time zone:
America/Ojinaga -
mxpvrName: Bahía de Banderas, Mexico
Time zone:
America/Bahia_Banderas -
mxstisSanta Isabel (Baja California), Mexico
Deprecated. See instead
mxtij -
mxtijName: Tijuana, Mexico
Time zone:
America/Tijuana,America/Ensenada,Mexico/BajaNorte,America/Santa_Isabel -
mykchName: Kuching, Malaysia
Time zone:
Asia/Kuching -
mykulName: Kuala Lumpur, Malaysia
Time zone:
Asia/Kuala_Lumpur -
mzmpmName: Maputo, Mozambique
Time zone:
Africa/Maputo -
nawdhName: Windhoek, Namibia
Time zone:
Africa/Windhoek -
ncnouName: Noumea, New Caledonia
Time zone:
Pacific/Noumea -
nenimName: Niamey, Niger
Time zone:
Africa/Niamey -
nfnlkName: Norfolk Island
Time zone:
Pacific/Norfolk -
nglosName: Lagos, Nigeria
Time zone:
Africa/Lagos -
nimgaName: Managua, Nicaragua
Time zone:
America/Managua -
nlamsName: Amsterdam, Netherlands
Time zone:
Europe/Amsterdam -
nooslName: Oslo, Norway
Time zone:
Europe/Oslo -
npktmName: Kathmandu, Nepal
Time zone:
Asia/Katmandu,Asia/Kathmandu -
nrinuName: Nauru
Time zone:
Pacific/Nauru -
nuiueName: Niue
Time zone:
Pacific/Niue -
nzaklName: Auckland, New Zealand
Time zone:
Pacific/Auckland,Antarctica/South_Pole,NZ -
nzchtName: Chatham Islands, New Zealand
Time zone:
Pacific/Chatham,NZ-CHAT -
ommctName: Muscat, Oman
Time zone:
Asia/Muscat -
paptyName: Panama
Time zone:
America/Panama -
pelimName: Lima, Peru
Time zone:
America/Lima -
pfgmrName: Gambier Islands, French Polynesia
Time zone:
Pacific/Gambier -
pfnhvName: Marquesas Islands, French Polynesia
Time zone:
Pacific/Marquesas -
pfpptName: Tahiti, French Polynesia
Time zone:
Pacific/Tahiti -
pgpomName: Port Moresby, Papua New Guinea
Time zone:
Pacific/Port_Moresby -
pgrawName: Bougainville, Papua New Guinea
Time zone:
Pacific/Bougainville -
phmnlName: Manila, Philippines
Time zone:
Asia/Manila -
pkkhiName: Karachi, Pakistan
Time zone:
Asia/Karachi -
plwawName: Warsaw, Poland
Time zone:
Europe/Warsaw,Poland -
pmmqcName: Saint Pierre and Miquelon
Time zone:
America/Miquelon -
pnpcnName: Pitcairn Islands
Time zone:
Pacific/Pitcairn -
prsjuName: Puerto Rico
Time zone:
America/Puerto_Rico -
pst8pdtName: POSIX style time zone for US Pacific Time
Time zone:
PST8PDT -
ptfncName: Madeira, Portugal
Time zone:
Atlantic/Madeira -
ptlisName: Lisbon, Portugal
Time zone:
Europe/Lisbon,Portugal -
ptpdlName: Azores, Portugal
Time zone:
Atlantic/Azores -
pwrorName: Palau
Time zone:
Pacific/Palau -
pyasuName: Asunción, Paraguay
Time zone:
America/Asuncion -
qadohName: Qatar
Time zone:
Asia/Qatar -
rereuName: Réunion
Time zone:
Indian/Reunion -
robuhName: Bucharest, Romania
Time zone:
Europe/Bucharest -
rsbegName: Belgrade, Serbia
Time zone:
Europe/Belgrade -
ruasfName: Astrakhan, Russia
Time zone:
Europe/Astrakhan -
rubaxName: Barnaul, Russia
Time zone:
Asia/Barnaul -
ruchitaName: Chita Zabaykalsky, Russia
Time zone:
Asia/Chita -
rudyrName: Anadyr, Russia
Time zone:
Asia/Anadyr -
rugdxName: Magadan, Russia
Time zone:
Asia/Magadan -
ruiktName: Irkutsk, Russia
Time zone:
Asia/Irkutsk -
rukgdName: Kaliningrad, Russia
Time zone:
Europe/Kaliningrad -
rukhndgName: Khandyga Tomponsky, Russia
Time zone:
Asia/Khandyga -
rukraName: Krasnoyarsk, Russia
Time zone:
Asia/Krasnoyarsk -
rukufName: Samara, Russia
Time zone:
Europe/Samara -
rukvxName: Kirov, Russia
Time zone:
Europe/Kirov -
rumowName: Moscow, Russia
Time zone:
Europe/Moscow,W-SU -
runozName: Novokuznetsk, Russia
Time zone:
Asia/Novokuznetsk -
ruomsName: Omsk, Russia
Time zone:
Asia/Omsk -
ruovbName: Novosibirsk, Russia
Time zone:
Asia/Novosibirsk -
rupkcName: Kamchatka Peninsula, Russia
Time zone:
Asia/Kamchatka -
rurtwName: Saratov, Russia
Time zone:
Europe/Saratov -
rusredName: Srednekolymsk, Russia
Time zone:
Asia/Srednekolymsk -
rutofName: Tomsk, Russia
Time zone:
Asia/Tomsk -
ruulyName: Ulyanovsk, Russia
Time zone:
Europe/Ulyanovsk -
ruuneraName: Ust-Nera Oymyakonsky, Russia
Time zone:
Asia/Ust-Nera -
ruuusName: Sakhalin, Russia
Time zone:
Asia/Sakhalin -
ruvogName: Volgograd, Russia
Time zone:
Europe/Volgograd -
ruvvoName: Vladivostok, Russia
Time zone:
Asia/Vladivostok -
ruyekName: Yekaterinburg, Russia
Time zone:
Asia/Yekaterinburg -
ruyksName: Yakutsk, Russia
Time zone:
Asia/Yakutsk -
rwkglName: Kigali, Rwanda
Time zone:
Africa/Kigali -
saruhName: Riyadh, Saudi Arabia
Time zone:
Asia/Riyadh -
sbhirName: Guadalcanal, Solomon Islands
Time zone:
Pacific/Guadalcanal -
scmawName: Mahé, Seychelles
Time zone:
Indian/Mahe -
sdkrtName: Khartoum, Sudan
Time zone:
Africa/Khartoum -
sestoName: Stockholm, Sweden
Time zone:
Europe/Stockholm -
sgsinName: Singapore
Time zone:
Asia/Singapore,Singapore -
shshnName: Saint Helena
Time zone:
Atlantic/St_Helena -
siljuName: Ljubljana, Slovenia
Time zone:
Europe/Ljubljana -
sjlyrName: Longyearbyen, Svalbard
Time zone:
Arctic/Longyearbyen,Atlantic/Jan_Mayen -
skbtsName: Bratislava, Slovakia
Time zone:
Europe/Bratislava -
slfnaName: Freetown, Sierra Leone
Time zone:
Africa/Freetown -
smsaiName: San Marino
Time zone:
Europe/San_Marino -
sndkrName: Dakar, Senegal
Time zone:
Africa/Dakar -
somgqName: Mogadishu, Somalia
Time zone:
Africa/Mogadishu -
srpbmName: Paramaribo, Suriname
Time zone:
America/Paramaribo -
ssjubName: Juba, South Sudan
Time zone:
Africa/Juba -
sttmsName: São Tomé, São Tomé and Príncipe
Time zone:
Africa/Sao_Tome -
svsalName: El Salvador
Time zone:
America/El_Salvador -
sxphiName: Sint Maarten
Time zone:
America/Lower_Princes -
sydamName: Damascus, Syria
Time zone:
Asia/Damascus -
szqmnName: Mbabane, Swaziland
Time zone:
Africa/Mbabane -
tcgdtName: Grand Turk, Turks and Caicos Islands
Time zone:
America/Grand_Turk -
tdndjName: N'Djamena, Chad
Time zone:
Africa/Ndjamena -
tfpfrName: Kerguelen Islands, French Southern Territories
Time zone:
Indian/Kerguelen -
tglfwName: Lomé, Togo
Time zone:
Africa/Lome -
thbkkName: Bangkok, Thailand
Time zone:
Asia/Bangkok -
tjdyuName: Dushanbe, Tajikistan
Time zone:
Asia/Dushanbe -
tkfkoName: Fakaofo, Tokelau
Time zone:
Pacific/Fakaofo -
tldilName: Dili, East Timor
Time zone:
Asia/Dili -
tmasbName: Ashgabat, Turkmenistan
Time zone:
Asia/Ashgabat,Asia/Ashkhabad -
tntunName: Tunis, Tunisia
Time zone:
Africa/Tunis -
totbuName: Tongatapu, Tonga
Time zone:
Pacific/Tongatapu -
tristName: Istanbul, Türkiye
Time zone:
Europe/Istanbul,Asia/Istanbul,Turkey -
ttposName: Port of Spain, Trinidad and Tobago
Time zone:
America/Port_of_Spain -
tvfunName: Funafuti, Tuvalu
Time zone:
Pacific/Funafuti -
twtpeName: Taipei, Taiwan
Time zone:
Asia/Taipei,ROC -
tzdarName: Dar es Salaam, Tanzania
Time zone:
Africa/Dar_es_Salaam -
uaievName: Kyiv, Ukraine
Time zone:
Europe/Kiev,Europe/Kyiv,Europe/Zaporozhye,Europe/Uzhgorod -
uaozhZaporizhia (Zaporozhye), Ukraine
Deprecated. See instead
uaiev -
uasipName: Simferopol, Ukraine
Time zone:
Europe/Simferopol -
uauzhUzhhorod (Uzhgorod), Ukraine
Deprecated. See instead
uaiev -
ugklaName: Kampala, Uganda
Time zone:
Africa/Kampala -
umawkName: Wake Island, U.S. Minor Outlying Islands
Time zone:
Pacific/Wake -
umjonJohnston Atoll, U.S. Minor Outlying Islands
Deprecated. See instead
ushnl -
ummdyName: Midway Islands, U.S. Minor Outlying Islands
Time zone:
Pacific/Midway -
unkName: Unknown time zone
Time zone:
Etc/Unknown -
usadkName: Adak (Alaska), United States
Time zone:
America/Adak,America/Atka,US/Aleutian -
usaegName: Marengo (Indiana), United States
Time zone:
America/Indiana/Marengo -
usancName: Anchorage, United States
Time zone:
America/Anchorage,US/Alaska -
usboiName: Boise (Idaho), United States
Time zone:
America/Boise -
uschiName: Chicago, United States
Time zone:
America/Chicago,US/Central -
usdenName: Denver, United States
Time zone:
America/Denver,America/Shiprock,Navajo,US/Mountain -
usdetName: Detroit, United States
Time zone:
America/Detroit,US/Michigan -
ushnlName: Honolulu, United States
Time zone:
Pacific/Honolulu,US/Hawaii,Pacific/Johnston -
usindName: Indianapolis, United States
Time zone:
America/Indianapolis,America/Fort_Wayne,America/Indiana/Indianapolis,US/East-Indiana -
usinvevName: Vevay (Indiana), United States
Time zone:
America/Indiana/Vevay -
usjnuName: Juneau (Alaska), United States
Time zone:
America/Juneau -
usknxName: Knox (Indiana), United States
Time zone:
America/Indiana/Knox,America/Knox_IN,US/Indiana-Starke -
uslaxName: Los Angeles, United States
Time zone:
America/Los_Angeles,US/Pacific,US/Pacific-New -
usluiName: Louisville (Kentucky), United States
Time zone:
America/Louisville,America/Kentucky/Louisville -
usmnmName: Menominee (Michigan), United States
Time zone:
America/Menominee -
usmtmName: Metlakatla (Alaska), United States
Time zone:
America/Metlakatla -
usmocName: Monticello (Kentucky), United States
Time zone:
America/Kentucky/Monticello -
usnavajoShiprock (Navajo), United States
Deprecated. See instead
usden -
usndcntName: Center (North Dakota), United States
Time zone:
America/North_Dakota/Center -
usndnslName: New Salem (North Dakota), United States
Time zone:
America/North_Dakota/New_Salem -
usnycName: New York, United States
Time zone:
America/New_York,US/Eastern -
usoeaName: Vincennes (Indiana), United States
Time zone:
America/Indiana/Vincennes -
usomeName: Nome (Alaska), United States
Time zone:
America/Nome -
usphxName: Phoenix, United States
Time zone:
America/Phoenix,US/Arizona -
ussitName: Sitka (Alaska), United States
Time zone:
America/Sitka -
ustelName: Tell City (Indiana), United States
Time zone:
America/Indiana/Tell_City -
uswlzName: Winamac (Indiana), United States
Time zone:
America/Indiana/Winamac -
uswsqName: Petersburg (Indiana), United States
Time zone:
America/Indiana/Petersburg -
usxulName: Beulah (North Dakota), United States
Time zone:
America/North_Dakota/Beulah -
usyakName: Yakutat (Alaska), United States
Time zone:
America/Yakutat -
utcName: UTC (Coordinated Universal Time)
Time zone:
Etc/UTC,Etc/UCT,Etc/Universal,Etc/Zulu,UCT,UTC,Universal,Zulu -
utce01Name: 1 hour ahead of UTC
Time zone:
Etc/GMT-1 -
utce02Name: 2 hours ahead of UTC
Time zone:
Etc/GMT-2 -
utce03Name: 3 hours ahead of UTC
Time zone:
Etc/GMT-3 -
utce04Name: 4 hours ahead of UTC
Time zone:
Etc/GMT-4 -
utce05Name: 5 hours ahead of UTC
Time zone:
Etc/GMT-5 -
utce06Name: 6 hours ahead of UTC
Time zone:
Etc/GMT-6 -
utce07Name: 7 hours ahead of UTC
Time zone:
Etc/GMT-7 -
utce08Name: 8 hours ahead of UTC
Time zone:
Etc/GMT-8 -
utce09Name: 9 hours ahead of UTC
Time zone:
Etc/GMT-9 -
utce10Name: 10 hours ahead of UTC
Time zone:
Etc/GMT-10 -
utce11Name: 11 hours ahead of UTC
Time zone:
Etc/GMT-11 -
utce12Name: 12 hours ahead of UTC
Time zone:
Etc/GMT-12 -
utce13Name: 13 hours ahead of UTC
Time zone:
Etc/GMT-13 -
utce14Name: 14 hours ahead of UTC
Time zone:
Etc/GMT-14 -
utcw01Name: 1 hour behind UTC
Time zone:
Etc/GMT+1 -
utcw02Name: 2 hours behind UTC
Time zone:
Etc/GMT+2 -
utcw03Name: 3 hours behind UTC
Time zone:
Etc/GMT+3 -
utcw04Name: 4 hours behind UTC
Time zone:
Etc/GMT+4 -
utcw05Name: 5 hours behind UTC
Time zone:
Etc/GMT+5,EST -
utcw06Name: 6 hours behind UTC
Time zone:
Etc/GMT+6 -
utcw07Name: 7 hours behind UTC
Time zone:
Etc/GMT+7,MST -
utcw08Name: 8 hours behind UTC
Time zone:
Etc/GMT+8 -
utcw09Name: 9 hours behind UTC
Time zone:
Etc/GMT+9 -
utcw10Name: 10 hours behind UTC
Time zone:
Etc/GMT+10,HST -
utcw11Name: 11 hours behind UTC
Time zone:
Etc/GMT+11 -
utcw12Name: 12 hours behind UTC
Time zone:
Etc/GMT+12 -
uymvdName: Montevideo, Uruguay
Time zone:
America/Montevideo -
uzskdName: Samarkand, Uzbekistan
Time zone:
Asia/Samarkand -
uztasName: Tashkent, Uzbekistan
Time zone:
Asia/Tashkent -
vavatName: Vatican City
Time zone:
Europe/Vatican -
vcsvdName: Saint Vincent, Saint Vincent and the Grenadines
Time zone:
America/St_Vincent -
veccsName: Caracas, Venezuela
Time zone:
America/Caracas -
vgtovName: Tortola, British Virgin Islands
Time zone:
America/Tortola -
visttName: Saint Thomas, U.S. Virgin Islands
Time zone:
America/St_Thomas,America/Virgin -
vnsgnName: Ho Chi Minh City, Vietnam
Time zone:
Asia/Saigon,Asia/Ho_Chi_Minh -
vuvliName: Efate, Vanuatu
Time zone:
Pacific/Efate -
wfmauName: Wallis Islands, Wallis and Futuna
Time zone:
Pacific/Wallis -
wsapwName: Apia, Samoa
Time zone:
Pacific/Apia -
yeadeName: Aden, Yemen
Time zone:
Asia/Aden -
ytmamName: Mayotte
Time zone:
Indian/Mayotte -
zajnbName: Johannesburg, South Africa
Time zone:
Africa/Johannesburg -
zmlunName: Lusaka, Zambia
Time zone:
Africa/Lusaka -
zwhreName: Harare, Zimbabwe
Time zone:
Africa/Harare
See the standard documentation for more information.
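As a rough illustration of how the short identifiers relate to time zone names, the sketch below builds a small lookup table from a handful of the entries listed above. The helper names id_to_name and name_to_id are hypothetical; they are not the module's own "tz_id2name" or "tz_name2id" class functions, and the table is only an excerpt of the full list.

```perl
use strict;
use warnings;

# Excerpt of the table above: short identifier => time zone names.
# The first name in each list is treated as the preferred one (an assumption).
my %tz_names = (
    jptyo => [qw( Asia/Tokyo Japan )],
    nzakl => [qw( Pacific/Auckland Antarctica/South_Pole NZ )],
    usnyc => [qw( America/New_York US/Eastern )],
);

# Deprecated identifiers and the identifier to use instead.
my %tz_deprecated = ( aqams => 'nzakl' );

# Reverse index: time zone name => short identifier.
my %tz_ids;
foreach my $id ( keys( %tz_names ) )
{
    $tz_ids{ $_ } = $id for( @{$tz_names{ $id }} );
}

# Hypothetical helper: short identifier to its first time zone name,
# following a deprecated identifier to its replacement first.
sub id_to_name
{
    my $id = lc( shift( @_ ) );
    $id = $tz_deprecated{ $id } if( exists( $tz_deprecated{ $id } ) );
    return unless( exists( $tz_names{ $id } ) );
    return( $tz_names{ $id }->[0] );
}

# Hypothetical helper: time zone name (or alias) to its short identifier.
sub name_to_id
{
    my $name = shift( @_ );
    return( $tz_ids{ $name } );
}

print( scalar( id_to_name( 'jptyo' ) ), "\n" );      # Asia/Tokyo
print( scalar( id_to_name( 'aqams' ) ), "\n" );      # Pacific/Auckland
print( scalar( name_to_id( 'US/Eastern' ) ), "\n" ); # usnyc
```

Keeping deprecated identifiers in a separate map means the main table only ever holds current identifiers, which mirrors how the list above points deprecated entries at their replacement.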
-
-
va
A Unicode Variant Identifier defines a special variant used for locales.
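Since this module does not enforce the standard, the consistency rule given for sd earlier in this list (the subdivision code must start with the region subtag) has to be checked by the caller. The following is a minimal sketch of such a check, assuming a simple language[-script][-region] prefix; subdivision_matches_region is a hypothetical helper and not part of Locale::Unicode.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when the sd value is consistent with the
# region subtag (or when either is absent), 0 otherwise.
sub subdivision_matches_region
{
    my $tag = shift( @_ );
    # Region: 2 letters or 3 digits, after the language and an optional script.
    my( $region ) = $tag =~ /^[a-z]{2,3}(?:-[a-z]{4})?-([a-z]{2}|[0-9]{3})(?=-|\z)/i;
    # Value of the sd key somewhere inside the -u- extension.
    my( $sd ) = $tag =~ /-u-(?:[a-z0-9]+-)*?sd-([a-z0-9]+)/i;
    # Nothing to compare when there is no region or no subdivision.
    return(1) unless( defined( $region ) && defined( $sd ) );
    return( index( lc( $sd ), lc( $region ) ) == 0 ? 1 : 0 );
}

foreach my $tag ( qw( en-US-u-sd-usca en-CA-u-sd-gbsct en-u-sd-uszzzz ) )
{
    printf( "%-18s %s\n", $tag, subdivision_matches_region( $tag ) ? 'ok' : 'mismatch' );
}
```

A tag without a region subtag, such as en-u-sd-uszzzz, is treated as acceptable here because there is nothing to compare the subdivision against.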
Transform extensions
These are used for transliterations, transcriptions, translations, etc., as per RFC 6497.
For example:
-
ja-t-it
The content is Japanese, transformed from Italian.
-
ja-Kana-t-it
The content is Japanese Katakana, transformed from Italian.
-
und-Latn-t-und-cyrl
The content is in the Latin script, transformed from the Cyrillic script.
-
und-Cyrl-t-und-latn-m0-ungegn-2007
The content is in Cyrillic, transformed from Latin, according to a UNGEGN specification dated 2007.
The date is of format YYYYMMDD, all without spaces, and the month and day information should be provided only when necessary for clarification, as per RFC 6497, section 2.5(c).
-
und-Cyrl-t-und-latn-m0-ungegn
Same, but without year.
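To make the structure of these examples concrete, here is a minimal sketch that splits a locale string into its base tag, the source tag of the -t- extension, and the transform fields. It is not how Locale::Unicode parses locales; parse_transform is a hypothetical helper, and it assumes that a field key is one letter followed by one digit (as in m0 or h0) and that a single-character subtag starts another extension.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash reference with the keys
# base (string), source (array reference) and fields (hash reference).
sub parse_transform
{
    my $tag = shift( @_ );
    my( $base, $ext ) = split( /-t-/i, $tag, 2 );
    my %result = ( base => $base, source => [], fields => {} );
    return( \%result ) unless( defined( $ext ) );
    my $key;
    foreach my $subtag ( split( /-/, lc( $ext ) ) )
    {
        # A single character starts another extension; stop there.
        last if( length( $subtag ) == 1 );
        if( $subtag =~ /^[a-z][0-9]$/ )
        {
            # A field key such as m0; its values follow.
            $key = $subtag;
            $result{fields}->{ $key } = [];
        }
        elsif( defined( $key ) )
        {
            push( @{$result{fields}->{ $key }}, $subtag );
        }
        else
        {
            # Subtags before the first field key form the source tag.
            push( @{$result{source}}, $subtag );
        }
    }
    return( \%result );
}

my $info = parse_transform( 'und-Cyrl-t-und-latn-m0-ungegn-2007' );
print( "base:   $info->{base}\n" );                               # und-Cyrl
print( "source: ", join( '-', @{$info->{source}} ), "\n" );      # und-latn
print( "m0:     ", join( '-', @{$info->{fields}->{m0}} ), "\n" ); # ungegn-2007
```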
The complete list of valid subtags is as follows. They are all two to eight alphanumeric characters.
-
d0
Transform destination: for non-languages/scripts, such as fullwidth-halfwidth conversion
See also s0
Possible values are:
-
accents
Map base + punctuation, etc. to accented characters
-
ascii
Map as many characters to the closest ASCII character as possible
-
casefold
Apply Unicode case folding
-
charname
Map each character to its Unicode name
-
digit
Convert to digit form of accent
-
fcc
Map string to the FCC format; http://unicode.org/notes/tn5
-
fcd
Map string to the FCD format; http://unicode.org/notes/tn5
-
fwidth
Map characters to their fullwidth equivalents
-
hex
Map characters to their hex equivalents, e.g. a to \u0061; for hex variants see transform.xml
-
hwidth
Map characters to their halfwidth equivalents
-
lower
Apply Unicode full lowercase mapping
-
morse
Map Unicode to Morse Code encoding
-
nfc
Map string to the Unicode NFC format
-
nfd
Map string to the Unicode NFD format
-
nfkc
Map string to the Unicode NFKC format
-
nfkd
Map string to the Unicode NFKD format
-
npinyin
Map pinyin written with tones to the numeric form
-
null
Make no change in the string
-
publish
Map to preferred forms for publishing, such as
,,—
-
remove
Remove every character in the string
-
title
Apply Unicode full titlecase mapping
-
upper
Apply Unicode full uppercase mapping
-
zawgyi
Map Unicode to Zawgyi Myanmar encoding
-
-
h0
Hybrid Locale Identifiers: h0 with the value hybrid indicates that the -t- value is a language that is mixed into the main language tag to form a hybrid.
For example:
-
hi-t-en-h0-hybrid
Hybrid Deva - Hinglish
Hindi-English hybrid where the script is Devanagari
-
hi-Latn-t-en-h0-hybrid
Hybrid Latin - Hinglish
Hindi-English hybrid where the script is Latin
-
ru-t-en-h0-hybrid
Hybrid Cyrillic - Runglish
Russian with an admixture of American English
-
ru-t-en-gb-h0-hybrid
Hybrid Cyrillic - Runglish
Russian with an admixture of British English
-
en-t-zh-h0-hybrid
Hybrid Latin - Chinglish
American English with an admixture of Chinese (Simplified Mandarin Chinese)
-
en-t-zh-hant-h0-hybrid
Hybrid Latin - Chinglish
American English with an admixture of Chinese (Traditional Mandarin Chinese)
-
-
i0
Input Method Engine transform: used to indicate an input method transformation, such as one used by a client-side input method. The first subfield in a sequence would typically be a platform or vendor designation.
For example:
zh-t-i0-pinyin
Possible values are:
-
handwrit: Handwriting input, used when the only information known (or requested) is that the text was (or is to be) converted using a handwriting input.
-
pinyin: Pinyin input, for simplified Chinese characters. See also http://en.wikipedia.org/wiki/Pinyin_method.
-
und: The choice of input method is not specified. Used when the only information known (or requested) is that the text was (or is to be) converted using an input method engine.
-
wubi: Wubi input, for simplified Chinese characters. For background information, see http://en.wikipedia.org/wiki/Wubi_method
-
-
k0
Keyboard transform: used to indicate a keyboard transformation, such as one used by a client-side virtual keyboard. The first subfield in a sequence would typically be a platform designation, representing the platform that the keyboard is intended for.
For example:
en-t-k0-dvorak
Possible values are:
-
101key: 101 key layout.
-
102key: 102 key layout.
-
600dpi: Keyboard for a 600 dpi device.
-
768dpi: Keyboard for a 768 dpi device.
-
android: Android keyboard.
-
azerty: An AZERTY-based keyboard or one that approximates AZERTY in a different script.
-
chromeos: ChromeOS keyboard.
-
colemak: Colemak keyboard layout. The Colemak keyboard is an alternative to the QWERTY and Dvorak keyboards. See also http://colemak.com/.
-
dvorak: Dvorak keyboard layout. See also http://en.wikipedia.org/wiki/Dvorak_Simplified_Keyboard.
-
dvorakl: Dvorak left-handed keyboard layout. See also http://en.wikipedia.org/wiki/File:KB_Dvorak_Left.svg.
-
dvorakr: Dvorak right-handed keyboard layout. See also http://en.wikipedia.org/wiki/File:KB_Dvorak_Right.svg.
-
el220: Greek 220 keyboard. See also http://www.microsoft.com/resources/msdn/goglobal/keyboards/kbdhela2.html.
-
el319: Greek 319 keyboard. See also ftp://ftp.software.ibm.com/software/globalization/keyboards/KBD319.pdf.
-
extended: A keyboard that has been enhanced with a large number of extra characters.
-
googlevk: Google virtual keyboard.
-
isiri: Persian ISIRI keyboard. Based on ISIRI 2901:1994 standard. See also http://behdad.org/download/Publications/persiancomputing/a007.pdf.
-
legacy: A keyboard that has been replaced with a newer standard but is kept for legacy purposes.
-
lt1205: Lithuanian standard keyboard, based on the LST 1205:1992 standard. See also http://www.kada.lt/litwin/.
-
lt1582: Lithuanian standard keyboard, based on the LST 1582:2000 standard. See also http://www.kada.lt/litwin/.
-
nutaaq: Inuktitut Nutaaq keyboard. See also http://www.pirurvik.ca/en/webfm_send/15.
-
osx: Mac OSX keyboard.
-
patta: Thai Pattachote keyboard. This is a less frequently used layout in Thai (Kedmanee layout is more popular). See also http://www.nectec.or.th/it-standards/keyboard_layout/thai-key.htm.
-
qwerty: QWERTY-based keyboard or one that approximates QWERTY in a different script.
-
qwertz: QWERTZ-based keyboard or one that approximates QWERTZ in a different script.
-
ta99: Tamil 99 keyboard. See also http://www.tamilvu.org/Tamilnet99/annex1.htm.
-
und: The vendor for the keyboard is not specified. Used when the only information known (or requested) is that the text was (or is to be) converted using a keyboard.
-
var: A keyboard layout with small variations from the default.
-
viqr: Vietnamese VIQR layout, based on http://tools.ietf.org/html/rfc1456.
-
windows: Windows keyboard.
-
-
m0
Transform extension mechanism: to reference an authority or rules for a type of transformation.
For example:
und-Latn-t-ru-m0-ungegn-2007
Possible values are:
-
aethiopi: Encyclopedia Aethiopica Transliteration
-
alaloc: American Library Association-Library of Congress
-
betamets: Beta Maṣāḥǝft Transliteration
-
bgn: US Board on Geographic Names
-
buckwalt: Buckwalter Arabic transliteration system
-
c11: For hex transforms, using the C11 syntax: \u0061\U0001F4D6
-
css: For hex transforms, using the CSS syntax: \61 \01F4D6, spacing where necessary
-
din: Deutsches Institut für Normung
-
es3842: Ethiopian Standards Agency ES 3842:2014 Ethiopic-Latin Transliteration
-
ewts: Extended Wylie Transliteration Scheme
-
gost: Euro-Asian Council for Standardization, Metrology and Certification
-
gurage: Gurage Legacy to Modern Transliteration
-
gutgarts: Yaros Gutgarts Ethiopic-Cyrillic Transliteration
-
iast: International Alphabet of Sanskrit Transliteration
-
iesjes: IES/JES Amharic Transliteration
-
iso: International Organization for Standardization
-
java: For hex transforms, using the Java syntax: \u0061\uD83D\uDCD6
-
lambdin: Thomas Oden Lambdin Ethiopic-Latin Transliteration
-
mcst: Korean Ministry of Culture, Sports and Tourism
-
mns: Mongolian National Standard
-
percent: For hex transforms, using the percent syntax: %61%F0%9F%93%96
-
perl: For hex transforms, using the perl syntax: \x{61}\x{1F4D6}
-
plain: For hex transforms, with no surrounding syntax, spacing where necessary: 0061 1F4D6
-
prprname: Transform variant for proper names
-
satts: Standard Arabic Technical Transliteration System (SATTS)
-
sera: System for Ethiopic Representation in ASCII
-
tekieali: Tekie Alibekit Blin-Latin Transliteration
-
ungegn: United Nations Group of Experts on Geographical Names
-
unicode: To hex with the Unicode syntax: U+0061 U+1F4D6, spacing where necessary
-
xaleget: Eritrean Ministry of Education Blin-Latin Transliteration
-
xml: For hex transforms, using the xml syntax: &#x61;&#x1F4D6;
-
xml10: For hex transforms, using the xml decimal syntax: &#97;&#128214;
-
-
s0
Transform source: for non-languages/scripts, such as fullwidth-halfwidth conversion.
See also d0.
Possible values are:
-
accents: Map accented characters to base + punctuation, etc
-
ascii: Map from ASCII to the target, perhaps using different conventions
-
hex: Map characters from hex equivalents, trying all variants, e.g. U+0061 to a; for hex variants see transform.xml
-
morse: Map Morse Code to Unicode encoding
-
npinyin: Map the numeric form of pinyin to the tone format
-
publish: Map publishing characters, such as ,,—, to vanilla characters
-
zawgyi: Map Zawgyi Myanmar encoding to Unicode
-
-
t0
Machine Translation: used to indicate content that has been machine translated, or a request for a particular type of machine translation of content. The first subfield in a sequence would typically be a platform or vendor designation.
For example:
ja-t-de-t0-und
-
x0
Private Use.
For example:
ja-t-de-t0-und-x0-medical
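To make the layout of the transform extension more concrete, here is a minimal sketch in plain Perl that splits the part following -t- into a source language and its field/value pairs. The function parse_transform is hypothetical and is not part of Locale::Unicode; it assumes that a field key is one letter followed by one digit (such as m0 or t0), that everything before the first key is the source language, and that a single-character subtag marks the start of another extension. It performs no validation against the value lists above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Locale::Unicode.
# Returns a hash reference with the base locale, the source language
# of the transformation and the field/value pairs found after -t-
sub parse_transform
{
    my $tag = lc( shift( @_ ) );
    my( $base, $ext ) = split( /-t-/, $tag, 2 );
    # No transform extension at all
    return( { base => $base, source => undef, fields => {} } ) unless( defined( $ext ) );
    my( @source, %fields, $key );
    foreach my $sub ( split( /-/, $ext ) )
    {
        # A single-character subtag starts another extension, such as -u-
        last if( length( $sub ) == 1 );
        if( $sub =~ /^[a-z][0-9]$/ )
        {
            # A field key, such as d0, h0, i0, k0, m0, s0, t0 or x0
            $key = $sub;
            $fields{ $key } = [];
        }
        elsif( defined( $key ) )
        {
            # A subtag belonging to the value of the current field
            push( @{$fields{ $key }}, $sub );
        }
        else
        {
            # Before any field key: part of the source language
            push( @source, $sub );
        }
    }
    return({
        base   => $base,
        source => ( @source ? join( '-', @source ) : undef ),
        fields => { map{ $_ => join( '-', @{$fields{ $_ }} ) } keys( %fields ) },
    });
}

my $info = parse_transform( 'und-Cyrl-t-und-latn-m0-ungegn-2007' );
printf( "source=%s m0=%s\n", $info->{source}, $info->{fields}->{m0} );
# source=und-latn m0=ungegn-2007
```

Note that the sketch lower-cases the whole string before splitting, so the returned base is und-cyrl rather than und-Cyrl.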
Collation Options
Parametric settings can be specified in language tags or in rule syntax (in the form [keyword value] ). For example, -ks-level2 or [strength 2] will only compare strings based on their primary and secondary weights.
The option descriptions below are taken from the LDML standard, and reflect how the algorithm works when implemented by a web browser or another runtime environment. This module does not implement any of those algorithms. The documentation is only here for your benefit and convenience.
See the standard documentation and the DUCET (Default Unicode Collation Element Table) for more information.
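As a rough illustration of how the two notations relate, the following sketch in plain Perl looks for the ks key in the -u- extension of a locale string and prints the matching rule syntax. The function strength_rule is hypothetical and is not part of Locale::Unicode. The value names are the ones listed under ks below; the use of I for the identical level is an assumption based on the LDML rule syntax, not on this module.

```perl
use strict;
use warnings;

# Values accepted for ks, mapped to the level used in rule syntax.
# Assumption: "I" for identical follows the LDML rule syntax.
my %STRENGTH =
(
    level1  => 1,   primary     => 1,
    level2  => 2,   secondary   => 2,
    level3  => 3,   tertiary    => 3,
    level4  => 4,   quaternary  => 4,   quarternary => 4,
    identic => 'I', identical   => 'I',
);

# Hypothetical helper, not part of Locale::Unicode.
# Returns the rule syntax for the ks key, or undef if there is none.
sub strength_rule
{
    my $tag = lc( shift( @_ ) );
    # Find ks and its value somewhere after the -u- singleton
    return unless( $tag =~ /-u-(?:[a-z0-9-]*-)?ks-([a-z0-9]{3,8})(?:-|\z)/ );
    return unless( exists( $STRENGTH{ $1 } ) );
    return( "[strength $STRENGTH{ $1 }]" );
}

print( strength_rule( 'en-u-ks-level2' ), "\n" );
# [strength 2]
print( strength_rule( 'de-u-co-phonebk-ks-primary' ), "\n" );
# [strength 1]
```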
-
ka or colAlternate
Sets alternate handling for variable weights.
Possible values are optional and can be:
-
noignore or non-ignorable: Default value.
-
shifted
-
-
kb or colBackwards
Sets collation parameter key for backward collation weight.
Possible values are optional and can be:
true or yes, false (default) or no
-
kc or colCaseLevel
Sets collation parameter key for case level.
Specifies a boolean. If on, a level consisting only of case characteristics will be inserted in front of tertiary level, as a "Level 2.5". To ignore accents but take case into account, set strength to primary and case level to on.
Possible values are optional and can be:
true or yes, false (default) or no
-
kf or colCaseFirst
Sets collation parameter key for ordering by case.
If set to upper, causes upper case to sort before lower case. If set to lower, causes lower case to sort before upper case.
Possible values are:
upper, lower, false (default) or no
-
kh or colHiraganaQuaternary
Sets collation parameter key for special Hiragana handling.
This is deprecated by the LDML standard.
Specifies a boolean. Controls special treatment of Hiragana code points on quaternary level. If turned on, Hiragana code points will get lower values than all the other non-variable code points in shifted.
Possible values are optional and can be:
true (default) or yes, false or no
-
kk or colNormalization
Sets collation parameter key for normalisation.
Specifies a boolean. If on, then the normal UCA algorithm is used.
Possible values are optional and can be:
true (default) or yes, false or no
-
kn or colNumeric
Sets collation parameter key for numeric handling.
Specifies a boolean. If set to on, any sequence of Decimal Digits is sorted at a primary level with its numeric value.
Possible values are optional and can be:
true or yes, false (default) or no
-
kr or colReorder
Sets collation reorder codes.
Specifies a reordering of scripts or other significant blocks of characters such as symbols, punctuation, and digits.
Possible values are:
currency, digit, punct, space, symbol, or any BCP47 script ID.
Also possible: others, where all codes not explicitly mentioned should be ordered. The script code Zzzz (Unknown Script) is a synonym for others.
For example:
-
en-u-kr-latn-digit: Reorder digits after Latin characters.
-
en-u-kr-arab-cyrl-others-symbol: Reorder Arabic characters first, then Cyrillic, and put symbols at the end, after all other characters.
-
en-u-kr-others: Remove any locale-specific reordering, and use DUCET order for reordering blocks.
-
-
ks or colStrength
Sets the collation parameter key for collation strength used for comparison.
Possible values are:
-
level1 or primary
-
level2 or secondary
-
level3 (default) or tertiary
-
level4 or quaternary or quarternary
-
identic or identical
-
kv
Sets the collation parameter key for maxVariable, the last reordering group to be affected by ka-shifted.
Possible values are:
-
currency: Spaces, punctuation and all symbols are affected by ka-shifted.
-
punct: Spaces and punctuation are affected by ka-shifted (CLDR default).
-
space: Only spaces are affected by ka-shifted.
-
symbol: Spaces, punctuation and symbols except for currency symbols are affected by ka-shifted (UCA default).
-
-
vt
Sets the parameter key for the variable top.
This is deprecated by the LDML standard.
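Since this module does not perform collation itself, the effect of an option such as kn (colNumeric) described above may be easier to picture with a small example. The sketch below, in plain Perl, compares strings so that runs of digits are ordered by their numeric value. The function numeric_cmp is hypothetical; it only handles ASCII digits and uses a plain string comparison for everything else, so it is an approximation of the idea and not an implementation of the UCA.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Locale::Unicode and not a UCA implementation.
# Compares two strings chunk by chunk; chunks made of ASCII digits are
# compared by numeric value, all other chunks as plain strings.
sub numeric_cmp
{
    my( $x, $y ) = @_;
    my @left  = ( $x =~ /([0-9]+|[^0-9]+)/g );
    my @right = ( $y =~ /([0-9]+|[^0-9]+)/g );
    while( @left && @right )
    {
        my $p = shift( @left );
        my $q = shift( @right );
        my $r = ( $p =~ /^[0-9]/ && $q =~ /^[0-9]/ )
            ? ( $p <=> $q )
            : ( $p cmp $q );
        return( $r ) if( $r );
    }
    # All shared chunks are equal: the string with fewer chunks sorts first
    return( scalar( @left ) <=> scalar( @right ) );
}

my @plain   = sort( qw( item10 item9 item1 ) );
my @numeric = sort{ numeric_cmp( $a, $b ) } qw( item10 item9 item1 );
print( "@plain\n" );   # item1 item10 item9
print( "@numeric\n" ); # item1 item9 item10
```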
AUTHOR
Jacques Deguest <jack@deguest.jp>
SEE ALSO
https://github.com/unicode-org/cldr/tree/main/common/bcp47, https://en.wikipedia.org/wiki/IETF_language_tag
https://www.rfc-editor.org/info/bcp47
Unicode Locale Data Markup Language
RFC6067 on the Unicode extensions
RFC6497 on the transformation extension
COPYRIGHT & LICENSE
Copyright(c) 2024 DEGUEST Pte. Ltd.
All rights reserved
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.