NAME
I18N::LangTags - functions for dealing with RFC1766-style language tags
SYNOPSIS
use I18N::LangTags qw(is_language_tag same_language_tag
extract_language_tags super_languages
similarity_language_tag is_dialect_of);
...or whichever of those functions you want to import. Those are all the exportable functions; you're free to import only some of them, or none at all. By default, none are imported.
If you don't import any of these functions, assume an &I18N::LangTags:: in front of all the function names in the following examples.
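As a minimal sketch, assuming I18N::LangTags is installed, a script can import one function and reach another through its package name. The two calls reuse tags from the examples that follow.

```perl
use strict;
use warnings;

# Import just one function; the rest stay in the I18N::LangTags package.
use I18N::LangTags qw(is_language_tag);

# The imported function can be called by its short name...
my $valid = is_language_tag("fr");

# ...while a function that was not imported needs its full package name.
my $same = I18N::LangTags::same_language_tag("x-kadara", "i-kadara");

print "is_language_tag('fr'): ", ($valid ? "TRUE" : "FALSE"), "\n";
print "same_language_tag('x-kadara', 'i-kadara'): ", ($same ? "TRUE" : "FALSE"), "\n";
```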
DESCRIPTION
Language tags are a formalism, described in RFC 1766, for declaring what language form (language and possibly dialect) a given chunk of information is in.
This library provides functions for common tasks involving language tags as they are needed in a variety of protocols and applications.
Please see the references under SEE ALSO for a thorough explanation of how to use language tags correctly.
the function is_language_tag($lang1)
Returns true iff $lang1 is a formally valid language tag.
  is_language_tag("fr") is TRUE
  is_language_tag("x-jicarilla") is FALSE
      (Subtags can be 8 chars long at most; 'jicarilla' is 9.)
  is_language_tag("i-Klikitat") is TRUE
      (True even though no one has actually registered Klikitat; it's a formally valid tag.)
  is_language_tag("fr-patois") is TRUE
      (Formally valid, although descriptively weak!)
  is_language_tag("Spanish") is FALSE
  is_language_tag("french-patois") is FALSE
      (No good: the first subtag has to match /^([xXiI]|[a-zA-Z]{2})$/; see RFC 1766.)
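The checks behind these examples can be sketched in a few lines of Perl. This is a hypothetical reimplementation, not the module's source: is_language_tag_sketch is an illustrative name, and it assumes RFC 1766's rule that every subtag after the first is one to eight letters.

```perl
use strict;
use warnings;

# Hypothetical sketch of the RFC 1766 form check; not the module's own code.
# Assumption: every subtag after the first is 1 to 8 ASCII letters.
sub is_language_tag_sketch {
    my ($tag) = @_;
    return 0 unless defined $tag;

    # A limit of -1 keeps empty trailing fields, so "fr-" is rejected.
    my ($first, @rest) = split /-/, $tag, -1;
    return 0 unless defined $first && $first =~ /^(?:[xXiI]|[a-zA-Z]{2})\z/;

    # "i" or "x" on its own is not a tag.
    return 0 if !@rest && $first =~ /^[xXiI]\z/;

    for my $subtag (@rest) {
        return 0 unless $subtag =~ /^[a-zA-Z]{1,8}\z/;
    }
    return 1;
}

for my $tag ("fr", "x-jicarilla", "i-Klikitat", "fr-patois", "Spanish", "french-patois") {
    printf "%-15s %s\n", $tag, is_language_tag_sketch($tag) ? "TRUE" : "FALSE";
}
```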
the function extract_language_tags($whatever)
Returns a list of whatever looks like a formally valid language tag in $whatever. It is not very smart: it matches on form alone, so ordinary words such as "in" can come back as tags. Don't get too creative with what you feed it.
  extract_language_tags("fr, fr-ca, i-mingo")
      returns: ('fr', 'fr-ca', 'i-mingo')
  extract_language_tags("It's like this: I'm in fr -- French!")
      returns: ('It', 'in', 'fr')
      (So don't just feed it any old thing.)
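Because extraction works on form alone, it can be approximated with a single global regular-expression match. The sketch is hypothetical (extract_language_tags_sketch is an illustrative name, not the module's code) and assumes letters-only subtags as in RFC 1766; it reproduces both examples above, including the stray 'It' and 'in'.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's code: collect every substring
# shaped like an RFC 1766 tag (letters-only subtags assumed), with no
# knowledge of which words are really language codes.
sub extract_language_tags_sketch {
    my ($text) = @_;
    return () unless defined $text;
    my @found = $text =~ m/
        \b
        (?: [xXiI] | [a-zA-Z]{2} )   # first subtag: i, x, or two letters
        (?: - [a-zA-Z]{1,8} )*       # later subtags: 1 to 8 letters each
        \b
    /xg;
    # A bare "i" or "x" is not a tag, so drop it.
    return grep { !/^[xXiI]\z/ } @found;
}

my @tags  = extract_language_tags_sketch("fr, fr-ca, i-mingo");
my @noise = extract_language_tags_sketch("It's like this: I'm in fr -- French!");
print "@tags\n";    # fr fr-ca i-mingo
print "@noise\n";   # It in fr
```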
the function same_language_tag($lang1, $lang2)
Returns true iff $lang1 and $lang2 are acceptable variant tags representing the same language-form.
  same_language_tag('x-kadara', 'i-kadara') is TRUE
      (The x/i- alternation doesn't matter.)
  same_language_tag('X-KADARA', 'i-kadara') is TRUE
      (...and neither does case.)
  same_language_tag('en', 'en-US') is FALSE
      (All-English is not the SAME as US English.)
  same_language_tag('x-kadara', 'x-kadar') is FALSE
      (These are totally unrelated tags.)
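One way to get this behavior is to normalize both tags before comparing them. In this hypothetical sketch (the function names are illustrative, not part of I18N::LangTags), normalization means lowercasing and rewriting a leading "x-" as "i-"; it assumes both arguments are already valid tags.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's own code. Assumption: both
# arguments are already valid tags, so only normalization is needed.
sub normalize_tag_sketch {
    my ($tag) = @_;
    $tag = lc $tag;          # case doesn't matter
    $tag =~ s/^x-/i-/;       # neither does the x-/i- alternation
    return $tag;
}

sub same_language_tag_sketch {
    my ($lang1, $lang2) = @_;
    return normalize_tag_sketch($lang1) eq normalize_tag_sketch($lang2) ? 1 : 0;
}

for my $pair (["x-kadara", "i-kadara"], ["X-KADARA", "i-kadara"],
              ["en", "en-US"], ["x-kadara", "x-kadar"]) {
    printf "%-10s %-10s %s\n", @$pair,
        same_language_tag_sketch(@$pair) ? "TRUE" : "FALSE";
}
```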
the function similarity_language_tag($lang1, $lang2)
Returns an integer representing the degree of similarity between tags $lang1 and $lang2 (the order of which does not matter). Similarity is the number of elements the two tags share, counted from the left and stopping at the first one that differs, without regard to case or to x/i- alternation.
  similarity_language_tag('fr', 'fr-ca') is 1
      (one element in common)
  similarity_language_tag('fr-ca', 'fr-FR') is 1
      (one element in common)
  similarity_language_tag('fr-CA-joual', 'fr-CA-PEI') is 2
  similarity_language_tag('fr-CA-joual', 'fr-CA') is 2
      (two elements in common)
  similarity_language_tag('x-kadara', 'i-kadara') is 1
      (x/i- doesn't matter)
  similarity_language_tag('en', 'x-kadar') is 0
  similarity_language_tag('x-kadara', 'x-kadar') is 0
      (unrelated tags; no similarity)
  similarity_language_tag('i-cree-syllabic', 'i-cherokee-syllabic') is 0
      (no leftmost elements in common!)
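The counting rule can be sketched as a loop over subtags. This is a hypothetical sketch, not the module's code: it assumes valid tags, compares subtags case-insensitively, and drops a leading "i" or "x" subtag, which is one way to make 'x-kadara' versus 'i-kadara' come out as 1 rather than 2.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's code; assumes valid tags.
# Splits a tag into lowercase subtags and drops a leading "i" or "x",
# so the x-/i- alternation is neither a match nor a mismatch.
sub comparable_subtags_sketch {
    my ($tag) = @_;
    my @subtags = split /-/, lc $tag;
    shift @subtags if @subtags && $subtags[0] =~ /^[ix]\z/;
    return @subtags;
}

sub similarity_language_tag_sketch {
    my @left  = comparable_subtags_sketch($_[0]);
    my @right = comparable_subtags_sketch($_[1]);
    my $similarity = 0;
    # Count equal subtags from the left, stopping at the first difference.
    while (@left && @right && shift(@left) eq shift(@right)) {
        $similarity++;
    }
    return $similarity;
}

print similarity_language_tag_sketch('fr-CA-joual', 'fr-CA-PEI'), "\n";              # 2
print similarity_language_tag_sketch('i-cree-syllabic', 'i-cherokee-syllabic'), "\n"; # 0
```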
the function is_dialect_of($lang1, $lang2)
Returns true iff language tag $lang1 represents a subdialect of language tag $lang2.
Get the order right! It doesn't work the other way around!
  is_dialect_of('en-US', 'en') is TRUE
      (American English IS a dialect of all-English.)
  is_dialect_of('fr-CA-joual', 'fr-CA') is TRUE
  is_dialect_of('fr-CA-joual', 'fr') is TRUE
      (Joual is a dialect of (a dialect of) French.)
  is_dialect_of('en', 'en-US') is FALSE
      (All-English is NOT a dialect of American English.)
  is_dialect_of('fr', 'en-CA') is FALSE
  is_dialect_of('en', 'en') is TRUE
      (Note: a degenerate case.)
  is_dialect_of('i-mingo-tom', 'x-Mingo') is TRUE
      (The x/i thing doesn't matter, nor does case.)
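Since a dialect's tag extends its parent's tag with more subtags, the test reduces to a whole-subtag prefix check. This hypothetical sketch (illustrative names, assuming valid tags) normalizes case and the x-/i- alternation first and reproduces the examples above; note the argument order.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's code; assumes valid tags.
sub normalize_tag_sketch {
    my ($tag) = @_;
    $tag = lc $tag;       # case doesn't matter
    $tag =~ s/^x-/i-/;    # nor does the x-/i- alternation
    return $tag;
}

# True iff $lang1 is $lang2 itself or $lang2 followed by more subtags.
sub is_dialect_of_sketch {
    my ($lang1, $lang2) = map { normalize_tag_sketch($_) } @_;
    # Appending "-" to both makes the prefix test respect subtag
    # boundaries, so 'x-kadar' is not taken as a prefix of 'x-kadara'.
    # Equal tags also pass: the degenerate case is_dialect_of('en', 'en').
    return index("$lang1-", "$lang2-") == 0 ? 1 : 0;
}

print is_dialect_of_sketch('fr-CA-joual', 'fr') ? "TRUE\n" : "FALSE\n";  # TRUE
print is_dialect_of_sketch('en', 'en-US') ? "TRUE\n" : "FALSE\n";        # FALSE
```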
the function super_languages($lang1)
Returns a list of language tags that are superordinate tags to $lang1 -- it gets this by removing subtags from the end of $lang1 until nothing (or just "i" or "x") is left.
  super_languages("fr-CA-joual") is ("fr-CA", "fr")
  super_languages("en-AU") is ("en")
  super_languages("en") is empty-list, ()
  super_languages("i-cherokee") is empty-list, ()
      ...not ("i"), which would be illegal as well as pointless.
Returns empty-list if $lang1 is not a valid language tag.
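The stripping process can be sketched as follows. This is a hypothetical illustration (super_languages_sketch is not part of the module) that uses a simplified RFC 1766 form check with letters-only subtags; it also returns ('x-mingo') for 'x-mingo-tom', which shows the limitation discussed below.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's code.
sub super_languages_sketch {
    my ($tag) = @_;
    # Simplified RFC 1766 form check (letters-only subtags); a bare
    # "i" or "x" is not a tag. Invalid input gives the empty list.
    return () unless defined $tag
        && $tag =~ /^(?:[xXiI]|[a-zA-Z]{2})(?:-[a-zA-Z]{1,8})*\z/
        && $tag !~ /^[xXiI]\z/;

    my @subtags = split /-/, $tag;
    pop @subtags;                      # a tag is not its own superordinate
    my @supers;
    while (@subtags) {
        # Stop rather than return just "i" or "x".
        last if @subtags == 1 && $subtags[0] =~ /^[xXiI]\z/;
        push @supers, join('-', @subtags);
        pop @subtags;
    }
    return @supers;
}

print join(', ', super_languages_sketch('fr-CA-joual')), "\n";   # fr-CA, fr
print join(', ', super_languages_sketch('x-mingo-tom')), "\n";   # x-mingo
```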
A notable and rather unavoidable problem with this method: "x-mingo-tom" has an "x" because the whole tag isn't an IANA-registered tag, but super_languages('x-mingo-tom') is ('x-mingo'), which isn't really right, since 'i-mingo' is registered. This module has no way of knowing that. (Note, however, that same_language_tag('x-mingo', 'i-mingo') is TRUE.)
More importantly, you assume at your peril that superordinates of $lang1 are mutually intelligible with $lang1. Think REAL hard about how you use this. YOU HAVE BEEN WARNED.
NOTE
This library may (probably will) need amending if/when RFC 1766 is superseded.
SEE ALSO
* RFC 1766, "Tags for the Identification of Languages", ftp://ftp.isi.edu/in-notes/rfc1766.txt
* RFC 2277, "IETF Policy on Character Sets and Languages", ftp://ftp.isi.edu/in-notes/rfc2277.txt
* RFC 2231, "MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations", ftp://ftp.isi.edu/in-notes/rfc2231.txt
* Locale::Codes, in http://www.perl.com/CPAN/modules/by-module/Locale/
* ISO 639, "Code for the representation of names of languages", http://www.indigo.ie/egt/standards/iso639/iso639-1-en.html
* The IANA list of registered languages (hopefully up-to-date), ftp://ftp.isi.edu/in-notes/iana/assignments/languages/
COPYRIGHT
Copyright (c) 1998 Sean M. Burke. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
AUTHOR
Sean M. Burke <sburke@netadventure.net>