NAME
Locale::Country::Multilingual::Unicode - Recommended Usage with Unicode
VERSION
version 0.25
SYNOPSIS
use utf8;
use Encode::StdIO;
use Locale::Country::Multilingual {use_io_layer => 1};
my $lcm = Locale::Country::Multilingual->new;
$lcm->set_lang('de');
print $lcm->code2country('gb'), "\n";
DESCRIPTION
You are on a modern computer system that uses utf-8 encoding by default. Locale::Country::Multilingual uses language data that is in utf-8, too. Everything is fine... Really?
Try this in your favorite terminal:
> perl -le 'print "bäh!"'
bäh!
Uppercase it:
> LANG=en_US perl -Mlocale -le 'print uc "bäh!"'
BäH!
Wrong! It should have been BÄH!, although on latin1 systems it does work. The same goes for Österreich, the German and native name for Austria: if you run lc() on it, the Ö won't change.
What happened is that you write files (and code) in utf-8, a multi-byte encoding, but Perl expects latin1 (iso-8859-1), a single-byte encoding, by default. Provided you use locale; together with an appropriate locale (here en_US) in your Perl program, a lowercase latin1 ä (0xe4) is turned into an uppercase Ä (0xc4), but only if your input comes as latin1.
A utf-8 ä is encoded as 0xc3, 0xa4. Therefore uc() does not detect the two-byte ä as a letter that could be uppercased.
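The difference between the byte view and the character view can be sketched with the core Encode module. This is only an illustration, not part of Locale::Country::Multilingual; the helper names uc_as_bytes, uc_as_text and hex_dump are made up for this example, and the input is written as explicit bytes so that the result does not depend on how the source file is saved.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Uppercase the raw octets. Without decoding, uc() only knows the
# ASCII letters and leaves the two bytes of the utf-8 "ä" alone.
sub uc_as_bytes {
    my ($octets) = @_;
    return uc $octets;
}

# Decode the octets into characters first, uppercase the characters,
# then encode the result back to utf-8.
sub uc_as_text {
    my ($octets) = @_;
    my $text = decode('UTF-8', $octets);
    return encode('UTF-8', uc $text);
}

# Hypothetical helper: show a byte string as space-separated hex values.
sub hex_dump {
    my ($octets) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $octets;
}

my $input = "b\xc3\xa4h!";    # "bäh!" as utf-8 bytes

print hex_dump(uc_as_bytes($input)), "\n";    # 42 c3 a4 48 21
print hex_dump(uc_as_text($input)),  "\n";    # 42 c3 84 48 21
```

In the first line of output the bytes 0xc3, 0xa4 are unchanged; in the second they have become 0xc3, 0x84, the utf-8 encoding of Ä.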
Language files in Locale::Country::Multilingual are in utf-8.
To make everything work, the correct workflow is:
- use utf8;

  This pragma tells Perl that all text in your code is actually in utf-8, so the Perl interpreter converts it into its internal string format correctly. Actually this is only necessary when you have literals that contain non-ASCII characters, e.g. when you code:

      print "Dürüm Döner Kebap\n";

  Even if your system does not use utf-8 by default, your Perl programs should be encoded in utf-8. Use an editor where you can set the encoding.

- Set encoding for input and output

  By default Perl converts the internal string representation into latin1 for input and output. So the above print output would be broken on a non-latin1 system. For switching STDIN, STDOUT and STDERR to utf-8, you can write:

      binmode STDIN, ':utf8';
      binmode STDOUT, ':utf8';
      binmode STDERR, ':utf8';

  If your system uses another encoding, e.g. "euc-jp", you can switch a filehandle to that encoding with:

      binmode FH, ':encoding(euc-jp)';

  In a web application don't forget to set the output MIME type as well!

  If output goes to a terminal:

      use Encode::StdIO;

  This module determines your terminal's encoding, even if it is something other than utf-8, and sets the appropriate IO layers for the three standard IO handles.

- Set use_io_layer => 1

  There are two places where this option can be specified: either in use or in new:

      use Locale::Country::Multilingual {use_io_layer => 1};

      my $lcm = Locale::Country::Multilingual->new(
          lang => 'de',
          use_io_layer => 1,
      );
      print uc $lcm->code2country('gb'), "\n";

  That should print

      VEREINIGTES KÖNIGREICH GROSSBRITANNIEN UND NORDIRLAND

  Wow! Even the "ß" has been converted correctly into "SS".
SEE ALSO
AUTHOR
Bernhard Graf graf(a)cpan,org
COPYRIGHT & LICENSE
This text is in the public domain.
AUTHORS
Bernhard Graf <graf@cpan.org>
Fayland Lam <fayland@gmail.com>
Greg Oschwald <oschwald@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2014 by Fayland Lam.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.