NAME

Test::Locale::Utils - utilities for writing tests involving international characters

SYNOPSIS

 use Test::More;
 use Test::Locale::Utils qw( is_locale_international );

 my $i18n_test_cases = {
   'über maus' =>
   'Über Maus',

   'l\'oeuvre imposante d\'Honoré de Balzac' =>
   'L\'Oeuvre Imposante d\'Honoré de Balzac',
 }
 my $i18n_test_count = scalar( keys( %{ $i18n_test_cases } ) );

 my $i18n_system = is_locale_international();
 SKIP: {
    skip "Can't test strings with international chars", $i18n_count, unless $i18n_system;
    foreach my $case (sort keys %{ $i18n_test_cases }) {
      my $expected = $i18n_test_cases->{ $case };
      my $result   = capitalize_title( $case );
      is ($result, $expected, "Testing: $case");
    }
 }


# Older style (deprecated):

 use Test::More;
 use Test::Locale::Utils qw(:all);
 my @exchars = extract_extended_chars(\@strings);
 my $internat = internationalized_locale(@exchars);  # Deprecated
 my $exchars_str = join '', @exchars;
 my $exchars_rule = qr{[$exchars_str]};

 foreach my $string (@strings) {
    SKIP: {
       skip "This locale can't deal with i18n chars in string: $string", 1,
           unless ($internat && ($string =~ /$exchars_rule/) );

       is( $expected{$string}, string_transformation($string), "Testing $string" );
    }
 }

DESCRIPTION

A small collection of utility functions to make it easier to write tests that work with strings that may contain characters beyond the 7bit ASCII range (e.g. the "extended characters" or "international characters" of iso9959-1 and friends).

EXPORTED

Nothing by default. All of the following are exportable on request (and all may be requested with the ":all" tag).

extract_extended_chars

Given a reference to an array of strings, returns a list of all extended characters (i.e. characters with the eigth-bit set) that have appeared at least once in the strings.

is_locale_international

Does some crude checks of uc, lc, and ucfirst to see if they handle some international characters (latin-1) correctly, or at least well enough that we can expect the international character test cases of Text::Capitalize to have meaningful results.

is_uc_and_lc_internationalized

Looks at the behavior of uc and lc for a small sample of "international characters": this simply checks if the extended characters of latin-1 and friends have an upper and lower form defined as expected.

define_sample_i18n_chars

Returns a short list of pairs of extended characters, pairing a lowercase form with an uppercase one (an aref of arefs).

These were selected because they're the only extended characters in use in the test cases for Text::Capitalize.

is_ucfirst_internationalized

A very specific test to to see if ucfirst can upcase German's "over". If it can, we assume ucfirst is working on the kind of international characters used in the Text::Capitalized tests.

Motivation:

Solaris boxes apparently have a knack for getting uc and lc to work on international characters, but still leaving ucfirst broken -- it upcases the character *after* a leading international character (such as a latin-1 u-umlaut):

Text-Capitalize-0.8:
- i86pc-solaris-thread-multi / 5.8.8:
  - FAIL http://nntp.x.perl.org/group/perl.cpan.testers/5882611

- sun4-solaris-64int / 5.8.4:
  - FAIL http://nntp.x.perl.org/group/perl.cpan.testers/5846995
all_true

Example usage:

my $okay = all_true( \@checks );

This is an alternative to List::MoreUtils "all", written to avoid a non-core dependency for the Text::Capitalize tests.

Note: If you'd rather use that more common module, do this:

use List::MoreUtils qw( all );
my $okay = all { ($_) } @checks;
internationalized_locale

DEPRECATED. use is_locale_international instead.

Given an array of extended characters that you care about, this code will check to make sure that the current locale seems to comprehend what to do with them. Specifically, it checks to see if they have a defined upper and lower case.

This is an excessively simple version that just looks at the extended characters to see if they change case when run through either uc or lc.

This apparently fails for some locales, e.g. Russian, where the extended chars are in the same locations as in iso8859, but the upper and lower have reversed positions.

DISCUSSION

The "use locale" story seems to have some notable gaps. A brief summary, off the top of my head:

There's no definitive way to get a listing of all available locales on a system. The right way to do it varies from platform to platform. There's no definitive way of finding out what platform you're on: You can check ^O, but you need to parse it yourself (and that's not as easy as you might think: matching for /win/ to see if you're on a windows platform will get confused in cases like "cygwin"). There's no definitive list of all possible values of ^O. There are some useful tricks in the POSIX module that can help with these issues, but you can't count on every system that perl runs on being POSIX compliant, (and like I just said, checking what kind of platform you're on is a little trickier than you'd think).

And a recent discovery of mine: when the locale is utf-8, doing a "use locale" does not give you "unicode semantics", you actually have to do "utf8::upgrade" on anything you want "uc" and friends to work on. Heigh-ho.

This little module is an attempt at cutting the Gordian Knot represented by this cluster of problems, at least as far as the automated tests for Text::Capitalize are concerned.

Since it's difficult to determine the Right Way to do cross-platform checks of string handling including international characters, instead I use some simple operational tests to see if the system does what's expected with the international characters, and if not, the tests using those characters will be skipped.

SEE ALSO

perlocale

AUTHOR

Joseph Brenner, <doom@kzsu.stanford.edu>

COPYRIGHT AND LICENSE

Copyright (C) 2006 by Joseph Brenner

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.2 or, at your option, any later version of Perl 5 you may have available.

BUGS

None reported... yet.

1 POD Error

The following errors were encountered while parsing the POD:

Around line 15:

Non-ASCII character seen before =encoding in ''über'. Assuming UTF-8