NAME

Search::Tools::Transliterate - transliterations of UTF-8 chars

SYNOPSIS

use Search::Tools::Transliterate;
my $tr = Search::Tools::Transliterate->new();
print $tr->convert( 'some string of utf8 chars' );

DESCRIPTION

Search::Tools::Transliterate transliterates UTF-8 characters to single-byte equivalents. It is based on the transmap project by Markus Kuhn (http://www.cl.cam.ac.uk/~mgk25/).

METHODS

new

Create new instance. Takes the following optional parameters:

map: Customize the character mapping. Should be a hashref. See map() method.
ebit: Allow full native 8bit characters, rather than only 7bit ASCII. The default is true (1). Set to 0 to disable.

map

Access the transliteration character map. Example:

use Search::Tools::Transliterate;
my $tr = Search::Tools::Transliterate->new;
$tr->map->{mychar} = 'my transliteration';
print $tr->convert('mychar');  # prints 'my transliteration'

NOTE: The map() method is an accessor only. You cannot pass in a new map through it; modify the returned hashref instead, or pass a map to new().

is_valid_utf8( text )

Returns true if text is a valid sequence of UTF-8 bytes, regardless of how Perl has it flagged (is_utf8 or not).
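To make the idea of a validity check concrete, here is a minimal sketch that uses only the core Encode module. The helper name looks_like_valid_utf8() is hypothetical and this is not necessarily how the module implements is_valid_utf8(); it only illustrates testing the bytes themselves rather than trusting Perl's internal flag.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper for illustration; not part of Search::Tools::Transliterate.
sub looks_like_valid_utf8 {
    my ($text) = @_;

    # Work on a copy: decode() with a CHECK value may consume its input.
    my $octets = $text;

    # If Perl has the string flagged as characters, turn it back into octets
    # so that flagged and unflagged input are judged the same way.
    utf8::encode($octets) if utf8::is_utf8($octets);

    # FB_CROAK makes decode() die on any malformed sequence.
    return eval { Encode::decode( 'UTF-8', $octets, Encode::FB_CROAK ); 1 } ? 1 : 0;
}

print looks_like_valid_utf8("caf\xc3\xa9") ? "valid\n" : "invalid\n";
print looks_like_valid_utf8("caf\xe9")     ? "valid\n" : "invalid\n";
```

The first string ends in the two-octet UTF-8 encoding of U+00E9; the second ends in a lone Latin-1 byte, which is not a complete UTF-8 sequence.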

is_ascii( text )

If text contains no bytes above 127, then returns true (1). Otherwise, returns false (0). Used by convert() internally to check text prior to transliterating.
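A check of this kind can be sketched with a single regular expression. The function name looks_ascii() is hypothetical, and the module's own implementation may differ; the sketch only shows the "nothing above 127" rule described above.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not the module's is_ascii().
sub looks_ascii {
    my ($text) = @_;

    # Any character outside the 0-127 range means the text is not pure ASCII.
    return $text =~ /[^\x00-\x7f]/ ? 0 : 1;
}

print looks_ascii("hello"),    "\n";
print looks_ascii("caf\xe9"),  "\n";
```

Text that passes such a check needs no transliteration, which is why running it first is a cheap shortcut.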

is_latin1( text )

Returns true if text lies within the Latin1 charset.

is_flagged_utf8( text )

Returns true if Perl thinks text is UTF-8. Same as Encode::is_utf8().

is_sane_utf8( text )

Tests for doubly encoded text. Returns true if text looks OK. See the Test::utf8 docs for an explanation.
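For readers unfamiliar with the problem, the following sketch shows how double encoding arises, using only the core Encode module. The helper names encode_twice() and hex_octets() are hypothetical; the sketch demonstrates the symptom and does not reproduce the detection logic of is_sane_utf8().

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: encode to UTF-8, misread the result as Latin-1,
# then encode to UTF-8 a second time.
sub encode_twice {
    my ($chars) = @_;
    my $once = encode( 'UTF-8', $chars );
    return encode( 'UTF-8', decode( 'ISO-8859-1', $once ) );
}

# Hypothetical helper: show each octet of a byte string in hexadecimal.
sub hex_octets {
    my ($octets) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $octets;
}

my $char = "\x{e9}";    # LATIN SMALL LETTER E WITH ACUTE
print 'once:  ', hex_octets( encode( 'UTF-8', $char ) ), "\n";    # C3 A9
print 'twice: ', hex_octets( encode_twice($char) ),      "\n";    # C3 83 C2 A9
```

The doubly encoded form is still valid UTF-8, which is why a validity check alone does not catch it and a separate sanity check is useful.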

convert( text )

Returns text with every multibyte UTF-8 character replaced by a single-byte transliteration, according to %Map (see the map() method). Will croak if text is not valid UTF-8, so if in doubt, check first with is_valid_utf8().

to_utf8( text, charset )

Shorthand for running text through appropriate is_*() checks and then converting to UTF-8 if necessary. Returns text encoded and flagged as UTF-8.

Returns undef if for some reason the encoding failed or the result did not pass is_sane_utf8().

BUGS

You might consider the whole attempt a bug. It's really an attempt to accommodate applications that don't support Unicode. Perhaps we shouldn't even try. But for things like curly quotes and other 'smart' punctuation, it's often helpful to render the UTF-8 character as something rather than just letting a character without a direct translation slip into the ether.

That said, if a character has no mapping (and there are plenty that do not) a single space will be used.
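The fallback behaviour can be illustrated with a small stand-alone sketch. The map %sketch_map and the function transliterate_sketch() are hypothetical and cover only a handful of characters; the module's real table is much larger, and this sketch assumes a 7-bit ASCII target rather than modelling the ebit option.

```perl
use strict;
use warnings;

# Hypothetical miniature map for illustration only.
my %sketch_map = (
    "\x{201C}" => '"',      # left double quotation mark
    "\x{201D}" => '"',      # right double quotation mark
    "\x{2019}" => "'",      # right single quotation mark
    "\x{2026}" => '...',    # horizontal ellipsis
);

sub transliterate_sketch {
    my ($text) = @_;

    # Replace each non-ASCII character with its mapping,
    # or with a single space when no mapping exists.
    $text =~ s/([^\x00-\x7f])/exists $sketch_map{$1} ? $sketch_map{$1} : ' '/ge;
    return $text;
}

# The snowman (U+2603) has no entry in the sketch map, so it becomes a space.
print transliterate_sketch("\x{201C}snow\x{2603}man\x{201D}"), "\n";    # "snow man"
```

Substituting a space keeps word boundaries roughly intact, at the cost of losing the original character entirely.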

AUTHOR

Peter Karman perl@peknet.com

Thanks to Atomic Learning www.atomiclearning.com for sponsoring the development of this module.

Many of the UTF-8 tests come directly from Test::utf8.
