NAME

Encode::DoubleEncodedUTF8 - Fix double encoded UTF-8 bytes to the correct one

SYNOPSIS

use Encode;
use Encode::DoubleEncodedUTF8;

my $dodgy_utf8 = "Some byte strings from the web/DB with double-encoded UTF-8 bytes";
my $fixed = decode("utf-8-de", $dodgy_utf8); # Fix it

DESCRIPTION

Encode::DoubleEncodedUTF8 adds a new encoding utf-8-de and fixes double encoded utf-8 bytes found in the original bytes to the correct Unicode entity.

The double encoded utf-8 frequently happens when strings with UTF-8 flag and without are concatenated, for instance:

my $string = "L\x{e9}on";   # latin-1
utf8::upgrade($string);
my $bytes  = "L\xc3\xa9on"; # utf-8

my $dodgy_utf8 = encode_utf8($string . " " . $bytes); # $bytes is now double encoded

my $fixed = decode("utf-8-de", $dodgy_utf8); # "L\x{e9}on L\x{e9}on";

See encoding::warnings for more details.

AUTHOR

Tatsuhiko Miyagawa <miyagawa@bulknews.net>

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

encoding::warnings, Test::utf8