NAME
Encode::DoubleEncodedUTF8 - Fix double encoded UTF-8 bytes to the correct one
SYNOPSIS
use Encode;
use Encode::DoubleEncodedUTF8;
my $dodgy_utf8 = "Some byte strings from the web/DB with double-encoded UTF-8 bytes";
my $fixed = decode("utf-8-de", $dodgy_utf8); # Fix it
DESCRIPTION
Encode::DoubleEncodedUTF8 adds a new encoding utf-8-de
and fixes double encoded utf-8 bytes found in the original bytes to the correct Unicode entity.
The double encoded utf-8 frequently happens when strings with UTF-8 flag and without are concatenated, for instance:
my $string = "L\x{e9}on"; # latin-1
utf8::upgrade($string);
my $bytes = "L\xc3\xa9on"; # utf-8
my $dodgy_utf8 = encode_utf8($string . " " . $bytes); # $bytes is now double encoded
my $fixed = decode("utf-8-de", $dodgy_utf8); # "L\x{e9}on L\x{e9}on";
See encoding::warnings for more details.
AUTHOR
Tatsuhiko Miyagawa <miyagawa@bulknews.net>
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.