NAME
Encode::Detective - detect a data encoding
SYNOPSIS
use Encode;
use Encode::Detective 'detect';
my $encoding = detect ($data);
# Now $encoding contains a guess of the encoding of $data.
DESCRIPTION
This module guesses the character set of input data. It is similar to Encode::Guess, but does not require a list of expected encodings.
FUNCTIONS
detect
my $encoding = detect ($data);
Given a string of bytes, $data, this examines the bytes and guesses which encoding they are in, using probabilities.
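A minimal sketch of how the guess might be used to turn the bytes into Perl characters with the core Encode module. The helper decode_guessed and its $fallback parameter are hypothetical names used only for this illustration and are not part of Encode::Detective. The sketch also assumes that detect may return an undefined value, or a name that Encode does not recognise, so it checks the guess before using it.

```perl
use strict;
use warnings;
use Encode qw(decode encode find_encoding);
use Encode::Detective 'detect';

# Hypothetical helper: decode $bytes using the guessed encoding, or
# using $fallback when no usable guess is available.
sub decode_guessed
{
    my ($bytes, $fallback) = @_;
    my $guess = detect ($bytes);
    # Assumption: the guess may be undefined or unknown to Encode, so
    # validate it with find_encoding before decoding.
    my $name = (defined $guess && find_encoding ($guess)) ? $guess : $fallback;
    return ($name, decode ($name, $bytes));
}

# Sample input. detect works on bytes, not characters, so the sample
# text is first encoded into UTF-8 bytes.
my $sample = "\x{65e5}\x{672c}\x{8a9e}\x{306e}\x{6587}\x{7ae0}\x{3067}\x{3059}\x{3002}";
my $bytes = encode ('UTF-8', $sample);

my ($name, $text) = decode_guessed ($bytes, 'UTF-8');
binmode STDOUT, ':encoding(UTF-8)';
print "Decoded as $name: $text\n";
```

Because the result is only a guess, a caller that already knows the likely encoding of its input can pass that as the fallback rather than relying on detection alone.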
DETECTED ENCODINGS
The following encodings are detected:
- UTF-8
- EUC-JP
- Big5
- Shift_JIS
- EUC-KR
- EUC-TW (Taiwanese encoding)
- windows-1251 (Cyrillic encoding)
- windows-1255 (Hebrew encoding)
Character sets not detected
TODO
The module needs more tests. Please send example files.
BUGS
TIS-620
TIS-620 does not seem to be detected.
Documentation of detection
The documentation of detected encodings above is not complete.
HISTORY
This module is based on code from Firefox. When this module was created, the C++ code for character set detection was available as a standalone library. That code can no longer be used as a standalone library, so this module has become a fork of the original Mozilla code.
Encode::Detective is a fork of Encode::Detect. It removes almost all of the interface of Encode::Detect except the single function "detect". This fork was released to CPAN to improve the compilation of the module on various systems.
SEE ALSO
edetect
The edetect standalone script can guess the encodings of files.
Encode::Guess
Encode::Guess is a Perl module which does something similar to Encode::Detective.
Encode::Detect
The original version of this module.
AUTHORS
Encode::Detective is based on Encode::Detect by John Gardiner Myers <jgmyers@proofpoint.com>. It was forked by Ben Bullock <bkb@cpan.org>.
LICENCE
This Perl module may be used, copied, modified and redistributed under the terms of the Mozilla Public License version 1.1, the GNU General Public License, or the LGPL.