NAME

Encode::Detective - detect a data encoding

SYNOPSIS

use Encode;
use Encode::Detective 'detect';
my $encoding = detect ($data);
# Now $encoding contains a guess of the encoding of $data.

DESCRIPTION

This module guesses the character set of input data. It is similar to Encode::Guess, but does not require a list of expected encodings.
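For contrast, the following is a minimal sketch of the Encode::Guess approach, in which the caller must name the candidate encodings. The sample bytes and the list of suspects are illustrative assumptions, not taken from this module.

```perl
use strict;
use warnings;
use Encode::Guess;

# Illustrative input: the word "Nihongo" (Japanese) as EUC-JP bytes.
my $data = "\xc6\xfc\xcb\xdc\xb8\xec";

# Encode::Guess has to be told which encodings to consider.
my $decoder = guess_encoding ($data, qw/euc-jp shiftjis/);

# On failure guess_encoding returns an error string, not an object.
ref ($decoder) or die "Cannot guess: $decoder";

# Decode the bytes into Perl characters with the guessed encoding.
my $text = $decoder->decode ($data);
```

With Encode::Detective no such list is passed; detect is given only the bytes.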

FUNCTIONS

detect

my $encoding = detect ($data);

Given a string of bytes, $data, this examines the bytes and makes a probability-based guess at their encoding.
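A typical next step is to decode the bytes with the guessed encoding. The following is a minimal sketch, not part of this module. The helper name decode_guessed is hypothetical, and the sketch assumes that detect returns a false value when it cannot make a guess, which this documentation does not state.

```perl
use strict;
use warnings;
use Encode 'find_encoding';
use Encode::Detective 'detect';

# Hypothetical helper, not part of Encode::Detective: guess the
# encoding of $bytes and return the decoded text, or nothing if
# no usable guess is available.
sub decode_guessed
{
    my ($bytes) = @_;
    my $name = detect ($bytes);
    # Assumption: a false value means that no guess was made.
    return unless $name;
    # The guessed name may not be one which Encode recognises.
    my $encoding = find_encoding ($name);
    return unless $encoding;
    return $encoding->decode ($bytes);
}
```

The guess is only a guess, so callers may want to check the decoded text before relying on it.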

DETECTED ENCODINGS

The following encodings are detected:

UTF-8
EUC-JP
Big5
Shift_JIS
EUC-KR
EUC-TW

Taiwanese encoding.

windows-1251

Cyrillic encoding.

windows-1255

Hebrew encoding.

Character sets not detected

MacRoman
CP932

An extension of Shift_JIS, more common in practice than actual Shift_JIS.
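One possible workaround, sketched below, is to decode data reported as Shift_JIS using Encode's "cp932" encoding instead. The helper name decoder_name is hypothetical, and the sketch assumes that detect reports the name exactly as listed above, "Shift_JIS". Whether this substitution suits your data is a judgement call.

```perl
use strict;
use warnings;
use Encode 'decode';

# Hypothetical helper, not part of Encode::Detective: choose the
# Encode name to decode with, given a name returned by detect.
sub decoder_name
{
    my ($detected) = @_;
    # Assumption: bytes reported as Shift_JIS are more likely CP932.
    return 'cp932' if lc ($detected) eq 'shift_jis';
    return $detected;
}

# The hiragana "a", which has the same bytes in Shift_JIS and CP932.
my $bytes = "\x82\xa0";
my $text = decode (decoder_name ('Shift_JIS'), $bytes);
```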

TODO

The module needs more tests. Please send example files.

BUGS

TIS-620

TIS-620 does not seem to be detected.

Documentation of detection

The documentation of detected encodings above is not complete.

HISTORY

This module is based on code from Firefox. When this module was created, the C++ code for character set detection was available as a standalone library. That code can no longer be used as a standalone library, so this module has become a fork of the original Mozilla code.

Encode::Detective is a fork of Encode::Detect. It removes almost all of the interface of Encode::Detect except the single function "detect". This fork was released to CPAN to improve the compilation of the module on various systems.

SEE ALSO

edetect

The edetect standalone script can guess the encodings of files.

Encode::Guess

Encode::Guess is a Perl module which also guesses encodings, but it requires a list of expected encodings.

Encode::Detect

The original version of this module.

AUTHORS

Encode::Detective is based on Encode::Detect by John Gardiner Myers <jgmyers@proofpoint.com>. It was forked by Ben Bullock <bkb@cpan.org>.

LICENCE

This Perl module may be used, copied, modified and redistributed under the terms of the Mozilla Public License version 1.1, the GNU General Public License, or the LGPL.