NAME

Lingua::Ident -- Statistical language identification

SYNOPSIS

use Lingua::Ident;
$i    = Lingua::Ident->new("filename 1", ..., "filename n");
$lang = $i->identify("text to classify");

DESCRIPTION

This module implements a statistical language identifier.

The filename arguments to the constructor must refer to files containing tables of n-gram probabilities, one table per language. These tables can be generated using the trainlid(1) utility program.
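To illustrate the general idea of scoring text against per-language n-gram tables, here is a minimal sketch. It is not the code of Lingua::Ident and does not read trainlid(1) tables: it builds character trigram counts in memory from two made-up training strings and uses simple add-one smoothing, which is an assumption chosen for brevity. All sub names (train, score, classify) are hypothetical.

```perl
use strict;
use warnings;

# Count character trigrams in a training string.
# Hypothetical helper; not part of Lingua::Ident.
sub train {
    my ($text) = @_;
    my %count;
    my $total = 0;
    for my $i (0 .. length($text) - 3) {
        $count{ substr($text, $i, 3) }++;
        $total++;
    }
    return { count => \%count, total => $total };
}

# Sum of the log-probabilities of the trigrams of $text under $model.
# Add-one smoothing keeps unseen trigrams from producing log(0).
sub score {
    my ($model, $text) = @_;
    my $vocab = keys(%{ $model->{count} }) + 1;
    my $sum   = 0;
    for my $i (0 .. length($text) - 3) {
        my $c = $model->{count}{ substr($text, $i, 3) } || 0;
        $sum += log( ($c + 1) / ($model->{total} + $vocab) );
    }
    return $sum;
}

# Tiny made-up training texts, for illustration only.
my %models = (
    en => train("the quick brown fox jumps over the lazy dog and then the other thing"),
    es => train("el rapido zorro marron salta sobre el perro perezoso y luego la otra cosa"),
);

# Return the language tag whose model gives the text the highest score.
sub classify {
    my ($text) = @_;
    my ($best, $best_score);
    for my $lang (sort keys %models) {
        my $s = score($models{$lang}, $text);
        ($best, $best_score) = ($lang, $s)
            if !defined $best_score || $s > $best_score;
    }
    return $best;
}

print classify("the dog"), "\n";
print classify("el perro"), "\n";
```

With realistic amounts of training text the same scheme becomes far more reliable; the two sentences used here are only large enough to show the mechanics.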

RETURN VALUE

The identify() method returns the value specified in the _LANG field of the probabilities table of the language to which the text most likely belongs (see "WARNINGS").

This value should be a POSIX locale name constructed from an ISO 639 2-letter language code, optionally extended by an ISO 3166 2-letter country code and a character set identifier. Example: de_DE.iso88591.
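If the _LANG values follow this recommendation, the returned string can be split into its components. The following is a small sketch; parse_locale() is a hypothetical helper and not part of Lingua::Ident, and the pattern only covers the language[_COUNTRY][.charset] shape described above.

```perl
use strict;
use warnings;

# Split a locale name such as "de_DE.iso88591" into its parts.
# Hypothetical helper; returns undef if the name does not match.
sub parse_locale {
    my ($name) = @_;
    return undef unless defined $name
        && $name =~ /\A([a-z]{2})(?:_([A-Z]{2}))?(?:\.([A-Za-z0-9-]+))?\z/;
    return {
        language => $1,    # ISO 639 language code
        country  => $2,    # ISO 3166 country code, undef if absent
        charset  => $3,    # character set identifier, undef if absent
    };
}

my $parts = parse_locale("de_DE.iso88591");
print "$parts->{language} $parts->{country} $parts->{charset}\n";
```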

WARNINGS

Since Lingua::Ident is based on statistics, it cannot be 100 % accurate. More precisely, Dunning (see below) reports that his implementation achieves 92 % accuracy with 50K of training text for 20-character strings when discriminating between English and Spanish. This implementation should be as accurate as Dunning's. However, not only the size but also the quality of the training text plays a role.

The current implementation doesn't use a threshold to determine whether the most probable language has a high enough probability. If you try to classify a text in a language for which no probability table was loaded, identify() still returns one of the loaded languages, which is necessarily incorrect.
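A caller who has access to per-language scores could add such a threshold on top. The sketch below is an assumption-laden illustration only: the interface documented here exposes just identify(), so the score hashes are made up, and pick_language() is a hypothetical helper. Scores are assumed to be average per-character log-probabilities, so that they are comparable across texts of different lengths.

```perl
use strict;
use warnings;

# Return the best-scoring language tag, or undef if even the best score
# is below $min_score. Hypothetical helper; not part of Lingua::Ident.
sub pick_language {
    my ($scores, $min_score) = @_;
    my ($best) = sort { $scores->{$b} <=> $scores->{$a} } keys %$scores;
    return undef unless defined $best && $scores->{$best} >= $min_score;
    return $best;
}

# Made-up scores for illustration only.
my %known_text   = (en => -2.1, es => -3.4);
my %unknown_text = (en => -6.8, es => -7.0);

for my $scores (\%known_text, \%unknown_text) {
    my $lang = pick_language($scores, -4.0);
    print defined $lang ? $lang : 'unknown', "\n";
}
```

A suitable cut-off depends on the training data and would have to be determined empirically.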

AUTHOR

Lingua::Ident was developed by Michael Piotrowski <mxp@dynalabs.de>.
