NAME
DTA::CAB::Analyzer::LangId::Simple - simple language guesser using stopword lists
SYNOPSIS
##========================================================================
## PRELIMINARIES
use DTA::CAB::Analyzer::LangId::Simple;
##========================================================================
## Methods: Prepare
$bool = $lid->ensureLoaded();
##========================================================================
## Methods: Analysis: v1.x: API
$doc = $anl->analyzeTypes($doc,\%types,\%opts);
$doc = $anl->analyzeSentences($doc,\%opts);
DESCRIPTION
Methods: Constructors etc.
- new
$obj = CLASS_OR_OBJ->new(%args)
Creates a new simple language-guesser object, which inherits from DTA::CAB::Analyzer::Dict::Json. Known options in %args:
 ##-- analysis selection
 label        => 'lang',   ##-- analyzer label
 defaultLang  => 'de',     ##-- default language (if e.g. known by 'morph')
 defaultCount => 0.1,      ##-- bonus count for default lang (characters)
 minSentLen   => 2,        ##-- minimum number of tokens in sentence required before guessing
 minSentChars => 8,        ##-- minimum number of text characters in sentence required before guessing
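The following is a minimal, self-contained sketch of how these options could interact when a guess is made. It is not the module's implementation: the helper guess_lang() and its per-language count argument are hypothetical, and the reading of minSentLen and minSentChars as "fall back to defaultLang for short sentences" is an assumption based only on the option comments above.

```perl
use strict;
use warnings;

## Illustrative option values, copied from the documented defaults.
my %opts = (
    defaultLang  => 'de',
    defaultCount => 0.1,
    minSentLen   => 2,
    minSentChars => 8,
);

## guess_lang(\%counts, $ntokens, $nchars, \%opts)
##  + hypothetical helper, not part of DTA::CAB
##  + %counts maps language => number of characters matched by that
##    language's stopword list (assumed unit, per the 'characters' comment)
sub guess_lang {
    my ($counts, $ntokens, $nchars, $o) = @_;

    ## assumption: sentences below either threshold get the default language
    return $o->{defaultLang}
        if $ntokens < $o->{minSentLen} || $nchars < $o->{minSentChars};

    ## copy the counts and give the default language its bonus
    my %score = %$counts;
    $score{ $o->{defaultLang} } = ($score{ $o->{defaultLang} } // 0) + $o->{defaultCount};

    ## pick the best-scoring language; ties are broken alphabetically
    my ($best) = sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
    return $best;
}

print guess_lang({ en => 9, de => 3 }, 5, 24, \%opts), "\n";   ## en
print guess_lang({ en => 3 },          1,  3, \%opts), "\n";   ## de (sentence too short)
print guess_lang({ en => 3, de => 3 }, 4, 20, \%opts), "\n";   ## de (bonus breaks the tie)
```

Under these assumptions, a small defaultCount such as 0.1 only matters when two languages are otherwise tied or when nothing matched at all.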
Methods: Prepare
- ensureLoaded
$bool = $lid->ensureLoaded();
Ensures that analyzer data is loaded from the default files.
Methods: Analysis: v1.x: API
- analyzeTypes
$doc = $anl->analyzeTypes($doc,\%types,\%opts);
Performs type-wise analysis of all (text) types in $doc->{types}.
- analyzeSentences
$doc = $anl->analyzeSentences($doc,\%opts);
Performs sentence-wise analysis of all sentences in $doc->{body}.
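The two passes described above can be illustrated with a small standalone sketch. It does not use DTA::CAB and makes several assumptions: the stopword lists are toy data, the functions analyze_types() and count_sentence() are hypothetical, and storing the result under a 'lang' key merely mirrors the documented default label. The type-wise pass annotates each distinct word form once; the sentence-wise pass then sums the character lengths of matching tokens per language.

```perl
use strict;
use warnings;

## Toy stopword lists (hypothetical data for illustration only).
my %stopwords = (
    de => [qw(der die das und ist nicht)],
    en => [qw(the and is not of in)],
);

## Invert the lists: word => [languages whose list contains it]
my %word2langs;
for my $lang (sort keys %stopwords) {
    push @{ $word2langs{$_} }, $lang for @{ $stopwords{$lang} };
}

## Type-wise pass: annotate each distinct word form exactly once.
sub analyze_types {
    my ($types) = @_;            ## hash-ref: text => { text => $text }
    for my $w (values %$types) {
        $w->{lang} = $word2langs{ lc $w->{text} } || [];
    }
    return $types;
}

## Sentence-wise pass: sum matched characters per language over all tokens.
sub count_sentence {
    my ($tokens, $types) = @_;
    my %counts;
    for my $text (@$tokens) {
        my $langs = $types->{$text}{lang} || [];
        $counts{$_} += length($text) for @$langs;
    }
    return \%counts;
}

my @sent  = qw(the cat is not in the house);
my %types = map { ($_ => { text => $_ }) } @sent;
analyze_types(\%types);
my $counts = count_sentence(\@sent, \%types);
print join(', ', map { "$_=$counts->{$_}" } sort keys %$counts), "\n";   ## en=13
```

Splitting the work this way means each word form is looked up only once, however often it occurs in the document.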
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2013-2019 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.
SEE ALSO
dta-cab-analyze.perl(1), dta-cab-convert.perl(1), dta-cab-http-server.perl(1), dta-cab-http-client.perl(1), dta-cab-xmlrpc-server.perl(1), dta-cab-xmlrpc-client.perl(1), DTA::CAB::Server(3pm), DTA::CAB::Client(3pm), DTA::CAB::Format(3pm), DTA::CAB(3pm), perl(1), ...