NAME

DTA::CAB::Analyzer::TokPP::Waste - type-level heuristic token preprocessor (for punctuation etc) using Moot::Waste::Annotator

SYNOPSIS

##========================================================================
## PRELIMINARIES

use DTA::CAB::Analyzer::TokPP::Waste;

##========================================================================
## Methods

$obj = CLASS_OR_OBJ->new(%args);
$bool = $anl->ensureLoaded();
$doc = $tpp->analyzeTypes($doc,\%types,\%opts);

DESCRIPTION

DTA::CAB::Analyzer::TokPP::Waste provides a DTA::CAB::Analyzer interface to some simple text-based, type-wise word analysis heuristics, e.g. for detection of punctuation, numeric strings, etc. It is implemented as a thin wrapper around the Moot::Waste::Annotator class.
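
To give a feel for what "type-wise heuristics" means here, the following plain-Perl sketch classifies word types by their surface text alone. The function name `classify_type` and the class names `PUNCT`, `NUM`, and `OTHER` are hypothetical illustrations. They are not the actual rules or classes used by Moot::Waste::Annotator.

```perl
use strict;
use warnings;

## classify_type($text): hypothetical helper, NOT part of DTA::CAB or Moot
sub classify_type {
  my $text = shift;
  return 'PUNCT' if $text =~ /^\p{P}+$/;                ##-- punctuation characters only
  return 'NUM'   if $text =~ /^[+-]?\d+(?:[.,]\d+)*$/;  ##-- digits with optional separators
  return 'OTHER';                                       ##-- anything else
}

print join("\t", $_, classify_type($_)), "\n" foreach ('...', '1,234.5', 'Haus');
```

Because the decision depends only on the text of a type, it needs to be made once per distinct word form rather than once per token occurrence.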

Methods

new
$obj = CLASS_OR_OBJ->new(%args);

%$obj, %args:

label => $label,       ##-- analyzer label; default='tokpp'

ensureLoaded
$bool = $anl->ensureLoaded();

Ensures analysis data is loaded. Always returns 1.

Methods: Analysis

analyzeTypes
$doc = $tpp->analyzeTypes($doc,\%types,\%opts);

Performs type-wise analysis of all (text) types in values(%types). This override sets:

$tok->{$anl->{label}} = \@morphHiStrings
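
The contract above can be sketched without the library: each distinct type is visited once, and the result is stored under the key named by the analyzer's label. In this minimal sketch the `%types` entries are plain hashes with a `text` key, and `toy_analyze` is a hypothetical stand-in for the real annotator. Neither is meant to reflect the actual internals of DTA::CAB or Moot::Waste::Annotator.

```perl
use strict;
use warnings;

my $anl = { label => 'tokpp' };  ##-- default label, as documented for new()

## assumed shape: one plain-hash entry per distinct surface form
my %types = map { ($_ => { text => $_ }) } ('Haus', ',', '42');

## toy_analyze($text): hypothetical stand-in; returns an array-ref of strings
sub toy_analyze {
  my $text = shift;
  return $text =~ /^\d+$/ ? ['NUM'] : $text =~ /^\p{P}+$/ ? ['PUNCT'] : [];
}

## type-wise pass: one analysis per type, stored under $anl->{label}
$_->{ $anl->{label} } = toy_analyze($_->{text}) foreach values %types;

print $_, "\t", join(' ', @{ $types{$_}{tokpp} }), "\n" foreach sort keys %types;
```

Keying the result by `$anl->{label}` rather than a fixed name means that, in this sketch, a differently labelled instance would write to a different key.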

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2013-2019 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

dta-cab-analyze.perl(1), DTA::CAB::Analyzer(3pm), DTA::CAB::Chain(3pm), DTA::CAB(3pm), perl(1), ...