NAME
DTA::TokWrap::Processor::tokenize1 - DTA tokenizer wrappers: tokenizer post-processing
SYNOPSIS
use DTA::TokWrap::Processor::tokenize1;
$tp = DTA::TokWrap::Processor::tokenize1->new(%args);
$doc_or_undef = $tp->tokenize1($doc);
DESCRIPTION
DTA::TokWrap::Processor::tokenize1 provides an object-oriented DTA::TokWrap::Processor wrapper for post-processing of raw tokenizer output for DTA::TokWrap::Document objects.
Most users should use the high-level DTA::TokWrap wrapper class instead of using this module directly.
Constants
- @ISA
-
DTA::TokWrap::Processor::tokenize1 inherits from DTA::TokWrap::Processor.
Constructors etc.
- new
-
$tp = $CLASS_OR_OBJ->new(%args);
%args, %$tp:
 fixtok => $bool,  ##-- attempt to fix common tokenizer errors? (default=true)
 fixold => $bool,  ##-- attempt to fix unexpected and/or obsolete (tomata2) errors? (default=false)
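As an illustration of the constructor options, the following is a minimal sketch rather than code taken from the distribution. It assumes only the new() signature and the fixtok and fixold keys listed above; the chosen values are illustrative, and the module is loaded at runtime so the sketch still runs where DTA::TokWrap is not installed.

```perl
use strict;
use warnings;

# Load the module at runtime so this sketch also runs without DTA::TokWrap.
my $class = 'DTA::TokWrap::Processor::tokenize1';
my $have_module = eval "require $class; 1";

# Option names are taken from the %args list; the values are illustrative.
my %args = (
    fixtok => 1,   # keep the default: fix common tokenizer errors
    fixold => 1,   # also enable fixes for obsolete (tomata2) errors
);

if ($have_module) {
    # %args and %$tp share the same keys, so the options can be read back.
    my $tp = $class->new(%args);
    print "fixtok=$tp->{fixtok} fixold=$tp->{fixold}\n";
} else {
    print "DTA::TokWrap not installed; would call $class->new(fixtok=>1, fixold=>1)\n";
}
```

Since fixold defaults to false, it presumably only needs to be set when processing output from an older tokenizer version.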
- defaults
-
%defaults = CLASS->defaults();
Static class-dependent defaults.
Methods
- tokenize1
-
$doc_or_undef = $CLASS_OR_OBJECT->tokenize1($doc);
Post-processes the raw tokenizer output ($doc->{tokdata0}) of the DTA::TokWrap::Document object $doc and stores the result in $doc->{tokdata1}.
Relevant %$doc keys:
 tokdata0        => $tokdata0, ##-- (input) raw tokenizer output (string)
 tokdata1        => $tokdata1, ##-- (output) post-processed tokenizer output (string)
 tokenize1_stamp => $f,        ##-- (output) timestamp of operation end
 tokdata1_stamp  => $f,        ##-- (output) timestamp of operation end
May implicitly call $doc->tokenize() (but shouldn't).
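The calling convention above can be sketched as a small helper. This is an assumption-laden sketch, not part of the module: postprocess_doc is a hypothetical name, and the undef return is treated as a failure signal on the strength of the $doc_or_undef signature alone.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DTA::TokWrap): run post-processing on a
# document and return the post-processed tokenizer output string.
sub postprocess_doc {
    my ($tp, $doc) = @_;

    # tokenize1() is documented to return the document or undef;
    # undef is assumed here to indicate failure.
    my $rc = $tp->tokenize1($doc);
    die "tokenize1() returned undef\n" unless defined $rc;

    # tokdata1 is the documented output key for the post-processed data.
    return $doc->{tokdata1};
}
```

Given a processor $tp and a document $doc whose tokdata0 key is already populated, the call would be my $tokdata1 = postprocess_doc($tp, $doc);.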
SEE ALSO
DTA::TokWrap::Intro(3pm), dta-tokwrap.perl(1), ...
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2009-2018 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.