NAME
DTA::TokWrap::Processor::tcftokenize - DTA tokenizer wrappers: TCF text layer tokenization
SYNOPSIS
use DTA::TokWrap::Processor::tcftokenize;
$ttok = DTA::TokWrap::Processor::tcftokenize->new(%opts);
$doc_or_undef = $ttok->tcftokenize($doc);
DESCRIPTION
DTA::TokWrap::Processor::tcftokenize provides an object-oriented DTA::TokWrap::Processor wrapper for tokenizing the TCF text layer with the selected tokenizer and encoding the result in the TCF tokens and sentences layers.
Constants
- @ISA
-
DTA::TokWrap::Processor::tcftokenize inherits from DTA::TokWrap::Processor.
Constructors etc.
- new
-
$obj = $CLASS_OR_OBJECT->new(%args);
Constructor.
- defaults
-
%defaults = $CLASS->defaults();
Static class-dependent defaults.
Methods
- tcftokenize
-
$doc_or_undef = $CLASS_OR_OBJECT->tcftokenize($doc);
Tokenizes the text layer extracted from a TCF document and encodes the result in new TCF tokens and sentences layers.
Relevant %$doc keys:
tcfdoc => $tcfdoc, ##-- (input,output) TCF input document with <text> layer
##
txtfile => $txtfile, ##-- (temp,output) text file used for TCF extraction
tokdata0 => $tokdata, ##-- (temp,output) raw tokenization data
tokdata1 => $tokdata1, ##-- (temp,output) tweaked tokenization data
##
tcftokdoc => $tcftokdoc, ##-- (output) output TCF file with <sentences>,<tokens> layers (==$tcfdoc)
tcftokenize_stamp0 => $f, ##-- (output) timestamp of operation begin
tcftokenize_stamp => $f, ##-- (output) timestamp of operation end
tcftokdoc_stamp => $f, ##-- (output) timestamp of operation end
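The return convention and the timestamp keys can be illustrated with a short sketch. Only the tcftokenize() call, its document-or-undef return value, and the key names are taken from the text above. The helper run_tcftokenize(), the My::StubTokenizer stand-in class, the placeholder file name, and the assumption that both stamps are numeric epoch times are hypothetical and exist only so the example runs without DTA::TokWrap installed.

```perl
use strict;
use warnings;

## Hypothetical helper (not part of DTA::TokWrap): runs the tcftokenize step
## and returns the elapsed time, or undef if the step failed.
sub run_tcftokenize {
  my ($ttok, $doc) = @_;

  ## tcftokenize() returns the document on success, undef on failure
  my $out = $ttok->tcftokenize($doc);
  return undef if !defined $out;

  ## ASSUMPTION: both stamps are numeric epoch times, so they can be subtracted
  my ($t0, $t1) = @$out{qw(tcftokenize_stamp0 tcftokenize_stamp)};
  return (defined $t0 && defined $t1) ? $t1 - $t0 : 0;
}

## Stand-in processor (hypothetical): mimics only the documented return
## convention and stamp keys; it performs no real tokenization.
package My::StubTokenizer {
  sub new { return bless {}, shift }
  sub tcftokenize {
    my ($self, $doc) = @_;
    return undef if !defined $doc->{tcfdoc};   ##-- no input document: fail
    $doc->{tcftokenize_stamp0} = 100.0;        ##-- fixed values for the demo
    $doc->{tcftokenize_stamp}  = 100.5;
    $doc->{tcftokdoc_stamp}    = $doc->{tcftokenize_stamp};
    return $doc;
  }
}

my $ttok = My::StubTokenizer->new();
my $ok   = run_tcftokenize($ttok, { tcfdoc => 'example.tcf.xml' });  ##-- placeholder value
my $bad  = run_tcftokenize($ttok, {});                               ##-- missing tcfdoc

printf "elapsed: %.1f\n", $ok;
print  "tcftokenize failed\n" if !defined $bad;
```

With the real module, $ttok would presumably come from DTA::TokWrap::Processor::tcftokenize->new(%opts) as in the SYNOPSIS, and $doc would be a document object prepared by the surrounding DTA::TokWrap pipeline rather than a plain hash.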
SEE ALSO
DTA::TokWrap::Intro(3pm), dta-tokwrap.perl(1), ...
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2014-2018 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.