NAME
DTA::TokWrap::Processor::tcfalign - DTA tokenizer wrappers: text<->token alignment for decoded TCF
SYNOPSIS
use DTA::TokWrap::Processor::tcfalign;
$aln = DTA::TokWrap::Processor::tcfalign->new(%opts);
$doc_or_undef = $aln->tcfalign($doc);
DESCRIPTION
DTA::TokWrap::Processor::tcfalign provides an object-oriented DTA::TokWrap::Processor wrapper for aligning TCF-decoded tokens with TokWrap-serialized text. It requires GNU diff in your PATH.
Constants
- @ISA
DTA::TokWrap::Processor::tcfalign inherits from DTA::TokWrap::Processor.
Constructors etc.
- new
$obj = $CLASS_OR_OBJECT->new(%args);
Constructor.
- defaults
%defaults = $CLASS->defaults();
Static class-dependent defaults.
Methods
- tcfalign
$doc_or_undef = $CLASS_OR_OBJECT->tcfalign($doc);
Aligns the text and TCF-decoded tokens from the {txtdata} and {tcfwdata} keys of the DTA::TokWrap::Document object $doc, storing the resulting tokenization with byte offsets in TokWrap-compatible format in $doc->{tokdata1}.
Relevant %$doc keys:
 txtdata  => $txtdata,   ##-- (input) serialized text data (~ tcfxdata)
 tcfwdata => $tcfwdata,  ##-- (input) tokenized data decoded from TCF, without byte-offsets, with SID/WID attributes
 ##
 tokdata1 => $tokdata1,  ##-- (output) aligned token data, with byte-offsets, with SID/WID attributes
 tcfalign_stamp0 => $f,  ##-- (output) timestamp of operation begin
 tcfalign_stamp  => $f,  ##-- (output) timestamp of operation end
 tokdata1_stamp  => $f,  ##-- (output) timestamp of operation end
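To illustrate what "aligned token data, with byte-offsets" means, the following is a minimal standalone sketch. It does not use DTA::TokWrap and does not reproduce the module's diff-based alignment; it swaps in a simple greedy substring search instead. The helper name align_tokens_greedy and the {text, off, len} record layout are assumptions made for this example only, not the TokWrap token format.

```perl
use strict;
use warnings;
use utf8;                             # string literals below contain non-ASCII characters
use Encode qw(encode_utf8);

# Hypothetical helper (NOT part of DTA::TokWrap): greedily locate each token
# in the serialized text and record its byte offset and byte length.
sub align_tokens_greedy {
    my ($txt, $tokens) = @_;          # $txt: character string; $tokens: array-ref of token strings
    my $bytes = encode_utf8($txt);    # search in the UTF-8 encoding so offsets are byte offsets
    my $pos   = 0;                    # never search before the end of the previous token
    my @aligned;
    for my $tok (@$tokens) {
        my $tokbytes = encode_utf8($tok);
        my $off      = index($bytes, $tokbytes, $pos);
        return if $off < 0;           # token not found: alignment failed
        push @aligned, { text => $tok, off => $off, len => length($tokbytes) };
        $pos = $off + length($tokbytes);
    }
    return \@aligned;
}

my $aligned = align_tokens_greedy("Über den Fluß.", ["Über", "den", "Fluß", "."])
    or die "alignment failed\n";

binmode(STDOUT, ':encoding(UTF-8)');
printf("%s\t%d %d\n", @$_{qw(text off len)}) for @$aligned;
```

Because offsets are counted in bytes of the UTF-8 encoding, the token "den" in this sample is reported at offset 6 rather than at its character position 5: the preceding "Ü" occupies two bytes. A greedy search like this fails as soon as a token's surface form differs from the text, which is the kind of mismatch a diff-based alignment is better suited to handle.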
SEE ALSO
DTA::TokWrap::Intro(3pm), dta-tokwrap.perl(1), ...
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2014-2018 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.