NAME
DTA::CAB::Format::TJ - Datum parser: one-token-per-line text; token data as JSON
SYNOPSIS
    use DTA::CAB::Format::TJ;

    ##========================================================================
    ## Constructors etc.
    $fmt = DTA::CAB::Format::TJ->new(%args);

    ##========================================================================
    ## Methods: Input
    $fmt = $fmt->close();
    $fmt = $fmt->fromString($string);
    $doc = $fmt->parseDocument();

    ##========================================================================
    ## Methods: Output
    $fmt = $fmt->flush();
    $str = $fmt->toString();
    $fmt = $fmt->putToken($tok);
    $fmt = $fmt->putSentence($sent);
    $fmt = $fmt->putDocument($doc);
DESCRIPTION
Globals
- Variable: @ISA
DTA::CAB::Format::TJ inherits from DTA::CAB::Format::TT.
- Filenames
DTA::CAB::Format::TJ registers the filename regex /\.(?i:tj|cab-tj)$/ with DTA::CAB::Format.
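As a quick illustration of which filenames the regex above selects, here is a minimal standalone sketch. The helper looks_like_tj() and the sample filenames are hypothetical and not part of DTA::CAB; only the regex itself is taken from this page.

```perl
use strict;
use warnings;

# The filename regex quoted above, copied verbatim.
my $tj_filename_re = qr/\.(?i:tj|cab-tj)$/;

# Hypothetical helper (not part of DTA::CAB): returns 1 if $file has a
# TJ-style extension, 0 otherwise.
sub looks_like_tj {
    my ($file) = @_;
    return $file =~ $tj_filename_re ? 1 : 0;
}

# Sample filenames are made up for illustration.
for my $file (qw(corpus.tj corpus.CAB-TJ corpus.tt corpus.tj.gz)) {
    printf "%-14s %s\n", $file, looks_like_tj($file) ? 'TJ' : 'other';
}
```

The extension match is case-insensitive but anchored at the end of the name, so a compressed file such as corpus.tj.gz is not matched by this regex alone.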
Constructors etc.
- new
    $fmt = CLASS_OR_OBJ->new(%args);
%args, %$fmt:
    ##-- Input
    doc      => $doc,            ##-- buffered input document
    ##
    ##-- Output
    outbuf   => $stringBuffer,   ##-- buffered output
    #level   => $formatLevel,    ##-- n/a
    ##
    ##-- Common
    encoding => $inputEncoding,  ##-- default: UTF-8, where applicable
Methods: Persistence
- noSaveKeys
    @keys = $class_or_obj->noSaveKeys();
Returns the list of keys not to be saved. This implementation returns qw(doc outbuf).
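To make the effect of such a key list concrete, the following sketch strips the listed keys from a plain hash before it would be saved. It is an assumption about how a no-save list is typically applied, not the module's persistence code; saveable_copy() and the sample hash are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DTA::CAB): shallow-copies %$obj and
# drops every key named in @no_save.
sub saveable_copy {
    my ($obj, @no_save) = @_;
    my %copy = %$obj;
    delete @copy{@no_save};
    return \%copy;
}

# A plain hash standing in for a format object; the values are made up.
my $fmt_like = {
    doc      => { body => [] },
    outbuf   => 'buffered text',
    encoding => 'UTF-8',
};

# qw(doc outbuf) is the key list documented above.
my $saved = saveable_copy($fmt_like, qw(doc outbuf));
print join(',', sort keys %$saved), "\n";
```

Only configuration such as the encoding survives in the copy; the buffered document and output buffer are left out.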
Methods: Input
- close
    $fmt = $fmt->close();
Override: close current input source, if any.
- fromString
    $fmt = $fmt->fromString($string);
Override: select input from string $string.
- parseTJString
    $fmt = $fmt->parseTJString($str);
Guts for fromString(): parse string $str into local document buffer $fmt->{doc}.
- parseDocument
    $doc = $fmt->parseDocument();
Override: just returns local document buffer $fmt->{doc}.
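The kind of work parseTJString() does can be sketched with the core JSON::PP module. This is not the module's implementation. It assumes a line layout inferred from the EXAMPLE section: a "%%$TJ:SENT=" comment line carries sentence attributes as JSON, each token line is the token text, a tab, then a JSON object, and a blank line ends a sentence. The function parse_tj_sketch() and the body/tokens keys of the returned structure are likewise assumptions.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical stand-in for parseTJString(); layout assumptions are
# described in the paragraph above.
sub parse_tj_sketch {
    my ($str) = @_;
    my $json = JSON::PP->new;
    my @sents;
    my $cur;    # sentence currently being built, if any

    for my $line (split /\n/, $str) {
        if ($line =~ /^\s*$/) {
            # blank line: close the current sentence
            push @sents, $cur if $cur;
            undef $cur;
        }
        elsif ($line =~ /^%%\$TJ:SENT=(.+)$/) {
            # sentence attributes encoded as a JSON object
            $cur ||= { tokens => [] };
            %$cur = (%$cur, %{ $json->decode($1) });
        }
        elsif ($line =~ /^%%/) {
            next;    # any other comment line is skipped
        }
        elsif ($line =~ /^([^\t]*)\t(\{.*\})$/) {
            # token line: text, tab, JSON object
            my $tok = $json->decode($2);
            $tok->{text} = $1 unless defined $tok->{text};
            $cur ||= { tokens => [] };
            push @{ $cur->{tokens} }, $tok;
        }
    }
    push @sents, $cur if $cur;
    return { body => \@sents };
}

# Abbreviated sample input modelled on the EXAMPLE section.
my $sample = join "\n",
    '%%$TJ:SENT={"lang":"de"}',
    "wie\t" . q({"text":"wie","moot":{"tag":"PWAV"}}),
    "!\t"   . q({"moot":{"tag":"$."}}),
    '';

my $doc = parse_tj_sketch($sample);
print scalar(@{ $doc->{body}[0]{tokens} }), " tokens parsed\n";
```

In this sketch the text column only fills in a missing "text" key, so a token whose JSON already names its text keeps that value.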
Methods: Output
- flush
    $fmt = $fmt->flush();
Override: flush accumulated output.
- toString
    $str = $fmt->toString();
    $str = $fmt->toString($formatLevel);
Override: flush buffered output document to byte-string. Just encodes string in $fmt->{outbuf}.
- putToken
    $fmt = $fmt->putToken($tok);
Override: token output.
- putSentence
    $fmt = $fmt->putSentence($sent);
Override: sentence output.
- putDocument
    $fmt = $fmt->putDocument($doc);
Override: document output.
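The output side can be sketched in the same spirit. The helpers below are hypothetical and are not the module's putToken() or putSentence(). They assume the layout visible in the EXAMPLE section (token text, a tab, then the token as JSON, with sentence attributes on a "%%$TJ:SENT=" line) and an assumed blank line after each sentence. Keys are sorted here via canonical() purely to make the output reproducible; the example file is not key-sorted.

```perl
use strict;
use warnings;
use JSON::PP;

# canonical(1) sorts hash keys; this is a choice of the sketch only.
my $json = JSON::PP->new->canonical(1);

# Hypothetical helper: one token per line, text TAB json.
sub tj_token_line {
    my ($tok) = @_;
    return $tok->{text} . "\t" . $json->encode($tok) . "\n";
}

# Hypothetical helper: optional sentence-attribute comment, the token
# lines, then a blank line (assumed sentence separator).
sub tj_sentence_block {
    my ($sent) = @_;
    my %attrs = %$sent;
    delete $attrs{tokens};

    my $out = '';
    $out .= '%%$TJ:SENT=' . $json->encode(\%attrs) . "\n" if %attrs;
    $out .= tj_token_line($_) for @{ $sent->{tokens} };
    return $out . "\n";
}

# Made-up sentence with two tokens.
my $sent = {
    lang   => 'de',
    tokens => [
        { text => 'wie', msafe => '1' },
        { text => '!',   msafe => '1' },
    ],
};
print tj_sentence_block($sent);
```

Accumulating such strings in a buffer and encoding the buffer once at the end corresponds to the outbuf and toString() behaviour described above.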
EXAMPLE
An example file in the format accepted/generated by this module (with very long lines) is:
    %%$TJ:SENT={"lang":"de"}
    wie	{"errid":"ec","hasmorph":"1","msafe":"1","moot":{"word":"wie","tag":"PWAV","lemma":"wie"},"exlex":"wie","lang":["de"],"xlit":{"latin1Text":"wie","isLatin1":"1","isLatinExt":"1"},"text":"wie"}
    oede	{"moot":{"word":"öde","tag":"ADJD","lemma":"öde"},"text":"oede","xlit":{"latin1Text":"oede","isLatin1":"1","isLatinExt":"1"},"msafe":"0"}
    !	{"errid":"ec","exlex":"!","msafe":"1","xlit":{"isLatin1":"1","isLatinExt":"1","latin1Text":"!"},"text":"!","moot":{"word":"!","tag":"$.","lemma":"!"}}
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2009-2019 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.