NAME
DTA::CAB::Format::CSV1g - Datum I/O: concise minimal-output human-readable text, unigrams
SYNOPSIS
use DTA::CAB::Format::CSV1g;
##========================================================================
## Methods: Constructors etc.
$fmt = CLASS_OR_OBJ->new(%args);
##========================================================================
## Methods: Input
$fmt = $fmt->parseCsvString($string);
##========================================================================
## Methods: Output
$type = $fmt->mimeType();
$ext = $fmt->defaultExtension();
$fmt = $fmt->putToken($tok);
DESCRIPTION
DTA::CAB::Format::CSV1g is a DTA::CAB::Format subclass for representing the minimal "interesting" results of a DTA::CAB::Chain::DTA canonicalization in a (more or less) human- and machine-friendly TAB-separated format, including unigram counts. As in DTA::CAB::Format::TT (from which this class inherits), each token is represented by a single line, and sentence boundaries are represented by blank lines. Token lines have the format:
FREQ OLD_TEXT XLIT_TEXT NEW_TEXT POS_TAG LEMMA ?DETAILS
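The token-line layout above can be illustrated with a minimal sketch that splits one line into named fields. It assumes the fields are separated by single TAB characters and that the trailing DETAILS field may be absent; the helper name parse_csv1g_line and the hash keys are hypothetical and not part of DTA::CAB.

```perl
use strict;
use warnings;
use utf8;

## Hypothetical helper (NOT part of DTA::CAB): split one token line into named fields.
## Assumes TAB-separated fields in the order FREQ OLD_TEXT XLIT_TEXT NEW_TEXT POS_TAG LEMMA ?DETAILS.
sub parse_csv1g_line {
  my ($line) = @_;
  chomp($line);
  ## a limit of -1 keeps trailing empty fields instead of silently dropping them
  my ($freq, $old, $xlit, $new, $pos, $lemma, $details) = split(/\t/, $line, -1);
  return {
    freq  => $freq,
    old   => $old,
    xlit  => $xlit,
    new   => $new,
    pos   => $pos,
    lemma => $lemma,
    ## DETAILS is optional: only store it when present and non-empty
    (defined($details) && $details ne '' ? (details => $details) : ()),
  };
}

my $tok = parse_csv1g_line("1\toede\toede\töde\tADJD\töde\n");
binmode(STDOUT, ':encoding(UTF-8)');
print "$tok->{old} -> $tok->{new} ($tok->{pos}, freq=$tok->{freq})\n";
```

For real data, the module's own input methods (inherited from DTA::CAB::Format::TT) are the appropriate entry point; the sketch only makes the column order concrete.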
Methods: Constructors etc.
- new
$fmt = CLASS_OR_OBJECT->new(%args);
Recognized %args:
##---- Input
doc => $doc,                ##-- buffered input document

##---- Output
level => $formatLevel,      ##-- output formatting level:
                            ##   0: text, xlit, canon, tag, lemma
                            ##   1: text, xlit, canon, tag, lemma, details
#outbuf => $stringBuffer,   ##-- buffered output

##---- Common
utf8 => $bool,              ##-- default: 1
Methods: Input: Local
- parseCsvString
$fmt = $fmt->parseCsvString($string);
Hack which converts a CSV string to a TT string and passes it to DTA::CAB::Format::TT::parseTTString().
Methods: Output
- mimeType
$type = $fmt->mimeType();
Default returns text/plain.
- defaultExtension
$ext = $fmt->defaultExtension();
Returns the default filename extension for this format. This override returns '.csv.1g'.
- putToken
$fmt = $fmt->putToken($tok);
Appends $tok to output buffer.
EXAMPLE
An example file in the format accepted/generated by this module is:
1 wie wie wie PWAV wie
1 oede oede öde ADJD öde
1 ! ! ! $. !
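As a rough illustration of how such a file breaks down into sentences and tokens, the following sketch groups token lines at blank lines. It assumes TAB-separated columns, as stated in the DESCRIPTION; the helper name read_csv1g_sentences is hypothetical, and the code does not use or reproduce the module's own parser.

```perl
use strict;
use warnings;
use utf8;

## Hypothetical helper (NOT part of DTA::CAB): group token lines into sentences.
## Returns an array-ref of sentences, each an array-ref of tokens,
## each token an array-ref of its TAB-separated column values.
sub read_csv1g_sentences {
  my ($buf) = @_;
  my @sentences;
  ## one or more blank lines mark a sentence boundary
  foreach my $block (split(/\n{2,}/, $buf)) {
    my @tokens = map { [split(/\t/, $_, -1)] }
                 grep { $_ ne '' }
                 split(/\n/, $block);
    push(@sentences, \@tokens) if @tokens;
  }
  return \@sentences;
}

## the example sentence from above, with explicit TABs and a closing blank line
my $buf = join("\n",
  "1\twie\twie\twie\tPWAV\twie",
  "1\toede\toede\töde\tADJD\töde",
  "1\t!\t!\t!\t\$.\t!",
  "",
) . "\n";

my $sents = read_csv1g_sentences($buf);
printf("%d sentence(s), %d token(s) in the first\n",
       scalar(@$sents), scalar(@{$sents->[0]}));
```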
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2014-2019 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.
SEE ALSO
dta-cab-analyze.perl(1), dta-cab-convert.perl(1), DTA::CAB::Format::TT(3pm), DTA::CAB::Format(3pm), DTA::CAB(3pm), perl(1), ...