NAME
DTA::CAB::Format::CorpusExplorerPlugin - Datum parser/formatter: CorpusExplorer normalization plugin
SYNOPSIS
##========================================================================
## PRELIMINARIES
##========================================================================
## Constructors etc.
$fmt = CLASS_OR_OBJ->new(%args);

##========================================================================
## Methods: Persistence
@keys = $class_or_obj->noSaveKeys();

##========================================================================
## Methods: Input: Input selection
$fmt = $fmt->fromFh($filename_or_handle);
$fmt = $fmt->fromString(\$string);

##========================================================================
## Methods: Input: Local
$fmt = $fmt->parseCeString(\$string);

##========================================================================
## Methods: Input: Generic API
$doc = $fmt->parseDocument();

##========================================================================
## Methods: Output: Generic
$type = $fmt->mimeType();
$ext = $fmt->defaultExtension();
$fmt = $fmt->toFh($fh,$level);

##========================================================================
## Methods: Output: API
$fmt = $fmt->putDocument($doc);
$fmt = $fmt->putData($data);
DESCRIPTION
Globals
- Variable: @ISA
Inherits from DTA::CAB::Format.
Constructors etc.
- new
$fmt = CLASS_OR_OBJ->new(%args);
object structure: assumed HASH
(
 ##---- Input
 doc   => $doc,          ##-- buffered input document
 ##---- Output
 level => $formatLevel,  ##-- output formatting level:
                         ##   0: norm (terse; empty for identity-normalizations)
                         ##   1: norm (verbose)
 ##---- Common
 utf8  => $bool,         ##-- default: 1
 fh    => $fh,           ##-- IO::Handle for read/write
)
Methods: Persistence
- noSaveKeys
@keys = $class_or_obj->noSaveKeys();
List of keys not to be saved; override returns qw(doc outbuf).
Methods: Input: Input selection
- fromFh
$fmt = $fmt->fromFh($filename_or_handle);
override calls fromFh_str()
- fromString
$fmt = $fmt->fromString(\$string);
select input from string $string
Methods: Input: Local
- parseCeString
$fmt = $fmt->parseCeString(\$string);
Local parsing guts. Input is one sentence per line, sentence tokens (text only) separated by TABs.
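The input layout described above can be illustrated with a minimal, self-contained sketch. It does not use DTA::CAB and is not the module's actual code: the function name parse_ce_string_sketch and the nested body / tokens / text layout of the result are assumptions chosen for illustration, as is the decision to skip blank lines.

```perl
use strict;
use warnings;

## Hypothetical stand-in for the parsing step (not the module's own code).
## Takes a reference to the input string and returns a plain nested structure.
sub parse_ce_string_sketch {
  my ($strref) = @_;
  my @sentences;
  foreach my $line (split /\r?\n/, $$strref) {
    next if $line eq '';                  ##-- skip blank lines (assumption)
    ##-- one token per TAB-separated field; limit -1 keeps trailing empty fields
    my @tokens = map { +{text => $_} } split /\t/, $line, -1;
    push @sentences, {tokens => \@tokens};
  }
  return {body => \@sentences};
}

my $input = "Diess\tist\tein\tSatz\nNoch\teiner\n";
my $doc   = parse_ce_string_sketch(\$input);
printf "%d sentences, first has %d tokens\n",
  scalar(@{$doc->{body}}), scalar(@{$doc->{body}[0]{tokens}});
```

Passing the string by reference mirrors the \$string calling convention shown for parseCeString(); everything else here is illustrative only.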
Methods: Input: Generic API
Methods: Output: Generic
- mimeType
$type = $fmt->mimeType();
override returns text/plain.
- defaultExtension
$ext = $fmt->defaultExtension();
returns default filename extension for this format; override returns .ceplugin.
- toFh
$fmt_or_undef = $fmt->toFh($fh,$formatLevel);
Select output to filehandle $fh. Thin wrapper for DTA::CAB::Format::toFh.
Methods: Output: API
- putDocument
$fmt = $fmt->putDocument($doc);
Output guts. Output format is one sentence per line, sentence tokens ("canonical" / "modern" / "normalized" text only) separated by TABs. If $fmt->{level} is false (the default), tokens with identity canonicalizations (w_old == w_new) will be written as the empty string.
- putData
$fmt = $fmt->putData($data);
puts raw data (uses forceDocument())
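The effect of the formatting level on the putDocument output can be sketched without DTA::CAB. This is an illustration only, not the module's implementation: the helper name format_sentences_sketch and the token keys w_old and w_new (borrowed from the notation above) are hypothetical, and real documents store these values differently.

```perl
use strict;
use warnings;

## Hypothetical helper: renders sentences as TAB-separated lines.
## Each token is a hash with the illustrative keys w_old and w_new.
sub format_sentences_sketch {
  my ($sentences, $level) = @_;
  my $out = '';
  foreach my $sent (@$sentences) {
    $out .= join("\t",
      ##-- terse mode (false level): identity normalizations become ''
      map { (!$level && $_->{w_old} eq $_->{w_new}) ? '' : $_->{w_new} } @$sent
    ) . "\n";
  }
  return $out;
}

my @sentences = (
  [ {w_old => 'Diess', w_new => 'Dies'},
    {w_old => 'ist',   w_new => 'ist'},
    {w_old => 'ein',   w_new => 'ein'},
    {w_old => 'Theil', w_new => 'Teil'} ],
);

my $terse   = format_sentences_sketch(\@sentences, 0);  ##-- "Dies\t\t\tTeil\n"
my $verbose = format_sentences_sketch(\@sentences, 1);  ##-- "Dies\tist\tein\tTeil\n"
```

In terse mode the column positions are preserved because empty fields are still delimited by TABs, so a consumer can align the output with its own token sequence and fall back to the original form wherever a field is empty.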
AUTHOR
Bryan Jurish <jurish@bbaw.de>
COPYRIGHT AND LICENSE
Copyright (C) 2019 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.20.2 or, at your option, any later version of Perl 5 you may have available.
SEE ALSO
dta-cab-analyze.perl(1), dta-cab-convert.perl(1), dta-cab-http-server.perl(1), dta-cab-http-client.perl(1), dta-cab-xmlrpc-server.perl(1), dta-cab-xmlrpc-client.perl(1), DTA::CAB::Server(3pm), DTA::CAB::Client(3pm), DTA::CAB::Format(3pm), DTA::CAB(3pm), perl(1), ...