NAME
DTA::CAB::Format::Raw::HTTP - Document parser: raw untokenized text via HTTP tokenizer API
SYNOPSIS
use DTA::CAB::Format::Raw::HTTP;
##========================================================================
## Methods
$fmt = DTA::CAB::Format::Raw::HTTP->new(%args);
@keys = $class_or_obj->noSaveKeys();
$fmt = $fmt->close();
$fmt = $fmt->parseRawString(\$str);
$doc = $fmt->parseDocument();
$type = $fmt->mimeType();
$ext = $fmt->defaultExtension();
DESCRIPTION
DTA::CAB::Format::Raw::HTTP is an input-only DTA::CAB::Format subclass for untokenized raw string input. It uses LWP::UserAgent to query a tokenization server via HTTP.
Methods
- new
-
$fmt = CLASS_OR_OBJ->new(%args);
%$fmt, %args:
##-- Input
doc      => $doc,    ##-- buffered input document
tokurl   => $url,    ##-- tokenizer (default='http://kaskade.dwds.de/waste/tokenize.fcgi?m=dta&O=mr,loc')
txtparam => $param,  ##-- text query parameter (default='t')
timeout  => $secs,   ##-- user agent timeout (default=300)
ua       => $agent,  ##-- underlying LWP::UserAgent
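The options above map naturally onto an LWP::UserAgent request. The following is a minimal sketch of how such a query might be assembled, not the module's actual implementation. It assumes the text is sent as a form-encoded POST parameter (the request method is not stated in this document), and the variable names and sample text are purely illustrative. Nothing is sent over the network.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

##-- documented defaults from new()
my %opts = (
  tokurl   => 'http://kaskade.dwds.de/waste/tokenize.fcgi?m=dta&O=mr,loc',
  txtparam => 't',
  timeout  => 300,
);

##-- user agent honoring the 'timeout' option
my $ua = LWP::UserAgent->new(timeout => $opts{timeout});

##-- ASSUMPTION: raw text is passed as a form-encoded POST parameter
my $text = 'Dies ist ein Test.';
my $req  = POST($opts{tokurl}, [ $opts{txtparam} => $text ]);

##-- inspect the request without sending it
print $req->method, ' ', $req->uri, "\n";
print $req->content, "\n";

##-- to actually query the server (requires network access):
# my $rsp = $ua->request($req);
# die 'tokenizer request failed: ', $rsp->status_line unless $rsp->is_success;
# my $tokenized = $rsp->decoded_content;
```

Passing a preconfigured agent through the ua option would presumably allow proxy or header settings to be shared with other requests; whether the module honors timeout when ua is supplied is not stated here.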
- noSaveKeys
-
@keys = $class_or_obj->noSaveKeys();
Returns the list of keys not to be saved. This override returns qw(doc ua).
- close
-
$fmt = $fmt->close();
Deletes buffered input document, if any.
- fromString
-
$fmt = $fmt->fromString($string);
Select input from string $string.
- parseRawString
-
$fmt = $fmt->parseRawString(\$str);
Guts for fromString(): parse string $str into local document buffer.
- parseDocument
-
$doc = $fmt->parseDocument();
Wrapper for $fmt->{doc}: returns the buffered input document.
- mimeType
-
$type = $fmt->mimeType();
Default returns text/plain.
- defaultExtension
-
$ext = $fmt->defaultExtension();
Returns default filename extension for this format, here '.raw'.
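Taken together, the methods above suggest the following input workflow. This is a sketch rather than a tested recipe: it assumes DTA::CAB is installed and the tokenizer at tokurl is reachable, and tokenize_raw() is a hypothetical helper name, not part of the module. The eval guards let the script degrade gracefully when either assumption fails.

```perl
use strict;
use warnings;

##-- hypothetical helper: returns the parsed document, or undef on failure
sub tokenize_raw {
  my ($text, %args) = @_;

  ##-- ASSUMPTION: DTA::CAB is installed; otherwise give up quietly
  return unless eval { require DTA::CAB::Format::Raw::HTTP; 1 };

  my $doc = eval {
    my $fmt = DTA::CAB::Format::Raw::HTTP->new(%args);
    $fmt->fromString($text);         ##-- select input (presumably queries the tokenizer)
    my $d = $fmt->parseDocument();   ##-- fetch the buffered document
    $fmt->close();                   ##-- drop the format's own buffer
    $d;
  };
  warn "tokenization failed: $@" if !defined($doc) && $@;
  return $doc;
}

my $doc = tokenize_raw('Dies ist ein Test.', timeout => 60);
print defined($doc) ? "got a document\n" : "no document available\n";
```

The document is copied out before close() is called because close() deletes the buffered document from the format object.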
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2013-2019 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.
SEE ALSO
dta-cab-convert.perl(1), DTA::CAB::Format::Builtin(3pm), DTA::CAB::Format(3pm), DTA::CAB(3pm), perl(1), ...