NAME
DTA::CAB::Format::Raw::HTTP - Document parser: raw untokenized text via HTTP tokenizer API
SYNOPSIS
 ##========================================================================
 ## Methods

 $fmt  = DTA::CAB::Format::Raw::HTTP->new(%args);
 @keys = $class_or_obj->noSaveKeys();
 $fmt  = $fmt->close();
 $fmt  = $fmt->parseRawString(\$str);
 $doc  = $fmt->parseDocument();
 $type = $fmt->mimeType();
 $ext  = $fmt->defaultExtension();
DESCRIPTION
DTA::CAB::Format::Raw::HTTP is an input DTA::CAB::Format subclass for untokenized raw string input. It uses LWP::UserAgent
to query a tokenization server via HTTP, and DTA::CAB::Format::Raw::Base for output.
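The query pattern described above can be sketched as follows. This is a minimal illustration, not the module's own code: the helper names tokenizer_request and tokenize_via_http are hypothetical, the URL is a placeholder, and sending the text as a form POST is an assumption. The parameter names follow the tokurl, txtparam, and timeout options documented under new.

```perl
use strict;
use warnings;

## Hypothetical helper (not part of DTA::CAB): pair the tokenizer URL with
## a one-field form that carries the raw text.
sub tokenizer_request {
  my ($tokurl, $txtparam, $text) = @_;
  my %form = ($txtparam => $text);
  return ($tokurl, \%form);
}

## Hypothetical helper: send the request and return the response body.
## ASSUMPTION: a form POST; the module may transmit the text differently.
sub tokenize_via_http {
  my ($tokurl, $txtparam, $text, $timeout) = @_;
  require LWP::UserAgent;  ##-- loaded lazily: only needed when actually sending
  my $ua = LWP::UserAgent->new(timeout => $timeout // 300);  ##-- 300 is the documented default
  my ($url, $form) = tokenizer_request($tokurl, $txtparam, $text);
  my $res = $ua->post($url, $form);
  die "tokenizer request failed: " . $res->status_line . "\n" unless $res->is_success;
  return $res->decoded_content;
}

## Build (but do not send) a request; the URL here is a placeholder.
my ($url, $form) = tokenizer_request('http://localhost/tokenize', 't', "Ein Satz.");
printf "%s %s=%s\n", $url, 't', $form->{t};
```

Keeping request assembly separate from sending lets the parameters be inspected without network access.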
Methods
- new
 $fmt = CLASS_OR_OBJ->new(%args);
%$fmt, %args:
 ##-- Input
 doc      => $doc,    ##-- buffered input document
 tokurl   => $url,    ##-- tokenizer (default='http://kaskade.dwds.de/waste/tokenize.fcgi?m=dta&O=mr,loc')
 txtparam => $param,  ##-- text query parameter (default='t')
 timeout  => $secs,   ##-- user agent timeout (default=300)
 ua       => $agent,  ##-- underlying LWP::UserAgent
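How caller arguments might combine with the documented defaults can be sketched in plain Perl. The helper with_defaults is hypothetical and only illustrates the usual "defaults first, arguments last" idiom; whether the module merges its options exactly this way is an assumption.

```perl
use strict;
use warnings;

## Defaults as listed in the option table above.
my %DEFAULTS = (
  tokurl   => 'http://kaskade.dwds.de/waste/tokenize.fcgi?m=dta&O=mr,loc',
  txtparam => 't',
  timeout  => 300,
);

## Hypothetical helper (not part of DTA::CAB): in a hash list, later pairs
## win, so caller arguments override the defaults.
sub with_defaults {
  my (%args) = @_;
  my %opts = (%DEFAULTS, %args);
  return \%opts;
}

## Override only the timeout; the other options keep their defaults.
my $opts = with_defaults(timeout => 60);
printf "timeout=%d txtparam=%s\n", $opts->{timeout}, $opts->{txtparam};
```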
- noSaveKeys
 @keys = $class_or_obj->noSaveKeys();
Returns list of keys not to be saved. This override returns qw(doc ua).
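The effect of such a key list can be illustrated with a small filter. The helper saveable_fields and the sample hash are hypothetical; how DTA::CAB actually applies noSaveKeys() when saving an object is not shown in this document.

```perl
use strict;
use warnings;

## Hypothetical helper (not part of DTA::CAB): copy an object's fields,
## leaving out the keys named as not to be saved.
sub saveable_fields {
  my ($obj, @nosave) = @_;
  my %skip = map { ($_ => 1) } @nosave;
  my %copy;
  for my $key (grep { !$skip{$_} } keys %$obj) {
    $copy{$key} = $obj->{$key};
  }
  return \%copy;
}

## Stand-in for a format object: 'doc' and 'ua' hold runtime-only state.
my %fmt = (doc => { body => [] }, ua => 'user agent placeholder', txtparam => 't', timeout => 300);
my $saved = saveable_fields(\%fmt, qw(doc ua));
print join(',', sort keys %$saved), "\n";   ##-- prints "timeout,txtparam"
```

Excluding doc and ua is consistent with their roles: one is a transient input buffer, the other a live user-agent object.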
- close
 $fmt = $fmt->close();
Deletes buffered input document, if any.
- fromString
 $fmt = $fmt->fromString($string);
Select input from string $string.
- parseRawString
 $fmt = $fmt->parseRawString(\$str);
Guts for fromString(): parse string $str into local document buffer.
- parseDocument
 $doc = $fmt->parseDocument();
Wrapper for $fmt->{doc}: returns the buffered input document.
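The buffer lifecycle described by fromString(), parseRawString(), parseDocument(), and close() can be sketched with a toy stand-in class. ToyRawFormat is hypothetical and is not DTA::CAB code: whitespace splitting replaces the remote tokenizer, and the shape of the buffered document is invented for illustration.

```perl
use strict;
use warnings;

## Toy stand-in class (hypothetical; not DTA::CAB code) mirroring the
## documented buffer lifecycle.
package ToyRawFormat;

sub new {
  my ($class, %args) = @_;
  return bless { doc => undef, %args }, $class;
}

## fromString($string): select input; delegates to parseRawString()
sub fromString {
  my ($fmt, $str) = @_;
  return $fmt->parseRawString(\$str);
}

## parseRawString(\$str): fill the local document buffer.
## Whitespace splitting stands in for the remote tokenizer here.
sub parseRawString {
  my ($fmt, $strref) = @_;
  $fmt->{doc} = { tokens => [ split ' ', $$strref ] };
  return $fmt;
}

## parseDocument(): return the buffered document
sub parseDocument { return $_[0]{doc}; }

## close(): delete the buffered document, if any
sub close {
  my $fmt = shift;
  delete $fmt->{doc};
  return $fmt;
}

package main;

my $fmt = ToyRawFormat->new();
my $doc = $fmt->fromString("Dies ist ein Test")->parseDocument();
print scalar(@{ $doc->{tokens} }), " tokens\n";   ##-- prints "4 tokens"
$fmt->close();
```

Because the setters return the object itself, calls can be chained as in the usage lines at the end.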
- mimeType
 $type = $fmt->mimeType();
Default returns text/plain.
- defaultExtension
 $ext = $fmt->defaultExtension();
Returns default filename extension for this format, here '.raw'.
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2013-2019 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.
SEE ALSO
dta-cab-convert.perl(1), DTA::CAB::Format::Builtin(3pm), DTA::CAB::Format(3pm), DTA::CAB(3pm), perl(1), ...