NAME

DTA::CAB::Format::Raw::HTTP - Document parser: raw untokenized text via HTTP tokenizer API

SYNOPSIS

use DTA::CAB::Format::Raw::HTTP;

##========================================================================
## Methods

$fmt = DTA::CAB::Format::Raw::HTTP->new(%args);
@keys = $class_or_obj->noSaveKeys();
$fmt = $fmt->close();
$fmt = $fmt->parseRawString(\$str);
$doc = $fmt->parseDocument();
$type = $fmt->mimeType();
$ext = $fmt->defaultExtension();

DESCRIPTION

DTA::CAB::Format::Raw::HTTP is an input DTA::CAB::Format subclass for untokenized raw string input. It uses LWP::UserAgent to query a tokenization server via HTTP, and DTA::CAB::Format::Raw::Base for output.

Methods

new
$fmt = CLASS_OR_OBJ->new(%args);

%$fmt, %args:

##-- Input
doc       => $doc,      ##-- buffered input document
tokurl    => $url,      ##-- tokenizer (default='http://kaskade.dwds.de/waste/tokenize.fcgi?m=dta&O=mr,loc')
txtparam  => $param,    ##-- text query parameter (default='t')
timeout   => $secs,     ##-- user agent timeout (default=300)
ua        => $agent,    ##-- underlying LWP::UserAgent
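
The options above can be combined as in the following minimal sketch. The tokenizer URL here is a hypothetical local endpoint chosen for illustration only; the option names are the ones documented above.

```perl
use strict;
use warnings;
use DTA::CAB::Format::Raw::HTTP;

##-- hypothetical local tokenizer endpoint; replace with a server you actually run
my $fmt = DTA::CAB::Format::Raw::HTTP->new(
  tokurl   => 'http://localhost:8080/tokenize.fcgi?m=dta&O=mr,loc',
  txtparam => 't',   ##-- documented default, shown for clarity
  timeout  => 60,    ##-- seconds; shorter than the documented default of 300
);
```

Omitting all three options should leave the documented defaults in place.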

noSaveKeys
@keys = $class_or_obj->noSaveKeys();

Returns list of keys not to be saved. This override returns qw(doc ua).

close
$fmt = $fmt->close();

Deletes buffered input document, if any.

fromString
$fmt = $fmt->fromString($string);

Select input from string $string.

parseRawString
$fmt = $fmt->parseRawString(\$str);

Guts for fromString(): parses the string referenced by \$str into the local document buffer.
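
To make the roles of tokurl and txtparam concrete, the sketch below builds (but does not send) the kind of HTTP request a tokenizer query might use. The request shape, a form-encoded POST with the text in the txtparam field, is an assumption for illustration and not necessarily what this module does internally; the sample text is made up.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

##-- values mirror the documented defaults; the request shape itself is an assumption
my $tokurl   = 'http://kaskade.dwds.de/waste/tokenize.fcgi?m=dta&O=mr,loc';
my $txtparam = 't';
my $text     = 'Dies ist ein Test.';

##-- build (but do not send) a form-encoded POST carrying the text in $txtparam
my $req = POST($tokurl, [ $txtparam => $text ]);

print $req->method, ' ', $req->uri, "\n";
print $req->content, "\n";
```

Passing $req to LWP::UserAgent->new(timeout => 300)->request($req) would actually send it and return an HTTP::Response.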

parseDocument
$doc = $fmt->parseDocument();

Returns the buffered input document; wrapper for $fmt->{doc}.
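
Taken together, the methods above suggest the following workflow, shown here as a minimal sketch. The helper tokenize_text() is hypothetical and not part of DTA::CAB, and treating a false return value from fromString() as failure is an assumption. Running it requires a reachable tokenizer server.

```perl
use strict;
use warnings;
use DTA::CAB::Format::Raw::HTTP;

##-- tokenize_text($text, %opts): hypothetical helper, not part of DTA::CAB
sub tokenize_text {
  my ($text, %opts) = @_;
  my $fmt = DTA::CAB::Format::Raw::HTTP->new(%opts);
  $fmt->fromString($text)              ##-- queries the tokenizer server
    or die "tokenization request failed"; ##-- assumed failure convention
  my $doc = $fmt->parseDocument();     ##-- returns the buffered document
  $fmt->close();                       ##-- drop the buffer; $doc keeps its own reference
  return $doc;
}

##-- only contact the server when text is given on the command line
if (@ARGV) {
  my $doc = tokenize_text(join(' ', @ARGV));
  print "parsed document of class ", ref($doc), "\n";
}
```

Calling close() after parseDocument() matches the documented behavior of deleting the buffered input document, so the format object can be reused for the next string.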

mimeType
$type = $fmt->mimeType();

Default returns text/plain.

defaultExtension
$ext = $fmt->defaultExtension();

Returns default filename extension for this format, here '.raw'.

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2013-2019 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

dta-cab-convert.perl(1), DTA::CAB::Format::Builtin(3pm), DTA::CAB::Format(3pm), DTA::CAB(3pm), perl(1), ...