NAME
DTA::CAB::HttpProtocol - HTTP query protocol for use with DTA::CAB::Server::HTTP::Handler::Query
DESCRIPTION
This manual page describes the conventions used by the DTA::CAB::Server::HTTP::Handler::Query analysis request handler class for DTA::CAB::Server::HTTP servers. The examples in this manual page assume your DTA::CAB::Server::HTTP server object is running on $serverURL (e.g. $serverURL="http://localhost:8080") with a DTA::CAB::Server::HTTP::Handler::Query handler object $qh bound to path /query (i.e. $queryURL="$serverURL/query").
HEAD Requests
HEAD requests are honored by default, but don't currently return any useful information other than the fact that the server is (or is not) running.
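A HEAD request can serve as a simple liveness probe. The following is a minimal client-side sketch in Python (standard library only), not part of DTA::CAB itself. The function names and the timeout value are illustrative assumptions, as is the choice to treat any HTTP status as evidence that a server is listening.

```python
import urllib.error
import urllib.request


def build_head_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a HEAD request for the given URL."""
    return urllib.request.Request(url, method="HEAD")


def server_is_running(url: str, timeout: float = 5.0) -> bool:
    """Return True if anything answers the HEAD request at all."""
    try:
        with urllib.request.urlopen(build_head_request(url), timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # Assumption of this sketch: an HTTP error status still shows
        # that some server is listening at this address.
        return True
    except OSError:
        # Connection refused, timeout, name resolution failure, etc.
        return False
```

With the conventions of this manual page, the probe would be called as server_is_running("http://localhost:8080/query").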
/query/list Requests
Requests whose local path component ends in '/list' are interpreted as requests for a list of analyzer names (strings) supported by the query handler object. Such /list requests may be GET or POST requests, and may contain the following form parameters:
- a => $regex
Requests the server to return only those analyzers whose names match the regular expression $regex.
- fmt => $fmt
Alias: format
Requests the server to return the list of analyzers in format $fmt, which should be a format alias supported by the DTA::CAB::Format::Registry object associated with the query handler ($qh->{formats}), which by default is simply the global format registry as used by DTA::CAB::Format::newFormat.
This use of formats constitutes an abuse of the DTA::CAB::Format API, since the returned data are really just a flat list of strings and not "document-like" at all, but it should suffice for most purposes.
The default format is 'TT', which returns a flat newline-separated list of analyzer names.
The list of analyzers is returned in the requested format as the response content.
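As an illustration, a client might build a /list URL and parse the default newline-separated response roughly as follows. This is a hedged Python sketch using only the standard library. The helper names are hypothetical, and the example regular expression is made up.

```python
from urllib.parse import urlencode


def build_list_url(query_url, regex=None, fmt=None):
    """Build the URL for a /list request below the handler path."""
    params = {}
    if regex is not None:
        params["a"] = regex      # only analyzers whose names match
    if fmt is not None:
        params["fmt"] = fmt      # format alias for the returned list
    url = query_url.rstrip("/") + "/list"
    return url + ("?" + urlencode(params) if params else "")


def parse_tt_list(body: str):
    """Split a newline-separated response into analyzer names."""
    return [line.strip() for line in body.splitlines() if line.strip()]
```

For example, build_list_url("http://localhost:8080/query", regex="^dta") percent-encodes the regular expression before appending it to the URL.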
/query Requests
All other requests are interpreted as requests for analysis of a user-supplied document by a handler-supported analyzer. /query requests may be GET or POST requests. Query requests are specified in terms of form parameters, which may be passed 'GET'-like in the 'query' portion of the requested URL (the portion after the '?'), 'POST'-like in the request content (if the request Content-Type is either 'application/x-www-form-urlencoded' or 'multipart/form-data'), or as a combination of both (in which case GET-like parameters take precedence over POST-like parameters).
Additionally, the DTA::CAB::Server::HTTP class supports so-called "xpost" requests, which are nothing more than POST requests with a 'Content-Type' header other than 'application/x-www-form-urlencoded' or 'multipart/form-data'. In an xpost request, all 'option-like' query parameters are assumed to be encoded in the URL (as for typical GET requests), and the POST content itself is interpreted as the value of the 'qd' (query document) parameter.
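The three request styles described above can be sketched from the client side as follows. This is an illustrative Python sketch using only the standard library. The requests are built but not sent. The function names, the 'text/plain' content type chosen for the xpost case, and the placeholder document bytes are assumptions of the sketch. The last function only models the stated precedence rule and is not the server's actual code.

```python
from urllib.parse import urlencode
from urllib.request import Request


def get_style(query_url, params):
    """All parameters in the URL query portion."""
    return Request(query_url + "?" + urlencode(params), method="GET")


def post_style(query_url, params):
    """All parameters form-encoded in the request content."""
    return Request(
        query_url,
        data=urlencode(params).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def xpost_style(query_url, options, document: bytes,
                content_type="text/plain; charset=utf-8"):
    """Options in the URL; the raw content stands in for 'qd'."""
    url = query_url + ("?" + urlencode(options) if options else "")
    return Request(url, data=document, method="POST",
                   headers={"Content-Type": content_type})


def effective_params(get_params, post_params):
    """Model of the precedence rule: GET-like values win."""
    merged = dict(post_params)
    merged.update(get_params)
    return merged
```

In the xpost case, any content type other than the two form types should select this mode according to the description above. The document bytes must be valid input for whatever format the 'fmt' option names.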
Query Parameters
- a => $a
Name of the analyzer (string) to be queried, which must be an analyzer name as returned by a "/query/list" request; see "/query/list Requests" for details.
If unspecified, the query handler may choose a default analyzer. By convention, this analyzer is accessible as $a='default'.
- q => $rawQueryString
A raw untokenized text string to be tokenized with DTA::CAB::Format::Raw and analyzed.
Each query must specify either a 'q' parameter, a 'qd' parameter, or contain raw POST content representing the formatted document to be queried.
- qd => $queryDocument
A DTA::CAB::Document in the format specified by the 'fmt' parameter.
If $queryDocument does not conform to the structural conventions of a DTA::CAB::Document (i.e. {body=>[ {tokens=>[ {text=>$t1}, ... ]}, ... ]}), an attempt will be made to heuristically massage it into one; see DTA::CAB::Format::forceDocument().
Each query must specify either a 'q' parameter, a 'qd' parameter, or contain raw POST content representing the formatted document to be queried.
- fmt => $fmt
Aliases: format ifmt
Specifies the input format of the document passed in via the "qd" parameter or POST request content (see below). $fmt should be a DTA::CAB::Format alias known to the DTA::CAB::Format::Registry object associated with the query handler ($qh->{formats}), which by default is simply the global format registry as used by DTA::CAB::Format::newFormat. See "SUBCLASSES" in DTA::CAB::Format for a list of known formats and aliases.
The default format is given by the query handler's 'defaultFormat' field, which by default is set to $DTA::CAB::Format::CLASS_DEFAULT (usually 'TT').
- ofmt => $ofmt
Specifies the output format to be used for returning the analyzed document in file-upload mode. If unspecified, $ofmt defaults to the input format $fmt. See "SUBCLASSES" in DTA::CAB::Format for a list of known formats and aliases.
- raw => $bool
If specified true, a successful query response will be returned with Content-Type: text/plain and without a Content-Disposition header, regardless of the chosen $fmt. Otherwise, the Content-Type and Content-Disposition headers are determined by the specified $fmt.
- pretty => $level
Whether to pretty-print the response data. Really just a wrapper for $fmt->{level}.
- $analyzerOption => $value
Any other form parameter is interpreted as an option to be passed to the analyzeDocument method of the selected analyzer. Note that the query handler $qh can block propagation of some or all analyzer options by means of its 'allowUserOptions' attribute.
- file => $filename
File basename to use for response in attachment-mode. Defaults to $rawQueryString or 'data'.
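The parameters above can be assembled on the client side along the following lines. This is a minimal Python sketch, not an official client. The helper names are hypothetical. Encoding 'raw' as "1" or "0" assumes the server evaluates the value with Perl truthiness. Requiring exactly one of 'q' and 'qd' is a simplification for form-style requests (raw POST content is not covered here). Extra keyword arguments stand in for analyzer options, whose names depend on the analyzer.

```python
from urllib.parse import urlencode


def build_query_params(q=None, qd=None, a=None, fmt=None, ofmt=None,
                       raw=None, pretty=None, file=None,
                       **analyzer_options):
    """Collect form parameters for a /query request as a dict."""
    if (q is None) == (qd is None):
        raise ValueError("specify exactly one of 'q' or 'qd'")
    params = dict(analyzer_options)  # passed through to the analyzer
    named = {"a": a, "q": q, "qd": qd, "fmt": fmt, "ofmt": ofmt,
             "pretty": pretty, "file": file}
    for key, value in named.items():
        if value is not None:
            params[key] = str(value)
    if raw is not None:
        params["raw"] = "1" if raw else "0"
    return params


def build_query_url(query_url, **kwargs):
    """GET-style URL carrying the collected parameters."""
    return query_url + "?" + urlencode(build_query_params(**kwargs))


def minimal_document(tokens):
    """Plain-data mirror of the structural convention given for 'qd'."""
    return {"body": [{"tokens": [{"text": t} for t in tokens]}]}
```

The dictionary returned by minimal_document() only mirrors the structure {body=>[ {tokens=>[ {text=>$t1}, ... ]}, ... ]}. How it must be serialized before being sent as 'qd' depends on the format selected with 'fmt'.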
Query Responses
If an error occurs during analysis, an error response should be returned to the client. Otherwise, the response has 'Content-Type' and 'Content-Disposition' headers as set by the selected format $fmt, and the response content is a string representing the analyzed document as output by $fmt->toString().
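A client might interpret such a response roughly as follows. This is a hedged Python sketch using only the standard library. The exception class and function name are hypothetical. Treating any status of 400 or above as an error response is an assumption, since the status codes used by the server are not specified here. Falling back to UTF-8 when no charset is declared is likewise an assumption.

```python
from email.message import Message


class AnalysisError(Exception):
    """Raised by this sketch when the server reports an error."""


def interpret_response(status: int, headers: Message, body: bytes):
    """Return content type, attachment filename, and decoded text."""
    if status >= 400:
        # Assumption: error responses carry a 4xx or 5xx status.
        raise AnalysisError(f"server returned HTTP {status}")
    charset = headers.get_content_charset() or "utf-8"
    return {
        "content_type": headers.get_content_type(),
        # None if no Content-Disposition filename was sent,
        # e.g. for responses requested with raw=1.
        "filename": headers.get_filename(),
        "text": body.decode(charset),
    }
```

The headers object returned by urllib.request.urlopen() is a subclass of email.message.Message, so it can be passed in directly together with the response status and content.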
Response Caching
As of DTA::CAB version 1.16, DTA::CAB::Server::HTTP supports basic in-memory caching of responses for GET requests. Client requests may set the Cache-Control: no-cache header option to prevent the server from returning a cached response even if one is available. Similarly, if a client request includes a Cache-Control: no-store header option, the server will not cache its generated response (although this option has no effect on a previously cached response for the same URI, if one exists). If the response returned by the server was drawn from the server-internal cache, it will contain the header X-Cached: 1.
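The caching headers can be exercised from a client as sketched below, in Python with the standard library only. The function names are hypothetical. Sending both directives in a single comma-separated Cache-Control header follows general HTTP practice, and it is an assumption that the server parses that combined form.

```python
from urllib.parse import urlencode
from urllib.request import Request


def cached_query_request(query_url, params, *,
                         bypass_cache=False, forbid_store=False):
    """Build a GET request with optional Cache-Control directives."""
    directives = []
    if bypass_cache:
        directives.append("no-cache")   # do not answer from the cache
    if forbid_store:
        directives.append("no-store")   # do not cache this response
    req = Request(query_url + "?" + urlencode(params), method="GET")
    if directives:
        req.add_header("Cache-Control", ", ".join(directives))
    return req


def was_cached(headers) -> bool:
    """True if the response headers carry 'X-Cached: 1'."""
    return headers.get("X-Cached") == "1"
```

Only GET requests are built here, since caching is described for GET requests only.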
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2011-2019 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.