NAME
HTTP::Promise::Parser - Fast HTTP Request & Response Parser
SYNOPSIS
use HTTP::Promise::Parser;
my $p = HTTP::Promise::Parser->new ||
die( HTTP::Promise::Parser->error, "\n" );
my $ent = $p->parse( '/some/where/http_request.txt' ) ||
die( $p->error );
my $ent = $p->parse( $file_handle ) ||
die( $p->error );
my $ent = $p->parse( $string ) ||
die( $p->error );
VERSION
v0.1.0
DESCRIPTION
This is an HTTP request and response parser that uses XS modules whenever possible for speed, and is mindful of memory consumption.
As rfc7230 states in its section 3:
"The normal procedure for parsing an HTTP message is to read the start-line into a structure, read each header field into a hash table by field name until the empty line, and then use the parsed data to determine if a message body is expected. If a message body has been indicated, then it is read as a stream until an amount of octets equal to the message body length is read or the connection is closed."
Thus, the HTTP::Promise approach is to read the data, whether an HTTP request or response (i.e. an HTTP message), from a filehandle, possibly chunked. It first reads the message headers and parses them, then stores the HTTP message body in memory if it is under a specified threshold, or in a file otherwise. If the size is unknown, the body is first read into memory and automatically switched to a file when it reaches the threshold.
Once the overall message body is stored, if it is a multipart type, this class reads each of its parts into memory or into a separate file, depending on its size, until there are no more parts. It does so using the stream reader, which reads in chunks of bytes and not in lines. If the message body is a single part, it is saved to memory or to a file depending on its size. Each part saved to a file uses a file extension related to its mime type. Each part is then accessible as an HTTP body object via the "parts" in HTTP::Promise::Entity method.
Note, however, that when dealing with multipart, this only recognises multipart/form-data; anything else will be treated as data.
The overall HTTP message is available as an HTTP::Promise::Entity object and returned.
If an error occurs, this module does not die, at least not voluntarily, but instead sets an error and returns undef, so always make sure to check the returned value from method calls.
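The memory-then-file strategy described above can be illustrated with a minimal, self-contained sketch. This is not the module's implementation: the helper name read_body, its arguments and the returned hash reference are hypothetical and only serve to show the idea of reading in chunks and switching to a temporary file once a threshold is exceeded.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper, not part of HTTP::Promise::Parser: read everything from
# $in in chunks, keeping it in memory until $threshold bytes are exceeded, and
# switching to a temporary file from then on.
sub read_body
{
    my( $in, $threshold, $chunk_size ) = @_;
    $chunk_size ||= 2048;
    my $buffer = '';
    my $tmp;    # File::Temp object, created only when needed
    while( 1 )
    {
        my $n = read( $in, my $chunk, $chunk_size );
        die( "Read error: $!" ) if( !defined( $n ) );
        last if( !$n );
        if( $tmp )
        {
            print {$tmp} $chunk or die( "Write error: $!" );
        }
        else
        {
            $buffer .= $chunk;
            # A false threshold disables the switch to a file
            if( $threshold && length( $buffer ) > $threshold )
            {
                $tmp = File::Temp->new( UNLINK => 1 );
                binmode( $tmp, ':raw' );
                print {$tmp} $buffer or die( "Write error: $!" );
                $buffer = '';
            }
        }
    }
    if( $tmp )
    {
        $tmp->flush;
        return( { type => 'file', file => $tmp } );
    }
    return( { type => 'memory', data => $buffer } );
}

my $small_data = 'x' x 100;
my $large_data = 'x' x 5000;
open( my $small, '<', \$small_data ) || die( $! );
open( my $large, '<', \$large_data ) || die( $! );
my $in_memory = read_body( $small, 1024 );
my $on_disk   = read_body( $large, 1024 );
print( "small: $in_memory->{type}, large: $on_disk->{type}\n" );
```

With a threshold of 1024 bytes, the 100-byte input stays in memory while the 5000-byte input ends up in a temporary file.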
CONSTRUCTOR
new
This instantiates a new HTTP::Promise::Parser object.
It takes the following options:
decode_body
Boolean. If enabled, this will have this interface automatically decode the entity body upon parsing. Default is true.
decode_headers
Boolean. If enabled, this will decode headers, which is used for decoding the filename value in Content-Disposition. Default is false.
ignore_filename
Boolean. Whether the filename provided in a Content-Disposition header field should be ignored or not. This defaults to false, but this option is currently not used, and the filename specified in a Content-Disposition header field is never used. So, this is a no-op and should be removed.
max_body_in_memory_size
Integer. This is the threshold beyond which an entity body that is initially loaded into memory will be switched to a file on the local filesystem. This applies when the option is set to a true value and the body size exceeds the amount specified.
By default, this has the value set by the class variable $MAX_BODY_IN_MEMORY_SIZE, which is 102400 bytes or 100K.
max_headers_size
Integer. This is the threshold size in bytes beyond which HTTP headers will trigger an error. This defaults to the class variable $MAX_HEADERS_SIZE, which itself is set by default to 8192 bytes or 8K.
max_read_buffer
Integer. This is the read buffer size. This is used for HTTP::Promise::IO and this defaults to 2048 bytes (2Kb).
output_dir
Filepath of the directory to be used to save entity body, when applicable.
tmp_dir
Set the directory to use when creating temporary files.
tmp_to_core
Boolean. When true, this will set the temporary file to an in-memory space.
METHODS
decode_body
Boolean. If enabled, this will have this interface automatically decode the entity body upon parsing. Default is true.
decode_headers
Boolean. If enabled, this will decode headers, which is used for decoding the filename value in Content-Disposition. Default is false.
ignore_filename
Boolean. Whether the filename provided in a Content-Disposition header field should be ignored or not. This defaults to false, but this option is currently not used, and the filename specified in a Content-Disposition header field is never used. So, this is a no-op and should be removed.
looks_like_request
Provided with a string or a scalar reference, this returns a hash reference containing details of the request line attributes if it is indeed a request, or an empty string if it is not a request.
It sets an error and returns undef upon error.
The following attributes are available:
http_version
The HTTP protocol version used. For example, in HTTP/1.1, this would be 1.1, and in HTTP/2, this would be 2.
http_vers_major
The HTTP protocol major version used. For example, in HTTP/1.0, this would be 1, and in HTTP/2, this would be 2.
http_vers_minor
The HTTP protocol minor version used. For example, in HTTP/1.0, this would be 0, and in HTTP/2, this would be undef.
method
The HTTP request method used. For example, in GET / HTTP/1.1, this would be GET. This uses the rfc7231 semantics, which means any token, even a non-standard one, would match.
protocol
The HTTP protocol used, e.g. HTTP/1.0, HTTP/1.1, HTTP/2, etc.
uri
The request URI. For example, in GET / HTTP/1.1, this would be /
my $ref = $p->looks_like_request( \$str );
# or
# my $ref = $p->looks_like_request( $str );
die( $p->error ) if( !defined( $ref ) );
if( $ref )
{
    say "Request method $ref->{method}, uri $ref->{uri}, protocol $ref->{protocol}, version major $ref->{http_vers_major}, version minor $ref->{http_vers_minor}";
}
else
{
    say "This is not an HTTP request.";
}
looks_like_response
Provided with a string or a scalar reference, this returns a hash reference containing details of the response line attributes if it is indeed a response, or an empty string if it is not a response.
It sets an error and returns undef upon error.
The following attributes are available:
code
The 3-digit HTTP response code. For example, in HTTP/1.1 200 OK, this would be 200.
http_version
The HTTP protocol version used. For example, in HTTP/1.1, this would be 1.1, and in HTTP/2, this would be 2.
http_vers_major
The HTTP protocol major version used. For example, in HTTP/1.0, this would be 1, and in HTTP/2, this would be 2.
http_vers_minor
The HTTP protocol minor version used. For example, in HTTP/1.0, this would be 0, and in HTTP/2, this would be undef.
protocol
The HTTP protocol used, e.g. HTTP/1.0, HTTP/1.1, HTTP/2, etc.
status
The response status text. For example, in HTTP/1.1 200 OK, this would be OK.
my $ref = $p->looks_like_response( \$str );
# or
# my $ref = $p->looks_like_response( $str );
die( $p->error ) if( !defined( $ref ) );
if( $ref )
{
    say "Response code $ref->{code}, status $ref->{status}, protocol $ref->{protocol}, version major $ref->{http_vers_major}, version minor $ref->{http_vers_minor}";
}
else
{
    say "This is not an HTTP response.";
}
looks_like_what
Provided with a string or a scalar reference, this returns a hash reference containing details of the HTTP message first line attributes if it is indeed an HTTP message.
The attributes available depend on the type of HTTP message determined and are described in detail in "looks_like_request" and "looks_like_response". In addition to those, it also returns the attribute type, which is a string representing the type of HTTP message this is, i.e. either request or response.
If this does not match either an HTTP request or an HTTP response, it returns an empty string.
my $ref = $p->looks_like_what( \$str );
die( $p->error ) if( !defined( $ref ) );
say "This is a ", ( $ref ? $ref->{type} : 'unknown' ), " HTTP message.";
my $ref = $p->looks_like_what( \$str );
die( $p->error ) if( !defined( $ref ) );
if( !$ref )
{
    say "This is unknown.";
}
else
{
    say "This is a HTTP $ref->{type} with protocol version $ref->{http_version}";
}
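To make the return convention of the three looks_like_* methods more concrete (hash reference on a match, empty string when nothing matches, undef upon error), here is a rough, self-contained approximation. The function name classify_start_line and the regular expressions are hypothetical simplifications written for this illustration; the patterns actually used by the module may well differ.

```perl
use strict;
use warnings;

# Hypothetical, simplified patterns written for this sketch only
my $TOKEN_RE   = qr/[!#\$%&'*+.^_`|~0-9A-Za-z-]+/;
my $VERSION_RE = qr/(?<protocol>HTTP\/(?<major>\d+)(?:\.(?<minor>\d+))?)/;

# Build the version-related attributes shared by requests and responses
sub version_fields
{
    my( $protocol, $major, $minor ) = @_;
    return(
        protocol        => $protocol,
        http_vers_major => $major,
        http_vers_minor => $minor,
        http_version    => ( defined( $minor ) ? join( '.', $major, $minor ) : $major ),
    );
}

sub classify_start_line
{
    my $str = shift( @_ );
    # Mirror the documented convention: undef signals an error
    return if( !defined( $str ) );
    if( $str =~ /\A(?<method>$TOKEN_RE)[ ](?<uri>\S+)[ ]$VERSION_RE(?:\r?\n|\z)/ )
    {
        return({
            type   => 'request',
            method => $+{method},
            uri    => $+{uri},
            version_fields( $+{protocol}, $+{major}, $+{minor} ),
        });
    }
    elsif( $str =~ /\A$VERSION_RE[ ](?<code>\d{3})(?:[ ](?<status>[^\r\n]*))?(?:\r?\n|\z)/ )
    {
        return({
            type   => 'response',
            code   => $+{code},
            status => $+{status},
            version_fields( $+{protocol}, $+{major}, $+{minor} ),
        });
    }
    # Neither a request nor a response: empty string, not undef
    return( '' );
}

foreach my $line ( "GET / HTTP/1.1\r\n", "HTTP/2 200 OK\r\n", "hello world\n" )
{
    my $ref = classify_start_line( $line );
    print( "This is a ", ( $ref ? $ref->{type} : 'unknown' ), " HTTP message.\n" );
}
```

Keeping the "no match" value (empty string) distinct from the "error" value (undef) is what lets callers test defined() first and truthiness second, as in the examples above.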
max_body_in_memory_size
Integer. This is the threshold beyond which an entity body that is initially loaded into memory will be switched to a file on the local filesystem. This applies when the value is true and the body size exceeds the amount specified.
By default, this has the value set by the class variable $MAX_BODY_IN_MEMORY_SIZE, which is 102400 bytes or 100K.
max_headers_size
Integer. This is the threshold size in bytes beyond which HTTP headers will trigger an error. This defaults to the class variable $MAX_HEADERS_SIZE, which itself is set by default to 8192 bytes or 8K.
max_read_buffer
Integer. This is the read buffer size. This is used for HTTP::Promise::IO and this defaults to 2048 bytes (2Kb).
new_tmpfile
Creates a new temporary file. If tmp_to_core is set to true, this will create a new file using a scalar object, otherwise it will create a new temporary file under the directory set with the object parameter tmp_dir. The filehandle binmode is set to raw.
It returns a filehandle upon success. Upon error, it sets an error and returns undef.
output_dir
The filepath to the output directory. This is used when saving entity bodies on the filesystem.
parse
This takes a scalar reference of data, a glob or a file path, and parses the HTTP request or response by calling "parse_fh", passing it whatever options it received.
It returns an entity object upon success. Upon error, it sets an error and returns undef.
parse_data
This takes a string or a scalar reference and returns an entity object upon success. Upon error, it sets an error and returns undef.
parse_fh
This takes a filehandle and parses the HTTP request or response. It returns an entity object upon success. Upon error, it sets an error and returns undef.
It also takes a hash or hash reference of the following options:
reader
An HTTP::Promise::IO object. If this is not provided, a new one will be created. Note that data will be read using this reader.
request
Boolean. Set this to true to indicate the data is an HTTP request. If neither request nor response is provided, the parser will attempt to guess it.
response
Boolean. Set this to true to indicate the data is an HTTP response. If neither request nor response is provided, the parser will attempt to guess it.
parse_headers
This takes a string or a scalar reference, including a scalar object such as Module::Generic::Scalar, and an optional hash or hash reference of parameters, and parses the headers found in the given string, if any at all.
It returns a hash reference with the same property names and values returned by "parse_headers_xs".
This method uses pure perl.
Supported options are:
convert_dash
Boolean. If true, this will convert - in header fields to _. Default is false.
no_headers_ok
Boolean. If set to true, this will not trigger an error if there are no headers.
parse_headers_xs
my $def = $p->parse_headers_xs( $http_request_or_response );
my $def = $p->parse_headers_xs( $http_request_or_response, $options_hash_ref );
This takes a string or a scalar reference, including a scalar object such as Module::Generic::Scalar, and an optional hash or hash reference of parameters, and parses the headers found in the given string, if any at all.
It returns a dictionary as a hash reference upon success. Upon error, it sets an error with an HTTP error code and returns undef.
Supported options are:
convert_dash
Boolean. If true, this will convert - in header fields to _. Default is false.
request
Boolean. If true, this will parse the string assuming it is a request header.
response
Boolean. If true, this will parse the string assuming it is a response header.
The properties returned in the dictionary depend on whether request or response was enabled.
For request:
headers
An HTTP::Promise::Headers object.
length
The length in bytes of the headers parsed.
method
The HTTP method such as GET, or HEAD, POST, etc.
protocol
String, such as HTTP/1.1 or HTTP/2
uri
String, the request URI, such as /
version
This is a version object and contains a value such as 1.1, so you can do something like:
if( $def->{version} >= version->parse( '1.1' ) )
{
    # Do something
}
For response:
code
The HTTP status code, such as 200
headers
An HTTP::Promise::Headers object.
length
The length in bytes of the headers parsed. This is useful so you can then remove it from the string you provided:
my $resp = <<EOT;
HTTP/1.1 200 OK
Content-Type: text/plain

Hello world!
EOT
my $def = $p->parse_headers_xs( \$resp, response => 1 ) || die( $p->error );
# Remove the headers from the string provided
substr( $resp, 0, $def->{length} ) = '';
# Strip the empty line separating headers from body, if it is still there
$resp =~ s/^\r?\n//;
# $resp now contains the body, i.e.: "Hello world!\n"
status
String, the HTTP status, i.e. something like OK
protocol
String, such as HTTP/1.1
version
This is a version object and contains a value such as 1.1, so you can do something like:
if( $def->{version} >= version->parse( '1.1' ) )
{
    # Do something
}
If not enough data was provided to parse the headers, this sets an error object with code set to 425 (Too early).
If the headers are incomplete and the cumulative size exceeds the value set with "max_headers_size", this sets an error object with code set to 413 (Request entity too large).
If there are other issues with the headers, this sets the error code to 400 (Bad request), and for any other error, this sets an error object without code.
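The distinction between the 425 and 413 cases can be sketched as follows. This is only an illustration of the documented behaviour, not the module's code: the function name check_headers_buffer and its return values are hypothetical, and the default limit of 8192 bytes simply mirrors the documented default of $MAX_HEADERS_SIZE.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the documented error codes
sub check_headers_buffer
{
    my( $buf, $max ) = @_;
    $max //= 8192;
    # The header block ends at the first empty line
    if( $buf =~ /\r?\n\r?\n/ )
    {
        # $+[0] is the offset just past the end of the match,
        # i.e. the length of the headers including the empty line
        return( { complete => 1, length => $+[0] } );
    }
    # Incomplete and already too big: no point in waiting for more data
    return( { complete => 0, code => 413 } ) if( length( $buf ) > $max );
    # Incomplete, but more data may still arrive
    return( { complete => 0, code => 425 } );
}

my $full    = "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nHello";
my $partial = "HTTP/1.1 200 OK\r\nContent-";
my $huge    = "X-Padding: " . ( 'a' x 100 );

my $ok    = check_headers_buffer( $full );
my $early = check_headers_buffer( $partial );
my $large = check_headers_buffer( $huge, 50 );
print( "complete headers span $ok->{length} bytes\n" );
print( "partial buffer yields code $early->{code}\n" );
print( "oversized buffer yields code $large->{code}\n" );
```

A caller reading from a socket would typically append more data and retry on 425, and give up on 413.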
parse_multi_part
This takes a hash or hash reference of options and parses an HTTP multipart portion of the HTTP request or response.
It returns an entity object upon success. Upon error, it sets an error object and returns undef.
Supported options are:
entity
The HTTP::Promise::Entity object to which this multipart belongs.
reader
The HTTP::Property::Reader used for reading the data chunks from the filehandle.
parse_open
Provided with a filepath, this will open it in read mode, parse it and return an entity object.
If there is an error, this returns undef and you can retrieve the error by calling "error" in Module::Generic, which is inherited by this module.
parse_request
This takes a string or a scalar reference, including a scalar object such as Module::Generic::Scalar, and an optional hash or hash reference of parameters, and parses the request found in the given string, including the headers and the body.
It returns a dictionary as a hash reference upon success. Upon error, it sets an error with an HTTP error code and returns undef.
The properties returned are the same as the ones returned for a request by "parse_headers_xs", with the addition of the content property containing the body data of the request.
This works well for simple requests, i.e. not multipart ones. Otherwise, the entire body, whatever it is, will be stored in content.
parse_request_headers
This is an alias and is equivalent to calling "parse_headers_xs" and setting the request option.
parse_request_line
This takes a string or a scalar reference, including a scalar object such as Module::Generic::Scalar, and parses the request line, returning a hash reference containing 4 properties: method, path, protocol, version.
parse_request_pp
This is the same as "parse_request", except it uses the pure perl method "parse_headers" to parse the headers instead of the XS one.
parse_response
This takes a string or a scalar reference, including a scalar object such as Module::Generic::Scalar, and an optional hash or hash reference of parameters, and parses the response found in the given string, including the headers and the body.
It returns a dictionary as a hash reference upon success. Upon error, it sets an error with an HTTP error code and returns undef.
The properties returned are the same as the ones returned for a response by "parse_headers_xs", with the addition of the content property containing the body data of the response.
parse_response_headers
This is an alias and is equivalent to calling "parse_headers_xs" and setting the response option.
parse_response_line
This takes a string or a scalar reference, including a scalar object such as Module::Generic::Scalar, and parses the response line, returning a hash reference containing 4 properties: code, status, protocol, version.
parse_response_pp
This is the same as "parse_response", except it uses the pure perl method "parse_headers" to parse the headers instead of the XS one.
parse_singleton
Provided with a hash or hash reference of options, this parses a simple entity body.
It returns an entity object upon success. Upon error, it sets an error object and returns undef.
Supported options are:
entity
The HTTP::Promise::Entity object to which this body belongs.
read_until
A string or a regular expression that indicates the string up to which to read data from the filehandle.
reader
The HTTP::Property::Reader used for reading the data chunks from the filehandle.
parse_version
This takes an HTTP version string, such as HTTP/1.1 or HTTP/2, and returns its major and minor numbers as a two-element list in list context, or just the version object in scalar context.
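The context-dependent return value described above can be sketched with the core version module and wantarray. The function split_http_version below is a hypothetical stand-in written for illustration, not the module's implementation; in particular, returning an undefined minor for HTTP/2 is an assumption modelled on the http_vers_minor attribute documented in "looks_like_request".

```perl
use strict;
use warnings;
use version;

# Hypothetical stand-in for illustration only
sub split_http_version
{
    my $str = shift( @_ );
    return if( !defined( $str ) );
    # The minor number is optional, as in HTTP/2
    my( $major, $minor ) = $str =~ m{\AHTTP/(\d+)(?:\.(\d+))?\z} or return;
    # List context: major and minor numbers
    return( $major, $minor ) if( wantarray() );
    # Scalar context: a version object that supports comparison operators
    return( version->parse( defined( $minor ) ? "$major.$minor" : $major ) );
}

my( $major, $minor ) = split_http_version( 'HTTP/1.1' );
print( "major $major, minor $minor\n" );

my $vers = split_http_version( 'HTTP/2' );
if( $vers > version->parse( '1.1' ) )
{
    print( "HTTP/2 is more recent than HTTP/1.1\n" );
}
```

Returning a version object in scalar context means callers can compare protocol versions with the usual numeric operators instead of comparing strings.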
tmp_dir
Sets or gets the temporary directory to use when creating temporary files.
When set, this returns a file object.
tmp_to_core
Boolean. When set to true, this will store data in memory rather than in a file on the filesystem.
AUTHOR
Jacques Deguest <jack@deguest.jp>
SEE ALSO
rfc6266 on Content-Disposition, rfc7230 on Message Syntax and Routing, rfc7231 on Semantics and Content, rfc7232 on Conditional Requests, rfc7233 on Range Requests, rfc7234 on Caching, rfc7235 on Authentication, rfc7578 on multipart/form-data, rfc7540 on HTTP/2.0
Mozilla documentation on HTTP protocol
Mozilla documentation on HTTP messages
HTTP::Promise, HTTP::Promise::Request, HTTP::Promise::Response, HTTP::Promise::Message, HTTP::Promise::Entity, HTTP::Promise::Headers, HTTP::Promise::Body, HTTP::Promise::Body::Form, HTTP::Promise::Body::Form::Data, HTTP::Promise::Body::Form::Field, HTTP::Promise::Status, HTTP::Promise::MIME, HTTP::Promise::Parser, HTTP::Promise::IO, HTTP::Promise::Stream, HTTP::Promise::Exception
COPYRIGHT & LICENSE
Copyright(c) 2022 DEGUEST Pte. Ltd.
All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.