NAME
HTTP::Promise::Parser - Fast HTTP Request & Response Parser
SYNOPSIS
use HTTP::Promise::Parser;
my $p = HTTP::Promise::Parser->new || 
    die( HTTP::Promise::Parser->error, "\n" );
my $ent = $p->parse( '/some/where/http_request.txt' ) ||
    die( $p->error );
my $ent = $p->parse( $file_handle ) ||
    die( $p->error );
my $ent = $p->parse( $string ) ||
    die( $p->error );
VERSION
v0.2.0
DESCRIPTION
This is an HTTP request and response parser that uses XS modules whenever possible for speed, and is mindful of memory consumption.
As rfc7230 states in its section 3:
"The normal procedure for parsing an HTTP message is to read the start-line into a structure, read each header field into a hash table by field name until the empty line, and then use the parsed data to determine if a message body is expected. If a message body has been indicated, then it is read as a stream until an amount of octets equal to the message body length is read or the connection is closed."
Thus, the HTTP::Promise approach is to read the data, whether an HTTP request or response (a.k.a. an HTTP message), from a filehandle, possibly chunked. It first reads the message headers and parses them, then stores the HTTP message body in memory if it is under a specified threshold, or in a file otherwise. If the size is unknown, the body is first read into memory and switched automatically to a file when it reaches the threshold.
Once the overall message body is stored, if it is a multipart type, this class reads each of its parts into memory or into a separate file, depending on its size, until there are no more parts. It does so using the stream reader, which reads in chunks of bytes and not in lines. If the message body is a single part, it is saved to memory or to a file depending on its size. Each part saved to a file uses a file extension related to its mime type. Each part is then accessible as an HTTP body object via the "parts" in HTTP::Promise::Entity method.
Note, however, that when dealing with multipart, this only recognises multipart/form-data; anything else will be treated as data.
The overall HTTP message is available as an HTTP::Promise::Entity object and returned.
If an error occurs, this module does not die, at least not voluntarily, but instead sets an error and returns undef, so always make sure to check the returned value from method calls.
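The memory-or-file strategy described above can be illustrated with a short standalone sketch. It is not the module's implementation: the helper name spool_body, the returned hash layout and the .dat suffix are all hypothetical, and only core Perl is used. It reads a filehandle in chunks, keeps the data in memory while it stays under a threshold, and moves it to a temporary file once the threshold is exceeded. In keeping with the convention above, it returns undef rather than dying on error.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper (not part of HTTP::Promise::Parser).
# Reads $in in chunks; keeps data in memory until it exceeds $threshold bytes,
# then spills everything read so far, and the rest, into a temporary file.
sub spool_body
{
    my( $in, $threshold, $chunk_size ) = @_;
    $chunk_size ||= 2048;
    my $buffer = '';
    my $tmp;    # File::Temp object, set once we have spilled to disk
    while( 1 )
    {
        my $n = read( $in, my $chunk, $chunk_size );
        return if( !defined( $n ) );    # read error: undef, caller must check
        last if( !$n );                 # end of data
        if( $tmp )
        {
            print {$tmp} $chunk or return;
        }
        else
        {
            $buffer .= $chunk;
            if( length( $buffer ) > $threshold )
            {
                $tmp = File::Temp->new( SUFFIX => '.dat' );
                binmode( $tmp, ':raw' );
                print {$tmp} $buffer or return;
                $buffer = '';
            }
        }
    }
    if( $tmp )
    {
        $tmp->flush;
        return( { where => 'file', file => $tmp } );
    }
    return( { where => 'memory', data => $buffer } );
}

# In-memory filehandles stand in for a socket or file here
open( my $small, '<', \( my $s = 'x' x 100 ) )  || die( $! );
open( my $large, '<', \( my $l = 'y' x 5000 ) ) || die( $! );
my $in_memory = spool_body( $small, 1024 ) || die( "Unable to read small body\n" );
my $on_disk   = spool_body( $large, 1024 ) || die( "Unable to read large body\n" );
```

With a 1024 bytes threshold, the 100 bytes body stays in memory, whereas the 5000 bytes body ends up in a temporary file that is removed when the File::Temp object goes out of scope.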
CONSTRUCTOR
new
This instantiates a new HTTP::Promise::Parser object.
It takes the following options:
- decode_body
  Boolean. If enabled, this will have this interface automatically decode the entity body upon parsing. Default is true.
- decode_headers
  Boolean. If enabled, this will decode headers, which is used for decoding the filename value in Content-Disposition. Default is false.
- ignore_filename
  Boolean. Whether the filename provided in a Content-Disposition header should be ignored or not. This defaults to false, but it is actually not used: the filename specified in a Content-Disposition header field is never used. So, this is a no-op and should be removed.
- max_body_in_memory_size
  Integer. This is the threshold beyond which an entity body that is initially loaded into memory will be switched to a file on the local filesystem. It applies when it is set to a true value and the body size exceeds the amount specified.
  By default, this has the value set by the class variable $MAX_BODY_IN_MEMORY_SIZE, which is 102400 bytes (100 KB).
- max_headers_size
  Integer. This is the threshold size in bytes beyond which HTTP headers will trigger an error. This defaults to the class variable $MAX_HEADERS_SIZE, which itself is set by default to 8192 bytes (8 KB).
- max_read_buffer
  Integer. This is the read buffer size. This is used for HTTP::Promise::IO and defaults to 2048 bytes (2 KB).
- output_dir
  Filepath of the directory to be used to save the entity body, when applicable.
- tmp_dir
  Sets the directory to use when creating temporary files.
- tmp_to_core
  Boolean. When true, this will set the temporary file to an in-memory space.
METHODS
decode_body
Boolean. If enabled, this will have this interface automatically decode the entity body upon parsing. Default is true.
decode_headers
Boolean. If enabled, this will decode headers, which is used for decoding the filename value in Content-Disposition. Default is false.
ignore_filename
Boolean. Whether the filename provided in a Content-Disposition header should be ignored or not. This defaults to false, but it is actually not used: the filename specified in a Content-Disposition header field is never used. So, this is a no-op and should be removed.
looks_like_request
Provided with a string or a scalar reference, this returns a hash reference containing details of the request line attributes if it is indeed a request, or an empty string if it is not a request.
It sets an error and returns undef upon error.
The following attributes are available:
- http_version
  The HTTP protocol version used. For example, in HTTP/1.1, this would be 1.1, and in HTTP/2, this would be 2.
- http_vers_major
  The HTTP protocol major version used. For example, in HTTP/1.0, this would be 1, and in HTTP/2, this would be 2.
- http_vers_minor
  The HTTP protocol minor version used. For example, in HTTP/1.0, this would be 0, and in HTTP/2, this would be undef.
- method
  The HTTP request method used. For example, in GET / HTTP/1.1, this would be GET. This uses the rfc7231 semantics, which means any token, even a non-standard one, would match.
- protocol
  The HTTP protocol used, e.g. HTTP/1.0, HTTP/1.1, HTTP/2, etc.
- uri
  The request URI. For example, in GET / HTTP/1.1, this would be /
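To make the attributes above more concrete, here is a standalone sketch that produces a hash reference of the same shape from a request line. It is only an illustration in core Perl: the function name sketch_request_line and the regular expression are assumptions, not the pattern this module actually uses. It follows the same convention of returning an empty string when the line is not a request.

```perl
use strict;
use warnings;

# "token" characters as defined for HTTP field names and methods
my $TOKEN = qr/[!#\$%&'*+\-.^_`|~0-9A-Za-z]+/;

# Hypothetical stand-in, not the module's implementation
sub sketch_request_line
{
    my $line = shift;
    if( $line =~ m{\A($TOKEN)[ ]+(\S+)[ ]+(HTTP/(\d+)(?:\.(\d+))?)\r?\n?\z} )
    {
        my( $method, $uri, $protocol, $major, $minor ) = ( $1, $2, $3, $4, $5 );
        return({
            method          => $method,
            uri             => $uri,
            protocol        => $protocol,
            http_version    => defined( $minor ) ? "$major.$minor" : $major,
            http_vers_major => $major,
            http_vers_minor => $minor,    # undef for something like HTTP/2
        });
    }
    # Not a request line
    return( '' );
}

my $get   = sketch_request_line( 'GET / HTTP/1.1' );
my $purge = sketch_request_line( 'PURGE /cache HTTP/2' );
my $other = sketch_request_line( 'HTTP/1.1 200 OK' );
```

Here, $get and $purge are hash references, the latter with a non-standard method and an undefined minor version, while $other is an empty string because a status line does not start with a valid method token followed by a URI and a protocol.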
my $ref = $p->looks_like_request( \$str );
# or
# my $ref = $p->looks_like_request( $str );
die( $p->error ) if( !defined( $ref ) );
if( $ref )
{
    say "Request method $ref->{method}, uri $ref->{uri}, protocol $ref->{protocol}, version major $ref->{http_vers_major}, version minor $ref->{http_vers_minor}";
}
else
{
    say "This is not an HTTP request.";
}
looks_like_response
Provided with a string or a scalar reference, this returns a hash reference containing details of the response line attributes if it is indeed a response, or an empty string if it is not a response.
It sets an error and returns undef upon error.
The following attributes are available:
- code
  The 3-digit HTTP response code. For example, in HTTP/1.1 200 OK, this would be 200.
- http_version
  The HTTP protocol version used. For example, in HTTP/1.1, this would be 1.1, and in HTTP/2, this would be 2.
- http_vers_major
  The HTTP protocol major version used. For example, in HTTP/1.0, this would be 1, and in HTTP/2, this would be 2.
- http_vers_minor
  The HTTP protocol minor version used. For example, in HTTP/1.0, this would be 0, and in HTTP/2, this would be undef.
- protocol
  The HTTP protocol used, e.g. HTTP/1.0, HTTP/1.1, HTTP/2, etc.
- status
  The response status text. For example, in HTTP/1.1 200 OK, this would be OK.
my $ref = $p->looks_like_response( \$str );
# or
# my $ref = $p->looks_like_response( $str );
die( $p->error ) if( !defined( $ref ) );
if( $ref )
{
    say "Response code $ref->{code}, status $ref->{status}, protocol $ref->{protocol}, version major $ref->{http_vers_major}, version minor $ref->{http_vers_minor}";
}
else
{
    say "This is not an HTTP response.";
}
looks_like_what
Provided with a string or a scalar reference, this returns a hash reference containing details of the HTTP message first line attributes if it is indeed an HTTP message.
The attributes available depend on the type of HTTP message determined and are described in detail in "looks_like_request" and "looks_like_response". In addition to those, it also returns the attribute type, which is a string representing the type of HTTP message this is, i.e. either request or response.
If this does not match either an HTTP request or HTTP response, it returns an empty string.
my $ref = $p->looks_like_what( \$str );
die( $p->error ) if( !defined( $ref ) );
say "This is a ", ( $ref ? $ref->{type} : 'unknown' ), " HTTP message.";
# or
my $ref = $p->looks_like_what( \$str );
die( $p->error ) if( !defined( $ref ) );
if( !$ref )
{
    say "This is unknown.";
}
else
{
    say "This is an HTTP $ref->{type} with protocol version $ref->{http_version}";
}
max_body_in_memory_size
Integer. This is the threshold beyond which an entity body that is initially loaded into memory will be switched to a file on the local filesystem. It applies when it is set to a true value and the body size exceeds the amount specified.
By default, this has the value set by the class variable $MAX_BODY_IN_MEMORY_SIZE, which is 102400 bytes (100 KB).
max_headers_size
Integer. This is the threshold size in bytes beyond which HTTP headers will trigger an error. This defaults to the class variable $MAX_HEADERS_SIZE, which itself is set by default to 8192 bytes (8 KB).
max_read_buffer
Integer. This is the read buffer size. This is used for HTTP::Promise::IO and defaults to 2048 bytes (2 KB).
new_tmpfile
Creates a new temporary file. If tmp_to_core is set to true, this creates the file in memory using a scalar object; otherwise, it creates a new temporary file under the directory set with the object parameter tmp_dir. The filehandle binmode is set to raw.
It returns a filehandle upon success. Upon error, it sets an error and returns undef.
output_dir
The filepath to the output directory. This is used when saving entity bodies on the filesystem.
parse
This takes a scalar reference of data, a glob or a file path, and parses the HTTP request or response by calling "parse_fh", passing it whatever options it received.
It returns an entity object upon success. Upon error, it sets an error and returns undef.
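As a rough illustration of how the three kinds of input listed above can be told apart, here is a standalone sketch using only core Perl. The function classify_input is hypothetical, and the exact rules this module applies are not spelled out here, so this should be read as one plausible approach rather than as the actual dispatch logic.

```perl
use strict;
use warnings;
use Scalar::Util qw( openhandle );

# Hypothetical helper: tells apart a scalar reference of data,
# an opened filehandle (glob) and a plain string taken as a file path.
sub classify_input
{
    my $this = shift;
    return( 'undefined' )  if( !defined( $this ) );
    return( 'data' )       if( ref( $this ) eq 'SCALAR' );
    return( 'filehandle' ) if( defined( openhandle( $this ) ) );
    return( 'filepath' )   if( !ref( $this ) && length( $this ) );
    return( 'unsupported' );
}

my $raw = "GET / HTTP/1.1\r\n\r\n";
# An in-memory filehandle stands in for a real file or socket
open( my $fh, '<', \$raw ) || die( $! );
my @kinds = map( classify_input( $_ ), \$raw, $fh, '/some/where/http_request.txt' );
```

In this sketch, @kinds ends up holding data, filehandle and filepath, in that order.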
parse_data
This takes a string or a scalar reference and returns an entity object upon success. Upon error, it sets an error and returns undef.
parse_fh
This takes a filehandle, parses the HTTP request or response, and returns an entity object upon success. Upon error, it sets an error and returns undef.
It also takes a hash or hash reference of the following options:
- reader
  An HTTP::Promise::IO object. If this is not provided, a new one will be created. Note that data will be read using this reader.
- request
  Boolean. Set this to true to indicate the data is an HTTP request. If neither request nor response is provided, the parser will attempt to guess it.
- response
  Boolean. Set this to true to indicate the data is an HTTP response. If neither request nor response is provided, the parser will attempt to guess it.
parse_headers
This takes a string or a scalar reference, including a scalar object such as Module::Generic::Scalar, and an optional hash or hash reference of parameters, and parses the headers found in the given string, if any at all.
It returns a hash reference with the same property names and values returned by "parse_headers_xs".
This method uses pure perl.
Supported options are:
- convert_dash
  Boolean. If true, this will convert - in header fields to _. Default is false.
- no_headers_ok
  Boolean. If set to true, this will not trigger an error if there are no headers.
parse_headers_xs
my $def = $p->parse_headers_xs( $http_request_or_response );
my $def = $p->parse_headers_xs( $http_request_or_response, $options_hash_ref );
This takes a string or a scalar reference, including a scalar object such as Module::Generic::Scalar, and an optional hash or hash reference of parameters, and parses the headers found in the given string, if any at all.
It returns a dictionary as a hash reference upon success. Upon error, it sets an error with an HTTP error code and returns undef.
Supported options are:
- convert_dash
  Boolean. If true, this will convert - in header fields to _. Default is false.
- request
  Boolean. If true, this will parse the string assuming it is a request header.
- response
  Boolean. If true, this will parse the string assuming it is a response header.
The properties returned in the dictionary depend on whether request or response were enabled.
For request:
- headers
  An HTTP::Promise::Headers object.
- length
  The length in bytes of the headers parsed.
- method
  The HTTP method, such as GET, HEAD, POST, etc.
- protocol
  String, such as HTTP/1.1 or HTTP/2
- uri
  String, the request URI, such as /
- version
  This is a version object and contains a value such as 1.1, so you can do something like:
      if( $def->{version} >= version->parse( '1.1' ) )
      {
          # Do something
      }
For response:
- code
  The HTTP status code, such as 200
- headers
  An HTTP::Promise::Headers object.
- length
  The length in bytes of the headers parsed. This is useful so you can then remove it from the string you provided:
      my $resp = <<EOT;
      HTTP/1.1 200 OK
      Content-Type: text/plain

      Hello world!
      EOT
      my $def = $p->parse_headers_xs( \$resp, response => 1 ) || die( $p->error );
      # Remove the headers from the string
      substr( $resp, 0, $def->{length} ) = '';
      # Remove the blank separator line, if it is still there
      $resp =~ s/^\r?\n//;
      # $resp now contains the body, i.e.: "Hello world!\n"
- status
  String, the HTTP status, i.e. something like OK
- protocol
  String, such as HTTP/1.1
- version
  This is a version object and contains a value such as 1.1, so you can do something like:
      if( $def->{version} >= version->parse( '1.1' ) )
      {
          # Do something
      }
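Since the version property is described above as a version object, comparisons can be done with the usual numeric operators, which the core version module overloads. The short sketch below shows this with a hand-built hash standing in for the dictionary returned by this method; the variable names are only for illustration.

```perl
use strict;
use warnings;
use version;

# Hand-built stand-in for the dictionary returned by parse_headers_xs
my $def = { protocol => 'HTTP/1.1', version => version->parse( '1.1' ) };

# version objects overload the comparison operators
my $at_least_11 = ( $def->{version} >= version->parse( '1.1' ) ) ? 1 : 0;
my $older_than_2 = ( $def->{version} < version->parse( '2' ) ) ? 1 : 0;
my $is_10 = ( $def->{version} == version->parse( '1.0' ) ) ? 1 : 0;
```

With the value 1.1, the first two flags are true and the last one is false.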
If not enough data was provided to parse the headers, this will return an error object with code set to 425 (Too early).
If the headers are incomplete and the cumulative size exceeds the value set with "max_headers_size", this returns an error object with code set to 413 (Request entity too large).
If there are other issues with the headers, this sets the error code to 400 (Bad request), and for any other error, this returns an error object without code.
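The error codes above lend themselves to incremental reading: 425 means "read more data and try again", whereas 413 and 400 are final. The sketch below simulates this with a hypothetical stand-in, try_headers, which only looks for the blank line ending the headers and mimics the documented 425 and 413 codes. It does not call this module and does not validate the headers themselves.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a header parser. It returns a hash reference
# with the headers length upon success, or undef and an error code.
sub try_headers
{
    my( $buf, $max ) = @_;
    if( $$buf =~ /\r?\n\r?\n/ )
    {
        # $+[0] is the offset just past the blank line ending the headers
        return( { length => $+[0] } );
    }
    # Headers still incomplete and already too big
    return( undef, 413 ) if( length( $$buf ) > $max );
    # Headers incomplete: more data is needed
    return( undef, 425 );
}

# Data arriving in arbitrary chunks, as it would from a socket
my @chunks = ( "GET / HTTP/1.1\r\nHo", "st: example.org\r\n", "\r\nbody" );
my $buf = '';
my( $def, $code );
foreach my $chunk ( @chunks )
{
    $buf .= $chunk;
    ( $def, $code ) = try_headers( \$buf, 8192 );
    last if( $def );             # headers are complete
    last if( $code != 425 );     # any other code is final
}
my $body = $def ? substr( $buf, $def->{length} ) : undef;

# Incomplete headers exceeding a (deliberately tiny) limit
my( undef, $too_large ) = try_headers( \( 'X-Long: ' . ( 'a' x 64 ) ), 16 );
```

In this simulation, the first two chunks yield 425, the third one completes the headers, and what remains in the buffer after the headers length is the body.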
parse_multi_part
This takes a hash or hash reference of options and parses an HTTP multipart portion of the HTTP request or response.
It returns an entity object upon success and upon error it sets an error object and returns undef.
Supported options are:
- entity
  The HTTP::Promise::Entity object to which this multipart belongs.
- reader
  The HTTP::Property::Reader used for reading the data chunks from the filehandle.
parse_open
Provided with a filepath, this will open it in read mode, parse it and return an entity object.
If there is an error, this returns undef and you can retrieve the error by calling "error" in Module::Generic which is inherited by this module.
parse_request
This takes a string or a scalar reference, including a scalar object such as Module::Generic::Scalar, and an optional hash or hash reference of parameters, and parses the request found in the given string, including the headers and the body.
It returns a dictionary as a hash reference upon success. Upon error, it sets an error with an HTTP error code and returns undef.
The properties returned are the same as the ones returned for a request by "parse_headers_xs", with the addition of the content property containing the body data of the request.
Obviously, this works well for simple requests, i.e. not multipart ones. Otherwise, the entire body, whatever that is, will be stored in content.
parse_request_headers
This is an alias and is equivalent to calling "parse_headers_xs" and setting the request option.
parse_request_line
This takes a string or a scalar reference, including a scalar object such as Module::Generic::Scalar, and parses the request line, returning a hash reference containing 4 properties: method, path, protocol, version
parse_request_pp
This is the same as "parse_request", except it uses the pure perl method "parse_headers" to parse the headers instead of the XS one.
parse_response
This takes a string or a scalar reference, including a scalar object such as Module::Generic::Scalar, and an optional hash or hash reference of parameters, and parses the response found in the given string, including the headers and the body.
It returns a dictionary as a hash reference upon success. Upon error, it sets an error with an HTTP error code and returns undef.
The properties returned are the same as the ones returned for a response by "parse_headers_xs", with the addition of the content property containing the body data of the response.
parse_response_headers
This is an alias and is equivalent to calling "parse_headers_xs" and setting the response option.
parse_response_line
This takes a string or a scalar reference, including a scalar object such as Module::Generic::Scalar, and parses the request line, returning a hash reference containing 4 properties: method, path, protocol, version
parse_response_pp
This is the same as "parse_response", except it uses the pure perl method "parse_headers" to parse the headers instead of the XS one.
parse_singleton
Provided with a hash or hash reference of options, this parses a simple entity body.
It returns an entity object upon success and upon error it sets an error object and returns undef.
Supported options are:
- entity
  The HTTP::Promise::Entity object to which this multipart belongs.
- read_until
  A string or a regular expression that indicates the string up to which to read data from the filehandle.
- reader
  The HTTP::Property::Reader used for reading the data chunks from the filehandle.
parse_version
This takes an HTTP version string, such as HTTP/1.1 or HTTP/2, and returns its major and minor as a two-element list in list context, or just the version object in scalar context.
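The context-sensitive return value described above can be sketched with wantarray. The function sketch_parse_version below is a hypothetical stand-in written with core Perl only, not this module's code. In particular, returning an undefined minor for HTTP/2 in list context is an assumption, modelled on the http_vers_minor attribute documented in "looks_like_request".

```perl
use strict;
use warnings;
use version;

# Hypothetical stand-in showing a list vs scalar context return value
sub sketch_parse_version
{
    my $str = shift;
    return unless( defined( $str ) && $str =~ m{\AHTTP/(\d+)(?:\.(\d+))?\z} );
    my( $major, $minor ) = ( $1, $2 );
    # List context: major and minor
    return( $major, $minor ) if( wantarray );
    # Scalar context: a version object
    return( version->parse( defined( $minor ) ? "$major.$minor" : $major ) );
}

my( $major, $minor ) = sketch_parse_version( 'HTTP/1.1' );
my $v = sketch_parse_version( 'HTTP/2' );
my $is_newer = ( $v > version->parse( '1.1' ) ) ? 1 : 0;
```

In list context, this yields the major and minor numbers, and in scalar context it yields a version object that can be compared with other version objects.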
tmp_dir
Sets or gets the temporary directory to use when creating temporary files.
When set, this returns a file object.
tmp_to_core
Boolean. When set to true, this will store data in memory rather than in a file on the filesystem.
AUTHOR
Jacques Deguest <jack@deguest.jp>
SEE ALSO
rfc6266 on Content-Disposition, rfc7230 on Message Syntax and Routing, rfc7231 on Semantics and Content, rfc7232 on Conditional Requests, rfc7233 on Range Requests, rfc7234 on Caching, rfc7235 on Authentication, rfc7578 on multipart/form-data, rfc7540 on HTTP/2.0
Mozilla documentation on HTTP protocol
Mozilla documentation on HTTP messages
HTTP::Promise, HTTP::Promise::Request, HTTP::Promise::Response, HTTP::Promise::Message, HTTP::Promise::Entity, HTTP::Promise::Headers, HTTP::Promise::Body, HTTP::Promise::Body::Form, HTTP::Promise::Body::Form::Data, HTTP::Promise::Body::Form::Field, HTTP::Promise::Status, HTTP::Promise::MIME, HTTP::Promise::Parser, HTTP::Promise::IO, HTTP::Promise::Stream, HTTP::Promise::Exception
COPYRIGHT & LICENSE
Copyright(c) 2022 DEGUEST Pte. Ltd.
All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.