NAME

Net::Inspect::L7::HTTP - guesses and handles HTTP traffic

SYNOPSIS

my $req = Net::Inspect::L7::HTTP::Request::Simple->new(..);
my $http = Net::Inspect::L7::HTTP->new($req);
my $guess = Net::Inspect::L5::GuessProtocol->new;
$guess->attach($http);
...

DESCRIPTION

This class extracts HTTP requests from TCP connections. It provides all hooks required for Net::Inspect::L4::TCP and is usually used together with it. It provides the guess_protocol hook so it can be used with Net::Inspect::L5::GuessProtocol.

Attached flow is usually a Net::Inspect::L7::HTTP::Request::* object.

Hooks provided:

guess_protocol($guess,$dir,$data,$eof,$time,$meta)

new_connection($meta,%args)

This returns an object for the connection. $args{header_maxsize} can be used to set the maximum size of the message headers:

$args{header_maxsize}[0] - request header, default 64k
$args{header_maxsize}[1] - response header, default 16k
$args{header_maxsize}[2] - chunked header, default 2k

$connection->in($dir,$data,$eof,$time)

Processes new data and returns number of bytes processed. Any data not processed must be sent again with the next call.
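
The contract described above (return the number of bytes consumed, resend the rest) can be sketched as a small buffering loop. The make_feeder helper and the $consume callback are hypothetical stand-ins for a caller and for $connection->in; they are not part of Net::Inspect.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps unprocessed bytes per direction and resends
# them together with the next chunk.
sub make_feeder {
    my ($consume) = @_;            # stand-in for sub { $connection->in(@_) }
    my @pending = ('', '');        # one buffer per direction (0 and 1)
    return sub {
        my ($dir, $chunk, $eof, $time) = @_;
        $pending[$dir] .= $chunk;
        my $n = $consume->($dir, $pending[$dir], $eof, $time);
        substr($pending[$dir], 0, $n, '');   # drop what was processed
        return length $pending[$dir];        # bytes still waiting
    };
}

# Toy consumer that only accepts complete lines.
my @seen;
my $feed = make_feeder(sub {
    my ($dir, $data) = @_;
    my $end = rindex($data, "\n");
    return 0 if $end < 0;
    push @seen, substr($data, 0, $end + 1);
    return $end + 1;
});

$feed->(0, "GET / HT", 0, time);       # nothing consumed yet
$feed->(0, "TP/1.1\r\nHo", 0, time);   # first line consumed, "Ho" kept
```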

$data is the data as a string. In some cases $data can be [ 'gap' => $len ], i.e. only the information that there would be $len bytes of data, without the data itself. Gaps should only be submitted inside request and response bodies, and only if the attached layer can handle them in its in_request_body and in_response_body methods.

Gaps in other places are not allowed, because all other data are needed to determine where requests, responses and body data are located inside the connection.
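
A receiver that accepts gaps has to tell a plain string from a gap marker. A minimal sketch, assuming only the [ 'gap' => $len ] form described above; data_length is a hypothetical helper name:

```perl
use strict;
use warnings;

# Returns the number of body bytes a $data argument stands for, whether
# it is a plain string or a gap marker of the form [ 'gap' => $len ].
sub data_length {
    my ($data) = @_;
    if (ref($data) eq 'ARRAY') {
        my ($type, $len) = @$data;
        die "unknown marker '$type'" if $type ne 'gap';
        return $len;
    }
    return length($data);
}

my $total = 0;
$total += data_length($_) for ('abc', [ gap => 1000 ], 'de');
```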

$connection->fatal($reason,$dir,$time)

Hooks called:

new_request(\%meta,$conn)

This should return a request object. The reference to the connection object is given in case the request object needs to call fatal to end the connection.

The function should not keep a strong reference to $conn, i.e. it should store only a weak reference, otherwise memory might leak.
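
One way to follow this advice is Scalar::Util::weaken. In the sketch below My::Request and My::Connection are hypothetical placeholder classes, and new_request is a plain function standing in for the hook on the attached flow.

```perl
use strict;
use warnings;
use Scalar::Util ();

package My::Request;   # hypothetical request class

sub new {
    my ($class, $meta, $conn) = @_;
    my $self = bless { meta => $meta, conn => $conn }, $class;
    Scalar::Util::weaken($self->{conn});   # do not keep the connection alive
    return $self;
}

package main;

# Stand-in for the new_request hook.
sub new_request {
    my ($meta, $conn) = @_;
    return My::Request->new($meta, $conn);
}

my $conn = bless {}, 'My::Connection';   # placeholder connection object
my $req  = new_request({}, $conn);
```

Once the last strong reference to the connection goes away, $req->{conn} becomes undef, so the request should check it before calling fatal on it.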

$request->in_request_header($header,$time,\%hdr_meta)

Called when the full request header is read. $header is the string of the header.

%hdr_meta contains information extracted from the header:

method - method of request
url - url, as given in request
version - version of HTTP spoken in request
info - first line of request (method url version)
fields - (key => \@values) hash of header fields
junk - invalid data found in header fields part
content_length - length of request body
chunked - true if body uses transfer encoding chunked
upgrade - contains hash when protocol upgrade was requested

Currently this hash contains the key websocket with the value of the sec-websocket-key if a valid request for a Websocket upgrade was detected.

expect - contains hash for expectations from Expect header

Currently the only possible key is 100-continue.
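
A sketch of a hook that uses some of these keys. My::Request is a hypothetical flow class and %hdr_meta is filled by hand for illustration; the lower-case field names are an assumption based on the description of parse_hdrfields.

```perl
use strict;
use warnings;

package My::Request;   # hypothetical flow object

sub new { return bless { log => [] }, shift }

# Hook with the signature documented above.
sub in_request_header {
    my ($self, $header, $time, $meta) = @_;
    my $host = ($meta->{fields}{host} || [])->[0] // '-';
    my $body = $meta->{chunked}                ? 'chunked'
             : defined $meta->{content_length} ? "$meta->{content_length} bytes"
             :                                   'none';
    push @{ $self->{log} },
        "$meta->{method} $meta->{url} host=$host body=$body";
    return;
}

package main;

# Hand-built metadata; normally the HTTP layer provides this hash.
my %hdr_meta = (
    method         => 'POST',
    url            => '/upload',
    fields         => { host => ['example.com'], 'content-length' => ['5'] },
    content_length => 5,
);
my $header = "POST /upload HTTP/1.1\r\nHost: example.com\r\n"
           . "Content-Length: 5\r\n\r\n";
my $req = My::Request->new;
$req->in_request_header($header, time, \%hdr_meta);
```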

$request->in_response_header($header,$time,\%hdr_meta)

Called when the full response header is read. $header is the string of the header.

%hdr_meta contains information extracted from the header:

version - version of HTTP spoken in response
code - status code from response
reason - reason given for response code
fields - (key => \@values) hash of header fields
junk - invalid data found in header fields part
content_length - length of response body if known, else undef
chunked - true if body uses transfer encoding chunked
upgrade - new protocol when switching protocols, e.g. 'websocket'
preliminary - true if this is a preliminary response

$request->in_request_body($data,$eobody,$time)

Called for a chunk of data of the request body. $eobody is true if this is the last chunk of the request body. If the request body is empty, the method will be called once with ''. If no body exists because of CONNECT or HTTP Upgrade, in_data or the websocket functions will be called instead of in_request_body.

$data can be [ 'gap' => $len ] if the input to this layer were gaps.

$request->in_response_body($data,$eobody,$time)

Called for a chunk of data of the response body. $eobody is true if this is the last chunk of the response body. If the response body is empty, the method will be called once with ''. If no body exists because of CONNECT or HTTP Upgrade, in_data or the websocket functions will be called instead of in_response_body.

$data can be [ 'gap' => $len ] if the input to this layer were gaps.

$request->in_chunk_header($dir,$header,$time)

Will be called with the chunk header for chunked encoding, before the chunk data. Usually one is interested only in the content and not in the chunk framing, so this method will typically be empty.

$request->in_chunk_trailer($dir,$trailer,$time)

Will be called with the chunk trailer for chunked encoding, after in_response_body or in_request_body was called with $eobody true. Usually one is interested only in the content and not in the chunk framing, so this method will typically be empty.

$request->in_data($dir,$data,$eof,$time)

Will be called for any data after successful CONNECT or Upgrade. If no websocket functions are defined in the request object it will also be used for Websockets. $dir is 0 for data from client, 1 for data from server.

$request->in_wsctl($dir,$data,$time,$frameinfo)

This will be called after a Websocket upgrade when receiving a control frame. $dir is 0 for data from client, 1 for data from server. $data is the unmasked payload of the frame. $frameinfo is a blessed hash reference which contains the opcode of the frame, the mask (binary) and header for the frame header. For a close frame it will also contain the extracted status code and the reason.

To get the unmasked payload call $frameinfo->unmask($masked_data).

in_wsctl will be called on connection close with $data of '' and no \%frameinfo (i.e. no hash reference).
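
A sketch of a control frame handler that also copes with the call on connection close. My::WSRequest is a hypothetical class, a plain hash stands in for the blessed $frameinfo, and storing the opcode as a number is an assumption; the opcode values are the ones defined by RFC 6455.

```perl
use strict;
use warnings;

package My::WSRequest;   # hypothetical request class

sub new { return bless { events => [] }, shift }

# Control frame opcodes from RFC 6455.
my %name = (0x8 => 'close', 0x9 => 'ping', 0xA => 'pong');

sub in_wsctl {
    my ($self, $dir, $data, $time, $frameinfo) = @_;
    my $from = $dir ? 'server' : 'client';
    if (!$frameinfo) {
        # No frame info: called because the connection was closed.
        push @{ $self->{events} }, "$from: connection closed";
        return;
    }
    my $op = $name{ $frameinfo->{opcode} } // 'unknown';
    push @{ $self->{events} }, "$from: $op (" . length($data) . " bytes)";
    return;
}

package main;

my $req = My::WSRequest->new;
$req->in_wsctl(0, 'hi', time, { opcode => 0x9 });   # ping from client
$req->in_wsctl(1, '', time);                        # connection close
```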

$request->in_wsdata($dir,$data,$eom,$time,$frameinfo)

This will be called after a Websocket upgrade when receiving data inside a data frame. In contrast to the (short) control frames, a data frame does not need to be read completely before in_wsdata is called.

$dir is 0 for data from client, 1 for data from server. $data is the unmasked payload of the frame. $eom is true if the message is done with this call, that is if the data frame is done and the FIN bit was set on the frame. $frameinfo is a blessed hash reference which contains the data type as opcode. This will be the original opcode of the starting frame in case of fragmented transfer. It will also contain the mask (binary) of the current frame.

If this is the initial part of the data (i.e. initial frame in possibly fragmented data and initial data inside this frame) it will also have init set to true inside $frameinfo.

If there are still unread data within the frame $frameinfo will contain bytes_left as [hi,low] where hi and low are the upper and lower 32 bit parts of the number of outstanding bytes.

If this call to in_wsdata was caused by the start of a new frame and not by further data in the same frame, header will be set to the header of this new frame. In all other cases header is not set.

To get the unmasked payload call $frameinfo->unmask($masked_data).
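
The [hi,low] pair in bytes_left can be combined into a single number as sketched below. The helper name is hypothetical, a plain hash stands in for $frameinfo, and the result is only exact on a perl built with 64 bit integers.

```perl
use strict;
use warnings;

# Combine the [hi,low] pair from $frameinfo->{bytes_left} into one number.
# Returns 0 if the key is absent, i.e. no unread data is left in the frame.
sub bytes_left {
    my ($frameinfo) = @_;
    my $left = $frameinfo->{bytes_left} or return 0;
    my ($hi, $low) = @$left;
    return $hi * 2**32 + $low;
}

my $n = bytes_left({ bytes_left => [ 1, 5 ] });
```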

$request->in_junk($dir,$data,$eof,$time)

Will be called for legally ignored junk (empty lines) in front of request or response body. $dir is 0 for data from client, 1 for data from server.

$request->fatal($reason,$dir,$time)

Will be called on fatal errors, mostly protocol irregularities.

Methods suitable for overriding:

new_request(\%meta)

The default implementation will just call new_request from the attached flow.

Helpful methods

$connection->dump_state

Collects the state of the open connections. If called in non-void context (wantarray is defined) it returns the state as a message, otherwise it outputs the message via xdebug.

$connection->offset(@dir)

Returns the current offset(s) in the data stream, that is, the position just after the data already forwarded to the in_* methods.

$connection->gap_offset(@dir)

If the next bytes of the input stream are not needed to interpret the HTTP protocol (i.e. plain body data) this gives the offsets up to which data are "gapable". If no gaps are possible at the current state 0 will be returned. If everything can be gaps (usually because end of body is caused by end of connection) -1 will be returned.

$connection->gap_diff(@dir)

This is similar to gap_offset but will return the difference from the current position, i.e. how large the next gap can be. -1 again means an unlimited gap.
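
How a caller might use the value returned by gap_diff can be sketched as follows. The helper gapable is hypothetical and takes the gap_diff value for one direction plus the number of bytes the caller has available; it relies only on the meanings of 0 and -1 given above.

```perl
use strict;
use warnings;

# How many of the next $avail bytes may be replaced by a gap, given the
# value $diff returned by gap_diff for that direction.
sub gapable {
    my ($diff, $avail) = @_;
    return $avail if $diff == -1;            # unlimited gap possible
    return $diff < $avail ? $diff : $avail;  # 0 means no gap allowed
}

my @result = map { gapable($_, 100) } (-1, 0, 30, 300);
```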

$connection->open_requests(@index)

In array context returns the objects for the open requests, in scalar context the number of open requests. If index is given only the specified objects will be returned, e.g. index -1 is the object currently receiving response data while index 0 is the object currently receiving request data (both are the same unless pipelining is used).

Exportable utility functions and constants

METHODS_WITHOUT_RQBODY

This constant is an array reference of all request methods which will not have a request body, i.e. which have an implicit and non-changeable content-length of 0.

METHODS_WITH_RQBODY

This constant is an array reference of all request methods which must have a specified request body, even if the content-length is explicitly set to 0.

Methods which are in neither METHODS_WITH_RQBODY nor METHODS_WITHOUT_RQBODY might have a request body: if no content-length is explicitly given and chunked transfer encoding is not used, it is assumed that they don't have a body.

METHODS_WITHOUT_RPBODY

This constant is an array reference of all request methods which don't require a response body, i.e. which have an implicit and non-changeable content-length of 0.

CODE_WITHOUT_RPBODY

This constant is an array reference of all response codes which will not have a response body, i.e. which have an implicit and non-changeable content-length of 0.

parse_hdrfields($header,\%fields) -> $bad_lines

This function parses the given message header (without request or status line!) and extracts the key:value pairs into %fields. Each key in %fields is the lower-case representation of the key from the HTTP message and the value in %fields is a list with all values, i.e. a list with a single element if the specific key was used only once in the header, but with multiple elements if the key was used multiple times. Any continuation lines will be transformed into a single line.

It will return any remaining data in $header which could not be interpreted as proper key:value pairs. If the message contains no errors it will thus return ''.
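
The shape of the result can be illustrated with a simplified stand-in. The function sketch_parse_hdrfields below is hypothetical and is not the module's parser; it only mirrors the behaviour described above (lower-case keys, a list of values per key, folded continuation lines, unparsable lines returned).

```perl
use strict;
use warnings;

# Simplified stand-in, NOT the parser used by Net::Inspect.
sub sketch_parse_hdrfields {
    my ($header, $fields) = @_;
    $header =~ s/\r?\n[ \t]+/ /g;          # fold continuation lines
    my $bad = '';
    for my $line (split /\r?\n/, $header) {
        next if $line eq '';
        if ($line =~ /^([^\s:]+):[ \t]*(.*?)[ \t]*$/) {
            push @{ $fields->{ lc $1 } }, $2;   # lower-case key, list value
        } else {
            $bad .= "$line\n";                  # could not be interpreted
        }
    }
    return $bad;
}

my %fields;
my $bad = sketch_parse_hdrfields(
    "Host: example.com\r\nAccept: text/html,\r\n  text/plain\r\n"
  . "Set-Cookie: a=1\r\nSet-Cookie: b=2\r\nnot a header\r\n",
    \%fields,
);
```

Here %fields ends up with a single value for host and accept, two values for set-cookie, and $bad holds the line that is not a key:value pair.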

parse_reqhdr($string,\%header,[$external_length]) -> $bad_header

This will parse the given $string as a request header and extract information into \%header. This information is later passed to in_request_header. See there for more details about the contents of the hash.

If $external_length is true it will not complain if a content-length is required but not defined.

parse_rsphdr($string,\%request,\%header) -> $bad_header

This will parse the given $string as a response header and extract information into \%header. This information is later passed to in_response_header. See there for more details about the contents of the hash.

%request contains information about the request. One might simply use the hash filled by parse_reqhdr here. If not, at least the information about method, expect and upgrade must be provided because they are needed to interpret the response correctly.

LIMITS

100 Continue, 101 Upgrade are not yet implemented.