NAME
Net::Inspect::L7::HTTP - guesses and handles HTTP traffic
SYNOPSIS
my $req = Net::Inspect::L7::HTTP::Request::Simple->new(..);
my $http = Net::Inspect::L7::HTTP->new($req);
my $guess = Net::Inspect::L5::GuessProtocol->new;
$guess->attach($http);
...
DESCRIPTION
This class extracts HTTP requests from TCP connections. It provides all hooks required for Net::Inspect::L4::TCP and is usually used together with it. It also provides the guess_protocol hook so it can be used with Net::Inspect::L5::GuessProtocol.
The attached flow is usually a Net::Inspect::L7::HTTP::Request::* object.
Hooks provided:
- guess_protocol($guess,$dir,$data,$eof,$time,$meta)
- new_connection($meta,%args)
-
This returns an object for the connection. The maximum size of the message headers can be given with $args{header_maxsize}:
- $args{header_maxsize}[0] - request header, default 64k
- $args{header_maxsize}[1] - response header, default 16k
- $args{header_maxsize}[2] - chunked header, default 2k
- $connection->in($dir,$data,$eof,$time)
-
Processes new data and returns the number of bytes processed. Any data not processed must be sent again with the next call.
$data is the data as a string. In some cases $data can be [ 'gap' => $len ], i.e. only the information that there would be $len bytes of data, without submitting the data itself. Gaps should only be submitted inside request and response bodies, and only if the attached layer can handle them in its in_request_body and in_response_body methods.
Gaps in other places are not allowed, because all other data are needed to determine the placement of requests, responses and data inside the connection.
- $connection->fatal($reason,$dir,$time)
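The calling convention of in can be sketched as follows. This is only an illustration: data_length, feed and MockConn are hypothetical names that are not part of Net::Inspect, and MockConn merely stands in for a real connection object so the snippet can run on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: length covered by $data, whether it is a plain
# string or a gap of the form [ 'gap' => $len ].
sub data_length {
    my ($data) = @_;
    return ref($data) eq 'ARRAY' ? $data->[1] : length($data);
}

# Hypothetical helper: hand the buffered bytes to $conn->in and keep
# whatever was not processed, so that it is sent again with the next call.
sub feed {
    my ($conn, $dir, $bufref, $eof, $time) = @_;
    my $n = $conn->in($dir, $$bufref, $eof, $time);
    substr($$bufref, 0, $n, '') if $n;    # drop only the processed bytes
    return $n;
}

# Stand-in for a connection object (assumption, for demonstration only):
# it "processes" at most 4 bytes per call.
package MockConn {
    sub new { return bless {}, shift }
    sub in {
        my ($self, $dir, $data, $eof, $time) = @_;
        my $len = length($data);
        return $len < 4 ? $len : 4;
    }
}

my $buf = "GET / HTTP/1.0\r\n";
my $n   = feed(MockConn->new, 0, \$buf, 0, time());
```

After the call $buf holds only the bytes that still have to be submitted again.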
Hooks called:
- new_request(\%meta,$conn)
-
This should return a request object. The reference to the connection object is given in case the request object wants to call fatal to end the connection.
The function should not keep a strong reference to $conn, i.e. it should store only a weak reference; otherwise memory might leak.
- $request->in_request_header($header,$time,\%hdr_meta)
-
Called when the full request header is read. $header is the string of the header.
%hdr_meta contains information extracted from the header:
- method - method of request
- url - url, as given in request
- version - version of HTTP spoken in request
- info - first line of request (method url version)
- fields - (key => \@values) hash of header fields
- junk - invalid data found in header fields part
- content_length - length of request body
- chunked - true if body uses transfer encoding chunked
- upgrade - contains hash when protocol upgrade was requested
-
Currently this hash contains the key websocket with the value of the sec-websocket-key if a valid request for a Websocket upgrade was detected.
- expect - contains hash for expectations from Expect header
-
Currently the only possible key is 100-continue.
- $request->in_response_header($header,$time,\%hdr_meta)
-
Called when the full response header is read. $header is the string of the header.
If a keep-alive connection was closed by the server after the request was transmitted but before the response header was sent, this function is called once with an empty header to signal this (normal) condition.
%hdr_meta contains information extracted from the header:
- version - version of HTTP spoken in response
- code - status code from response
- reason - reason given for response code
- fields - (key => \@values) hash of header fields
- junk - invalid data found in header fields part
- content_length - length of response body if known, else undef
- chunked - true if body uses transfer encoding chunked
- upgrade - new protocol when switching protocols, e.g. 'websocket'
- preliminary - true if this is a preliminary response
- $request->in_request_body($data,$eobody,$time)
-
Called for a chunk of data of the request body. $eobody is true if this is the last chunk of the request body. If the request body is empty the method will be called once with ''. This function will not be called for CONNECT requests because these are special. It will also not be called on Upgrade requests if the upgrade succeeds. Since this is only known once the response has arrived, the call will be deferred in this case.
$data can be [ 'gap' => $len ] if the input to this layer were gaps.
- $request->in_response_body($data,$eobody,$time)
-
Called for a chunk of data of the response body. $eof is true if this is the last chunk of the connection. $eobody is true if this is the last chunk of the response body. If the response body is empty the method will be called once with ''.
This function will not be called for CONNECT requests or for successful protocol upgrades caused by a valid reply of 101 Switching Protocols. In this case in_data or a special protocol callback will be used instead to handle any future data.
$data can be [ 'gap' => $len ] if the input to this layer were gaps.
- $request->in_chunk_header($dir,$header,$time)
-
Will be called with the chunk header for chunked encoding. Usually one is interested only in the content and not in the chunk framing, so this method will usually be empty. Will be called before the chunk data.
- $request->in_chunk_trailer($dir,$trailer,$time)
-
Will be called with the chunk trailer for chunked encoding. Usually one is interested only in the content and not in the chunk framing, so this method will usually be empty. Will be called after in_response_body/in_request_body got called with $eobody true.
- $request->in_data($dir,$data,$eof,$time)
-
Will be called for any data after a successful upgrade with a CONNECT request, unless the request object specifically handles these upgrades by implementing an upgrade_CONNECT or upgrade_ANY method.
$dir is 0 for data from client, 1 for data from server.
- $request->in_junk($dir,$data,$eof,$time)
-
Will be called for legally ignored junk (empty lines) in front of request or response body.
$dir is 0 for data from client, 1 for data from server.
- $request->fatal($reason,$dir,$time)
-
Will be called on fatal errors, mostly protocol irregularities.
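A request object implementing the hooks listed above might look roughly like the following sketch. MyRequest is a hypothetical class, not one shipped with Net::Inspect, and the hooks are called by hand at the end only to keep the example self-contained; normally the connection object would call them, and the wiring via new_request is not shown.

```perl
use strict;
use warnings;

package MyRequest {
    sub new { return bless { body => '', gaps => 0, log => [] }, shift }

    sub in_request_header {
        my ($self, $header, $time, $meta) = @_;
        push @{ $self->{log} }, "request: $meta->{method} $meta->{url}";
    }

    sub in_response_header {
        my ($self, $header, $time, $meta) = @_;
        # An empty header signals that the connection was closed
        # without a response.
        if ($header eq '') {
            push @{ $self->{log} }, 'closed without response';
            return;
        }
        push @{ $self->{log} }, "response: $meta->{code}";
    }

    sub in_request_body { }    # request bodies are ignored in this sketch

    sub in_response_body {
        my ($self, $data, $eobody, $time) = @_;
        if (ref $data) { $self->{gaps} += $data->[1] }    # [ 'gap' => $len ]
        else           { $self->{body} .= $data }
        push @{ $self->{log} }, 'response body complete' if $eobody;
    }

    sub in_chunk_header  { }    # chunk framing is not of interest here
    sub in_chunk_trailer { }
    sub in_data          { }
    sub in_junk          { }

    sub fatal {
        my ($self, $reason, $dir, $time) = @_;
        push @{ $self->{log} }, "fatal: $reason";
    }
}

# Drive the hooks by hand, in the order a simple exchange would produce.
my $req = MyRequest->new;
$req->in_request_header("GET / HTTP/1.1\r\nHost: example.com\r\n\r\n", 0,
    { method => 'GET', url => '/', version => '1.1' });
$req->in_request_body('', 1, 0);
$req->in_response_header("HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\n", 1,
    { code => 200, version => '1.1', content_length => 5 });
$req->in_response_body('hel', 0, 1);
$req->in_response_body('lo', 1, 1);
```

The methods that are left empty correspond to hooks this sketch does not care about.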
Methods suitable for overwriting:
Helpful methods
- $connection->dump_state
-
Collects the state of the open connections. If called in non-void context (i.e. wantarray is defined) it will return the state as a message, otherwise it will output it via xdebug.
- $connection->offset(@dir)
-
Returns the current offset(s) in the data stream, that is the position behind the data when calling the in_* methods.
- $connection->gap_offset(@dir)
-
If the next bytes of the input stream are not needed to interpret the HTTP protocol (i.e. plain body data) this gives the offsets up to which data are "gapable". If no gaps are possible at the current state 0 will be returned. If everything can be gaps (usually because end of body is caused by end of connection) -1 will be returned.
- $connection->gap_diff(@dir)
-
This is similar to gap_offset but will return the difference from the current position, i.e. how large the next gap can be. -1 again means an unlimited gap.
- $connection->set_gap_diff($dir,$diff)
-
This function is used internally by upgrade handlers to change the idea of how large the next gap can be. If $diff is defined it is assumed that the next $diff bytes could be skipped without losing information necessary to maintain the proper connection state. If $diff is not defined it will be assumed that no data can be skipped.
- $connection->open_requests(@index)
-
In array context returns the objects for the open requests, in scalar context the number of open requests. If index is given only the specified objects will be returned, e.g. index -1 is the object currently receiving response data while index 0 specifies the object currently receiving request data (both are the same unless pipelining is used).
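The return values of gap_diff could be used along the following lines to decide how many of the available bytes may be replaced by a gap. This is a sketch under assumptions: skippable and MockConn are hypothetical names, MockConn only imitates the documented return values (0, -1 or a positive size), and it is assumed that gap_diff returns one value per requested direction in list context.

```perl
use strict;
use warnings;

# Hypothetical helper: how many of $avail pending bytes in direction $dir
# could be submitted as a gap instead of real data.
sub skippable {
    my ($conn, $dir, $avail) = @_;
    my ($diff) = $conn->gap_diff($dir);
    return 0      if !$diff;       # 0: no gap possible in the current state
    return $avail if $diff < 0;    # -1: unlimited gap
    return $diff < $avail ? $diff : $avail;
}

# Stand-in for a connection object (assumption, for demonstration only).
package MockConn {
    sub new { my ($class, $diff) = @_; return bless { diff => $diff }, $class }
    sub gap_diff { return $_[0]{diff} }
}

my $limited   = skippable(MockConn->new(100), 1, 4096);
my $unlimited = skippable(MockConn->new(-1),  1, 4096);
my $none      = skippable(MockConn->new(0),   1, 4096);
```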
Protocol Upgrades
Protocol upgrades are usually done by the server responding with a status code of 101 Switching Protocols to a request containing an Upgrade header. A different kind of upgrade is done with the CONNECT request.
These kinds of upgrades are handled in a generic way by calling the upgrade_$method method of the request object. $method is the lower case name of the new protocol as set in the Upgrade header of the response. For upgrades done with the CONNECT method "CONNECT" will be used instead. The upgrade handler is called as $request_object->upgrade_$method($self,$request,$response). See Net::Inspect::L7::HTTP::WebSocket for a more detailed description of this function and its arguments.
If there is no method specific function it will try upgrade_ANY. This call is similar to upgrade_$method but with an additional argument for the method, i.e. $request_object->upgrade_ANY($self,$request,$response,$method). If this function is also not defined it will use a built-in handler calling in_data for the CONNECT method. For any other method the connection will be considered bad because it does not know how to handle the remaining data.
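The lookup order described above can be sketched as follows. This is not the module's actual code: pick_upgrade_handler and the three request classes are hypothetical and only model which handler would be chosen for a given request object and method.

```perl
use strict;
use warnings;

# Hypothetical model of the lookup order: method specific handler first,
# then upgrade_ANY, then the built-in in_data handling for CONNECT only.
sub pick_upgrade_handler {
    my ($request, $method) = @_;
    return "upgrade_$method" if $request->can("upgrade_$method");
    return 'upgrade_ANY'     if $request->can('upgrade_ANY');
    return 'in_data'         if $method eq 'CONNECT';
    return;    # no handler: the connection would be considered bad
}

# Hypothetical request classes, for demonstration only.
package WsRequest    { sub new { bless {}, shift } sub upgrade_websocket { } }
package AnyRequest   { sub new { bless {}, shift } sub upgrade_ANY { } }
package PlainRequest { sub new { bless {}, shift } sub in_data { } }

my $ws      = pick_upgrade_handler(WsRequest->new,    'websocket');
my $any     = pick_upgrade_handler(AnyRequest->new,   'websocket');
my $connect = pick_upgrade_handler(PlainRequest->new, 'CONNECT');
my $bad     = pick_upgrade_handler(PlainRequest->new, 'websocket');
```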
exportable utility functions and constants
- METHODS_WITHOUT_RQBODY
-
This constant is an array reference of all request methods which will not have a request body, i.e. which have an implicit and non-changeable content-length of 0.
- METHODS_WITH_RQBODY
-
This constant is an array reference of all request methods which must have a specified request body, even if the content-length is explicitly set to 0.
Methods which are in neither METHODS_WITH_RQBODY nor METHODS_WITHOUT_RQBODY might have a request body: if neither a content-length is explicitly given nor chunked transfer encoding is used, it is assumed that they don't have a body.
- METHODS_WITHOUT_RPBODY
-
This constant is an array reference of all request methods which don't require a response body, i.e. which have an implicit and non-changeable content-length of 0.
- CODE_WITHOUT_RPBODY
-
This constant is an array reference of all response codes which will not have a response body, i.e. which have an implicit and non-changeable content-length of 0.
- parse_hdrfields($header,\%fields) -> $bad_lines
-
This function parses the given message header (without request or status line!) and extracts the key:value pairs into %fields. Each key in %fields is the lower-case representation of the key from the HTTP message and the value in %fields is a list with all values, i.e. a list with a single element if the specific key was only used once in the header, but with multiple elements if the key was used multiple times. Any continuation lines will be transformed into a single line.
It will return any remaining data in $header which could not be interpreted as proper key:value pairs. If the message contains no errors it will thus return ''.
- parse_reqhdr($string,\%header,[$external_length]) -> $bad_header
-
This will parse the given $string as a request header and extract information into \%header. This information will later be given to in_request_header. See there for more details about the contents of the hash.
If $external_length is true it will not complain if a content-length is required but not defined.
- parse_rsphdr($string,\%request,\%header,\@warn) -> $bad_header
-
This will parse the given $string as a response header and extract information into \%header. This information will later be given to in_response_header. See there for more details about the contents of the hash.
%request contains information about the request. One might simply use the hash filled by parse_reqhdr here. If not, at least the information about method, expect and upgrade must be provided because they are needed to interpret the response correctly.
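The result shape documented for parse_hdrfields (lower-case keys, a list of values per key, folded continuation lines, unparsable remainder as return value) can be illustrated with a toy parser. toy_parse_hdrfields is a hypothetical, simplified stand-in written for this example. It is not the module's implementation and handles only simple input.

```perl
use strict;
use warnings;

# Toy illustration only, NOT the module's parse_hdrfields: it mimics the
# documented result shape for simple input.
sub toy_parse_hdrfields {
    my ($header, $fields) = @_;
    $header =~ s/\r?\n[ \t]+/ /g;    # fold continuation lines into one line
    my $bad = '';
    for my $line (split /\r?\n/, $header) {
        next if $line eq '';
        if ($line =~ /^([^\s:]+)\s*:\s*(.*?)\s*$/) {
            push @{ $fields->{ lc $1 } }, $2;    # lower-case key, list of values
        } else {
            $bad .= "$line\n";                   # collect lines that do not parse
        }
    }
    return $bad;
}

my %fields;
my $bad = toy_parse_hdrfields(
    "Host: example.com\r\nAccept: text/html\r\nACCEPT: text/plain,\r\n  image/png\r\n",
    \%fields,
);
```

Here the two differently spelled Accept fields end up under the single key accept with two values, and the continuation line is folded into the second value.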