NAME
SMB::Parser - Convenient data parser for network protocols like SMB
SYNOPSIS
use SMB::Parser;
# Parse an imaginary packet of the following structure:
# protocol signature (2 bytes in big-endian), header (48),
# secret key (8), flags (1), mode (2 in little-endian),
# payload offset (4) and length (4),
# filename prefixed with length (2 + length),
# padding to 4 bytes,
# payload
# SMB::Packer documentation shows how it could be packed.
my $parser = SMB::Parser->new($packet_data_buffer);
die if $parser->uint16_be != 0xFACE; # check signature
$parser->skip(48); # skip header (48 bytes)
my $body_start = $parser->offset; # store offset (50 here)
my $secret = $parser->bytes(8);
my $flags = $parser->uint8;
my $mode = $parser->uint16;
my $payload_offset = $parser->uint32;
my $payload_length = $parser->uint32;
my $text_length = $parser->uint16;
my $filename = $parser->utf16($text_length);
$parser->align; # redundant; mere jump using reset is enough
$parser->reset($body_start + $payload_offset);
my $payload = $parser->bytes($payload_length);
$parser->align;
my $unconsumed_buffer = $parser->bytes($parser->size - $parser->offset);
DESCRIPTION
This class parses binary data, such as the data of a network packet.
It supports extracting blobs, unsigned integers of different lengths, text in arbitrary encoding (SMB uses UTF-16LE) and more.
The current data pointer is usually between 0 and the data size. Once set, the managed data is never changed, so the data pointer may move past the end of the data if the caller is not cautious. This differs from SMB::Packer, where the data is automatically extended in this case.
This class inherits from SMB, so msg, err, mem, dump, auto-created field accessors and other methods are available as well.
METHODS
- new DATA
-
Class constructor. Returns an SMB::Parser instance and initializes its data with DATA and its pointer with 0 using set.
- reset [OFFSET=0]
-
Resets the current data pointer.
Specifying an OFFSET beyond the managed data size does not produce an error, but it is likely to cause all subsequent parsing calls to return empty/null values, possibly with warnings, while the pointer consistently continues to advance.
- set DATA [OFFSET=0]
-
Sets the object DATA (binary scalar) to be parsed and resets the pointer using reset.
- cut DATA [OFFSET=<current-offset>]
-
Cuts data until the given OFFSET (by default until the current offset). This is useful to strip all processed data and have offset at 0.
If OFFSET is less than the current offset, then the current offset is adjusted correspondingly (reduced by OFFSET). If it is greater, then the data is still cut as requested and the current offset is reset to 0.
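The offset rules of cut can be sketched in core Perl as follows. This is only a model of the behavior described above, not the module's code: cut_sketch is a hypothetical helper, and clamping OFFSET to the data size is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper modelling the documented rules of cut.
# Returns the remaining data and the adjusted pointer.
sub cut_sketch {
    my ($data, $current, $cut) = @_;
    $cut = $current unless defined $cut;            # default: cut at the pointer
    $cut = length($data) if $cut > length($data);   # assumption: clamp to data size
    my $rest = substr($data, $cut);
    my $new_offset = $cut < $current ? $current - $cut : 0;
    return ($rest, $new_offset);
}

my ($rest1, $off1) = cut_sketch("ABCDEFGH", 6, 2);  # cut before the pointer
my ($rest2, $off2) = cut_sketch("ABCDEFGH", 3, 5);  # cut past the pointer
my ($rest3, $off3) = cut_sketch("ABCDEFGH", 4);     # default: cut at the pointer

print "$rest1 $off1\n";   # CDEFGH 4
print "$rest2 $off2\n";   # FGH 0
print "$rest3 $off3\n";   # EFGH 0
```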
- data
-
Returns the managed data (binary scalar).
- size
-
Returns the managed data size.
- offset
-
Returns the current data pointer (starts from 0).
- align [START_OFFSET=0] [STEP=4]
-
Advances the pointer, if needed, until the next alignment point (that is every STEP bytes starting from START_OFFSET).
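The alignment arithmetic can be illustrated with a few lines of core Perl. This is a sketch of the rule as described, assuming nothing about the module's internals; align_sketch is a hypothetical helper that takes the current pointer and returns the aligned one.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the pointer after alignment.
sub align_sketch {
    my ($offset, $start, $step) = @_;
    $start = 0 unless defined $start;
    $step  = 4 unless defined $step;
    my $rem = ($offset - $start) % $step;   # bytes past the last alignment point
    return $rem ? $offset + $step - $rem : $offset;
}

print align_sketch(50), "\n";        # 52
print align_sketch(52), "\n";        # 52 (already aligned, pointer unchanged)
print align_sketch(85, 50), "\n";    # 86 (aligned relative to offset 50)
print align_sketch(9, 0, 8), "\n";   # 16
```

Passing a START_OFFSET is useful when alignment is defined relative to the start of a body rather than to the start of the whole buffer.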
- skip N_BYTES
-
Advances the pointer by N_BYTES (non-negative integer).
Returns the object, to allow chaining a subsequent parsing method.
- bytes N_BYTES
-
Normally returns the binary scalar of length N_BYTES starting from the current data pointer and advances the pointer.
On data overflow, fewer than N_BYTES bytes are returned (and on subsequent calls, 0 bytes are returned). The data pointer is guaranteed to advance by N_BYTES, even on or after the overflow.
The following parsing methods use this method internally, so they share the same logic about reaching the end-of-data and advancing the pointer.
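The end-of-data behavior can be modelled in core Perl as below. This is a sketch under assumptions: take_bytes is a hypothetical helper, and returning an empty string after the end of data is a simplification (the real method may return empty/null values with warnings).

```perl
use strict;
use warnings;

# Hypothetical model of the documented overflow behavior.
sub take_bytes {
    my ($data, $offset_ref, $n) = @_;
    my $start = $$offset_ref;
    $$offset_ref += $n;                          # pointer always advances by N_BYTES
    return '' if $start >= length($data);        # nothing left to read
    return substr($data, $start, $n);            # may be shorter than N_BYTES
}

my $data   = "ABCDEF";
my $offset = 0;
my $first  = take_bytes($data, \$offset, 4);     # full read
my $second = take_bytes($data, \$offset, 4);     # overflow: only 2 bytes remain
my $third  = take_bytes($data, \$offset, 4);     # after overflow: 0 bytes

printf "%s|%s|%s|%d\n", $first, $second, $third, $offset;   # ABCD|EF||12
```

Because the pointer keeps advancing, a caller can detect truncated input after the fact by comparing offset with size.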
- str N_BYTES [ENCODING='UTF-16LE']
-
Decodes N_BYTES (non-negative integer) as the text in the requested encoding starting from the current data pointer.
The returned string has the utf8 flag set if it is non-ASCII.
- utf16 N_BYTES
-
The same as str with encoding 'UTF-16LE'.
- utf16_be N_BYTES
-
The same as str with encoding 'UTF-16BE'.
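Note that N_BYTES counts bytes, not characters. The following sketch uses the core Encode module directly (an assumption about how such decoding can be done, not a statement about the module's internals) to show that an 8-character name occupies 16 bytes in UTF-16.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Build sample wire bytes for an 8-character file name.
my $text   = 'file.txt';
my $raw_le = encode('UTF-16LE', $text);
my $raw_be = encode('UTF-16BE', $text);

my $n_bytes = length($raw_le);               # 2 bytes per character here

# Take N_BYTES from the buffer, then decode them as text.
my $chunk_le = substr($raw_le, 0, $n_bytes);
my $chunk_be = substr($raw_be, 0, $n_bytes);
my $name_le  = decode('UTF-16LE', $chunk_le);
my $name_be  = decode('UTF-16BE', $chunk_be);

print "$n_bytes $name_le $name_be\n";        # 16 file.txt file.txt
```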
- uint8
- uint16
- uint32
- uint16_be
- uint32_be
- uint64
-
Unpacks an unsigned integer of the specified length in bits (8, 16, 32 or 64 bits, i.e. 1, 2, 4 or 8 bytes).
By default, the byte order is little-endian (since it is used in SMB). The method suffix "_be" denotes the big-endian byte order for parsing.
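The difference between the two byte orders can be seen with Perl's built-in unpack. The templates below are standard Perl; whether the module uses exactly these templates internally is an assumption.

```perl
use strict;
use warnings;

my $u8     = unpack('C', "\x7F");                 # 8-bit
my $u16_le = unpack('v', "\xCE\xFA");             # 16-bit little-endian
my $u16_be = unpack('n', "\xFA\xCE");             # 16-bit big-endian
my $u32_le = unpack('V', "\x78\x56\x34\x12");     # 32-bit little-endian
my $u32_be = unpack('N', "\x12\x34\x56\x78");     # 32-bit big-endian
# A 64-bit little-endian value would use 'Q<', which requires a perl
# built with 64-bit integer support.

printf "%d 0x%X 0x%X 0x%X 0x%X\n", $u8, $u16_le, $u16_be, $u32_le, $u32_be;
# 127 0xFACE 0xFACE 0x12345678 0x12345678
```

The same numeric value is stored with its bytes reversed in the two orders, which is why the signature check in the synopsis uses uint16_be while the SMB fields use the little-endian methods.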
- fid1
-
Parses a file id used in SMB 1.
Returns an unsigned integer of 2 bytes.
- fid2
-
Parses a file id used in SMB 2.
Returns an array ref of two unsigned integers of 8 bytes each.
SEE ALSO
AUTHOR
Mikhael Goikhman <migo@cpan.org>