NAME
Data::ParseBinary - Yet Another parser for binary structures
SYNOPSIS
$s = Struct("foo",
UBInt8("a"),
UBInt16("b"),
Struct("bar",
UBInt8("a"),
UBInt16("b"),
)
);
$data = $s->parse("ABBabb");
# $data contains { a => 65, b => 16962, bar => { a => 97, b => 25186 } }
DESCRIPTION
This module is a Perl port of PyConstructs (http://construct.wikispaces.com/).
Please note that this is a first experimental release. While the external interface is more or less what I want it to be, the internals are still in flux.
This module lets you write declarations for simple and complex binary structures, parse binary data into hash/array data structures, and build binary data from hash/array data structures.
Reference Code
Primitives
First off, a list of primitive elements:
UBInt8
ULInt8
UNInt8
SBInt8
SNInt8
SLInt8
Byte
UBInt16
ULInt16
UNInt16
SBInt16
SLInt16
SNInt16
UBInt32
ULInt32
UNInt32
SNInt32
SBInt32
SLInt32
NFloat32
BFloat32
LFloat32
UNInt64
UBInt64
ULInt64
SNInt64
SBInt64
SLInt64
BFloat64
LFloat64
NFloat64
S - Signed, U - Unsigned, N - Platform natural, L - Little endian, B - Big endian.
Samples:
UBInt16("foo")->parse("\x01\x02") == 258
ULInt16("foo")->parse("\x01\x02") == 513
UBInt16("foo")->build(31337) eq 'zi'
SBInt16("foo")->build(-31337) eq "\x85\x97"
SLInt16("foo")->build(-31337) eq "\x97\x85"
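For comparison, the same conversions can be sketched with core Perl's pack and unpack. This does not use Data::ParseBinary at all; it only illustrates what the B/L and S/U prefixes mean at the byte level. The variable names are made up for the illustration, and the '>' and '<' modifiers assume Perl 5.10 or later.

```perl
use strict;
use warnings;

# 'n' / 'v' are unsigned 16-bit big-/little-endian; 's>' / 's<' are signed
# 16-bit with explicit endianness (the modifiers need Perl 5.10+).
my $ub = unpack 'n', "\x01\x02";    # big endian:    0x0102 == 258
my $ul = unpack 'v', "\x01\x02";    # little endian: 0x0201 == 513
my $zi = pack 'n',  31337;          # 0x7A69, i.e. the two bytes 'z' and 'i'
my $sb = pack 's>', -31337;         # two's complement, big endian
my $sl = pack 's<', -31337;         # same value, little endian

printf "%d %d %s %s %s\n",
    $ub, $ul, $zi, unpack('H*', $sb), unpack('H*', $sl);
# prints: 258 513 zi 8597 9785
```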
Structs and Sequences
$s = Struct("foo",
UBInt8("a"),
UBInt16("b"),
Struct("bar",
UBInt8("a"),
UBInt16("b"),
)
);
$data = $s->parse("ABBabb");
# $data is { a => 65, b => 16962, bar => { a => 97, b => 25186 } }
$s = Sequence("foo",
UBInt8("a"),
UBInt16("b"),
Sequence("bar",
UBInt8("a"),
UBInt16("b"),
)
);
$data = $s->parse("ABBabb");
# $data is [ 65, 16962, [ 97, 25186 ] ]
Arrays and Ranges
# This is an Array of four bytes
$s = Array(4, UBInt8("foo"));
$data = $s->parse("\x01\x02\x03\x04");
# $data is [1, 2, 3, 4]
# This is an array of 3 to 7 bytes
$s = Range(3, 7, UBInt8("foo"));
$data = $s->parse("\x01\x02\x03");
$data = $s->parse("\x01\x02\x03\x04\x05\x06\x07\x08\x09");
# the last example takes only 7 bytes from the stream
# A range with at least one byte, unlimited
$s = GreedyRange(UBInt8("foo"));
# A range with zero to unlimited bytes
$s = OptionalGreedyRange(UBInt8("foo"));
Padding and BitStructs
Padding removes bytes from the stream:
$s = Struct("foo",
Padding(2),
Flag("myflag"),
Padding(5),
);
$data = $s->parse("\x00\x00\x01\x00\x00\x00\x00\x00");
# $data is { myflag => 1 }
Any bit field, when placed inside a regular Struct, reads a whole byte and uses only a few bits of it. To work at the bit level, use BitStruct:
$s = BitStruct("foo",
Padding(2),
Flag("myflag"),
Padding(5),
);
$data = $s->parse("\x20");
# $data is { myflag => 1 }
Padding in a BitStruct removes bits from the stream, not bytes.
$s = BitStruct("foo",
BitField("a", 3), # three bit int
Flag("b"), # one bit
Padding(3), # three bit padding
Nibble("c"), # four bit int
BitField("d", 5), # five bit int
);
$data = $s->parse("\xe1\x1f");
# $data is { a => 7, b => 0, c => 8, d => 31 }
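To see where these numbers come from, here is a hand-rolled sketch in core Perl that extracts the same fields with shifts and masks. It is not how Data::ParseBinary works internally (that is not described here); it only assumes, as the result above implies, that fields are taken from the most significant bit first.

```perl
use strict;
use warnings;

# Read the two bytes as one 16-bit big-endian integer, then peel the
# fields off from the most significant bit downwards.
my $word = unpack 'n', "\xe1\x1f";     # 0xE11F = 1110 0001 0001 1111

my %data = (
    a => ($word >> 13) & 0x07,   # top 3 bits
    b => ($word >> 12) & 0x01,   # next 1 bit
                                 # the next 3 bits are padding, skipped
    c => ($word >>  5) & 0x0F,   # next 4 bits
    d =>  $word        & 0x1F,   # last 5 bits
);

print join(', ', map { "$_ => $data{$_}" } sort keys %data), "\n";
# prints: a => 7, b => 0, c => 8, d => 31
```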
There is also Octet, which is an eight-bit int.
A BitStruct must not be nested inside another BitStruct. Use Struct for that:
$s = BitStruct("foo",
BitField("a", 3),
Flag("b"),
Padding(3),
Nibble("c"),
Struct("bar",
Nibble("d"),
Bit("e"),
)
);
$data = $s->parse("\xe1\x1f");
# $data is { a => 7, b => 0, c => 8, bar => { d => 15, e => 1 } }
Adapters And Validators
Adapters are constructs that transform the data that they work on. To create an adapter, the class should inherit from the Data::ParseBinary::Adapter class. For example:
package IpAddressAdapter;
our @ISA = qw{Data::ParseBinary::Adapter};
sub _encode {
my ($self, $tvalue) = @_;
return pack "C4", split '\.', $tvalue;
}
sub _decode {
my ($self, $value) = @_;
return join '.', unpack "C4", $value;
}
This adapter transforms a dotted IP address ("1.2.3.4") into four binary bytes. However, an adapter needs an underlying data construct, so to actually create one we should write:
my $ipAdapter = IpAddressAdapter->create(Bytes("foo", 4));
(An adapter inherits its name from the underlying data construct.) Or we can create a little function:
sub IpAddressAdapterFunc {
my $name = shift;
IpAddressAdapter->create(Bytes($name, 4));
}
And then:
IpAddressAdapterFunc("foo")->parse("\x01\x02\x03\x04");
As an additional note, it is possible to declare an "init" sub inside IpAddressAdapter, which will receive any extra parameters that "create" received.
One of the built-in Adapters is Enum:
$s = Enum(Byte("protocol"),
TCP => 6,
UDP => 17,
);
$s->parse("\x06") # return 'TCP'
$s->parse("\x11") # return 'UDP'
$s->build("TCP") # returns "\x06"
It is also possible to have a default:
$s = Enum(Byte("protocol"),
TCP => 6,
UDP => 17,
_default_ => "blah",
);
$s->parse("\x12") # returns 'blah'
And finally:
$s = Enum(Byte("protocol"),
TCP => 6,
UDP => 17,
_default_ => $DefaultPass,
);
$s->parse("\x12") # returns 18
$DefaultPass tells Enum that if it isn't familiar with the value, it should pass it through unchanged.
We also have Validators. A Validator is an Adapter that, instead of transforming data, validates it. Examples:
OneOf(UBInt8("foo"), [4,5,6,7])->parse("\x05") # return 5
OneOf(UBInt8("foo"), [4,5,6,7])->parse("\x08") # dies.
NoneOf(UBInt8("foo"), [4,5,6,7])->parse("\x08") # returns 8
NoneOf(UBInt8("foo"), [4,5,6,7])->parse("\x05") # dies
Meta-Constructs
Life isn't always simple. If you only have a rigid structure with constant types, then you can use other modules that are far simpler. Heck, use pack/unpack.
So if you have more complicated requirements, welcome to the meta-constructs. The first one is the Field. A Field is a chunk of bytes with variable length:
$s = Struct("foo",
Byte("length"),
Field("data", sub { $_->ctx->{length} }),
);
(It can also have a constant length, by replacing the code section with, for example, 4.) So we have a struct in which the first byte is the length of the field, followed by the field itself. An example:
$data = $s->parse("\x03ABC");
# $data is {length => 3, data => "ABC"}
$data = $s->parse("\x04ABCD");
# $data is {length => 4, data => "ABCD"}
And so on.
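The underlying idea (read a length, then read that many bytes) can be sketched in core Perl for comparison. This is not the module's implementation; the helper name parse_length_prefixed is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: first byte is the length, the rest is the payload.
sub parse_length_prefixed {
    my ($buf) = @_;
    my $length = unpack 'C', $buf;          # one unsigned byte
    my $data   = substr $buf, 1, $length;   # exactly $length bytes after it
    return { length => $length, data => $data };
}

my $r3 = parse_length_prefixed("\x03ABC");
my $r4 = parse_length_prefixed("\x04ABCD");
print "$r3->{length} $r3->{data}, $r4->{length} $r4->{data}\n";
# prints: 3 ABC, 4 ABCD

# unpack can also do this in one step: 'C/a' reads a byte and uses it
# as the length of the string that follows.
my ($short) = unpack 'C/a', "\x03ABC";      # 'ABC'
```

The one-step unpack form is fine for a fixed layout like this one. The Field construct earns its keep when the length depends on arbitrary earlier context.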
In the meta-constructs, $_ is loaded with all the data that you need. $_->ctx is equal to $_->ctx(0), which returns a hash-ref containing all the data that the current struct has parsed so far. In this example, it contains only "length". If you want to go another level up, just request $_->ctx(1).
Another meta-construct is the Array:
$s = Struct("foo",
Byte("length"),
Array(sub { $_->ctx->{length}}, UBInt16("data")),
);
$data = $s->parse("\x03\x00\x01\x00\x02\x00\x03");
# $data is {length => 3, data => [1, 2, 3]}
RepeatUntil gets to inspect, on every round, the item that was just parsed, available in $_->obj:
$s = RepeatUntil(sub {$_->obj eq "\x00"}, Field("data", 1));
$data = $s->parse("abcdef\x00this is another string");
# $data is [qw{a b c d e f}, "\0"]
OK, enough with the games. Let's see some real branching:
$s = Struct("foo",
Enum(Byte("type"),
INT1 => 1,
INT2 => 2,
INT4 => 3,
STRING => 4,
),
Switch("data", sub { $_->ctx->{type} },
{
"INT1" => UBInt8("spam"),
"INT2" => UBInt16("spam"),
"INT4" => UBInt32("spam"),
"STRING" => String("spam", 6),
}
)
);
$data = $s->parse("\x01\x12");
# $data is {type => "INT1", data => 18}
$data = $s->parse("\x02\x12\x34");
# $data is {type => "INT2", data => 4660}
$data = $s->parse("\x04abcdef");
# $data is {type => "STRING", data => 'abcdef'}
And so on. Switch also has a default option:
$s = Struct("foo",
Byte("type"),
Switch("data", sub { $_->ctx->{type} },
{
1 => UBInt8("spam"),
2 => UBInt16("spam"),
},
default => UBInt8("spam")
)
);
It can also use $DefaultPass, which makes the default a no-op:
$s = Struct("foo",
Byte("type"),
Switch("data", sub { $_->ctx->{type} },
{
1 => UBInt8("spam"),
2 => UBInt16("spam"),
},
default => $DefaultPass,
)
);
Pointers are another kind of meta-construct. For example:
$s = Struct("foo",
Pointer(sub { 4 }, Byte("data1")), # <-- data1 is at (absolute) position 4
Pointer(sub { 7 }, Byte("data2")), # <-- data2 is at (absolute) position 7
);
$data = $s->parse("\x00\x00\x00\x00\x01\x00\x00\x02");
# $data is { data1 => 1, data2 => 2 }
Literally, it says: jump to position 4, read a byte, return to the beginning, jump to position 7, read a byte, return to the beginning.
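The same "jump, read, come back" behaviour can be sketched in core Perl on a plain string. This is an illustration only, not the module's code; byte_at is a hypothetical helper. Because substr does not move any read cursor, "returning to the beginning" is implicit here.

```perl
use strict;
use warnings;

# Hypothetical helper: read one unsigned byte at an absolute offset.
sub byte_at {
    my ($buf, $pos) = @_;
    return unpack 'C', substr($buf, $pos, 1);
}

my $buf  = "\x00\x00\x00\x00\x01\x00\x00\x02";
my %data = (
    data1 => byte_at($buf, 4),   # absolute position 4
    data2 => byte_at($buf, 7),   # absolute position 7
);
print "data1=$data{data1} data2=$data{data2}\n";
# prints: data1=1 data2=2
```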
Anchor can help a Pointer find its target:
$s = Struct("foo",
Byte("padding_length"),
Padding(sub { $_->ctx->{padding_length} } ),
Byte("relative_offset"),
Anchor("absolute_position"),
Pointer(sub { $_->ctx->{absolute_position} + $_->ctx->{relative_offset} }, Byte("data")),
);
$data = $s->parse("\x05\x00\x00\x00\x00\x00\x03\x00\x00\x00\xff");
# $data is { absolute_position=> 7, relative_offset => 3, data => 255, padding_length => 5 }
Anchor saves the current location in the stream, enabling the Pointer to jump to a location relative to it.
Strings
A string with constant length:
String("foo", 5)->parse("hello")
# returns "hello"
A Padded string with constant length:
$s = String("foo", 10, padchar => "X", paddir => "right");
$s->parse("helloXXXXX") # return "hello"
$s->build("hello") # return 'helloXXXXX'
I think that it speaks for itself. Note only that paddir can be one of qw{right left center}, and that there is also trimdir, which can be "right" or "left".
PascalString - String with a length marker in the beginning:
$s = PascalString("foo");
$s->build("hello world") # returns "\x0bhello world"
The marker can be of any kind:
$s = PascalString("foo", 'UBInt16');
$s->build("hello") # returns "\x00\x05hello"
And finally, CString:
$s = CString("foo");
$s->parse("hello\x00") # returns 'hello'
A CString can accept several alternative terminators:
$s = CString("foo", terminators => "XYZ");
$s->parse("helloY") # returns 'hello'
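For readers who know pack/unpack, the byte layouts of these string types can be sketched in core Perl as follows. This is only a comparison of formats, not a description of how the module implements them; the variable names are made up, and the split-based terminator handling is an assumption about what "any of XYZ terminates the string" means.

```perl
use strict;
use warnings;

# Pascal-style strings: a length marker followed by the bytes.
my $pascal8  = pack 'C/a*', 'hello world';   # one-byte length marker
my $pascal16 = pack 'n/a*', 'hello';         # 16-bit big-endian marker

# C-style string: everything up to the first NUL byte.
my $cstr = unpack 'Z*', "hello\x00";

# Several possible terminators: cut at the first X, Y or Z.
my ($multi) = split /[XYZ]/, 'helloY';

print unpack('H*', $pascal8), "\n";    # 0b68656c6c6f20776f726c64
print unpack('H*', $pascal16), "\n";   # 000568656c6c6f
print "$cstr $multi\n";                # hello hello
```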
Various
Some various constructs.
$s = Struct("foo",
UBInt8("width"),
UBInt8("height"),
Value("total_pixels", sub { $_->ctx->{width} * $_->ctx->{height}}),
);
A calculated value - not in the stream.
$s = Struct("foo",
Flag("has_options"),
If(sub { $_->ctx->{has_options} },
Bytes("options", 5)
)
);
$s = Struct("foo",
Byte("a"),
Peek(Byte("b")),
Byte("c"),
);
Peek is like a Pointer to the current location: it reads the data, and then returns to the location before the data.
$s = Const(Bytes("magic", 6), "FOOBAR");
Const verifies that a certain value exists.
Terminator()->parse("")
Terminator verifies that we have reached the end of the stream.
$s = Struct("foo",
Byte("a"),
Alias("b", "a"),
);
Copies "a" to "b".
$s = Union("foo",
UBInt32("a"),
UBInt16("b")
);
$data = $s->parse("\xaa\xbb\xcc\xdd");
# $data is { a => 2864434397, b => 43707 }
A Union. It currently works only with primitives, and not on a bit-stream.
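What the Union does here, reading the same starting bytes under two different interpretations, can be sketched with core Perl's unpack. This is an illustration of the concept only, not the module's implementation.

```perl
use strict;
use warnings;

my $bytes = "\xaa\xbb\xcc\xdd";

# Both fields start at offset 0; each reads as many bytes as its type needs.
my %data = (
    a => unpack('N', $bytes),   # 32-bit big endian: 0xAABBCCDD
    b => unpack('n', $bytes),   # 16-bit big endian: 0xAABB
);
print "a=$data{a} b=$data{b}\n";
# prints: a=2864434397 b=43707
```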
LazyBound
This construct is essential for recursive constructs.
$s = Struct("foo",
Flag("has_next"),
If(sub { $_->ctx->{has_next} }, LazyBound("next", sub { $s })),
);
$data = $s->parse("\x01\x01\x01\x00");
# $data is:
# {
# has_next => 1,
# next => {
# has_next => 1,
# next => {
# has_next => 1,
# next => {
# has_next => 0,
# next => undef
# }
# }
# }
# }
Streams
Until now, everything worked as a single action: build built one construct, and parse parsed one construct from one string. But suppose the string has more than one construct in it? Suppose we want to write two constructs into one string? (And if these constructs are in bit-mode, we can't just build them separately and join the results.)
So, anyway, we have streams. A stream is an object that lets a construct read and parse bytes from it, or build and write bytes to it.
Please note that some constructs can only work on seekable streams.
String
is seekable, not bit-stream
This is the most basic stream.
$data = $s->parse("aabb");
# is equivalent to:
$stream = CreateStreamReader("aabb");
$data = $s->parse($stream);
# also equivalent to:
$stream = CreateStreamReader(String => "aabb");
$data = $s->parse($stream);
Since String is the default stream type, there is no need to specify it. So, if a string contains two or more structs, the following code is possible:
$stream = CreateStreamReader(String => $my_string);
$data1 = $s1->parse($stream);
$data2 = $s2->parse($stream);
The other way is equally possible:
$stream = CreateStreamWriter(String => undef);
$s1->build($data1, $stream);
$s2->build($data2, $stream);
$my_string = $stream->Flush();
The Flush command in a Writer Stream says: finish whatever you are doing, and return your internal object. For a string writer it simply returns the string that it built. Other streams may do more. (For example, a Bit stream closes the last byte, outputs it to the internal stream, and returns that internal stream.)
On creation, the following lines are equivalent:
$stream = CreateStreamWriter(undef);
$stream = CreateStreamWriter('');
$stream = CreateStreamWriter(String => undef);
$stream = CreateStreamWriter(String => '');
Of course, it is possible to create a String Stream with an initial string to append to:
$stream = CreateStreamWriter(String => "aabb");
And any subsequent build operation will append to the "aabb" string.
StringRef
is seekable, not bit-stream
Mainly for cases when the string is too big to play around with. Writer:
my $string = '';
$stream = CreateStreamWriter(StringRef => \$string);
... do build operations ...
# and now the data in $string.
# or refer to: ${ $stream->Flush() }
This works because Flush returns what's inside the stream, in this case a reference to a string. For Reader:
my $string = 'MBs of data...';
$stream = CreateStreamReader(StringRef => \$string);
... parse operations ...
Bit
not seekable, is bit-stream
While every stream supports bit-fields, when requesting 2 bits in a non-bit-stream you get these two bits, but a whole byte is consumed from the stream. In a bit stream, only two bits are consumed.
When you use the BitStruct construct, it actually wraps the current stream with a bit stream. If you try to put a BitStruct inside a BitStruct, it will fail, because wrapping a bit stream inside another bit stream makes no sense.
What does it all have to do with you? Great question. Suppose you have a string containing a few bit structs, and each struct is aligned to a byte border. Then you can use the example under the BitStruct section.
However, if the bit structs are not aligned, but compressed one against the other, then you should use:
$s = Struct("foo",
Padding(1),
Flag("myflag"),
Padding(3),
);
$inner = "\x42\0";
$stream1 = CreateStreamReader(Bit => String => $inner);
$data1 = $s->parse($stream1);
# data1 is { myflag => 1 }
$data2 = $s->parse($stream1);
# data2 is { myflag => 1 }
$data3 = $s->parse($stream1);
# data3 is { myflag => 0 }
Note that the Padding construct detects that it is working on a bit stream, and pads in bits instead of bytes.
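To make the bit positions explicit, here is a core-Perl sketch that walks the same two bytes five bits at a time (one padding bit, one flag bit, three padding bits). It does not use the Bit stream; it only shows, under the assumption of most-significant-bit-first ordering, why the three parses above yield 1, 1 and 0.

```perl
use strict;
use warnings;

# Expand the bytes into a string of '0'/'1' characters, MSB first.
my $bits = unpack 'B*', "\x42\x00";        # "0100001000000000"

my @flags;
while (length($bits) >= 5) {
    my $chunk = substr $bits, 0, 5, '';    # take and remove 5 bits
    push @flags, substr($chunk, 1, 1) + 0; # the flag is the second bit
}
print "@flags\n";
# prints: 1 1 0
```

Three packed structs use 15 of the 16 bits, so one bit is left over in the stream.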
StringBuffer
is seekable, not bit-stream
Suppose that you have some non-seekable stream, like a socket, and suppose that your struct uses a construct that needs a seekable stream. What can you do?
Enter StringBuffer. It reads from the wrapped stream exactly the number of bytes that the struct needs, giving the struct the option to seek inside the section already read. If the struct seeks ahead, it will simply read enough bytes to reach that position.
In a writer stream, the StringBuffer will postpone writing the data to the actual stream until the Flush command.
This wrapper stream is useful only when the struct seeks within its own borders, and does not sporadically read data from 30 bytes ahead or back.
# suppose we have an unseekable reader stream named $s_stream
# (for example, TCP connection)
$stream1 = CreateStreamReader(StringBuffer => $s_stream);
# $s is some struct that uses seek. (using Peek, for example)
$data = $s->parse($stream1);
# the data were read, you can either drop $stream1 or continue use
# it for future parses.
# now suppose we have an unseekable writer stream named $w_stream
$stream1 = CreateStreamWriter(StringBuffer => $w_stream);
# $s is some struct that uses seek. (using Peek, for example)
$s->build($data1, $stream1); # data is written into $stream1
$stream1->Flush(); # data is written to $w_stream
$w_stream->Flush(); # data is sent.
# second option for the writer stream:
$s->build($data1, CreateStreamWriter(StringBuffer => $w_stream))->Flush();
$w_stream->Flush(); # data is sent.
TODO
The following elements were not implemented:
OnDemand
Optional
Reconfig and a macro Rename
Aligned and AlignedStruct
Probe
Embed
Tunnel
Add encodings support for the Strings
Convert the original unit tests to Perl (and make them pass...)
A lot of fiddling with the internal
Streams: FileStream, SocketStream
Use StringBuffer in Union, eliminating the need for seekable stream
Give the CreateStreamReader/CreateStreamWriter functions the ability to recognize a socket / filehandle / pointer to string.
The documentation is just in its beginning
Union handle only primitives. need to be extended to other constructs, and bit-structs.
Padding/Stream/bitstream duality - need work
add the stream object to the parser object? can be useful with Pointer.
use some nice exception system
Thread Safety
This is a pure Perl module. There should be no problems.
BUGS
None known
This is a first release - your feedback will be appreciated.
SEE ALSO
Original PyConstructs homepage: http://construct.wikispaces.com/
AUTHOR
Fomberg Shmuel, <owner@semuel.co.il>
COPYRIGHT AND LICENSE
Copyright 2008 by Shmuel Fomberg.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.