NAME
Bifcode2 - encode and decode Bifcode2 serialization format
VERSION
2.0.0_12 (2022-01-17)
SYNOPSIS
use utf8;
use boolean;
use Bifcode2 qw( encode_bifcode2 decode_bifcode2 );
my $bifcode = encode_bifcode2 {
bools => [ boolean::false, boolean::true, ],
bytes => \pack( 's<', 255 ),
integer => 25,
real => 1.25e-5,
null => undef,
utf8 => "ÃÂûÃÂÃÂ÷",
}
# 7b 75 35 2e 62 6f 6f 6c 73 3a 5b 66 2c 74 2c {u5.bools:[f,t,
# 5d 75 35 2e 62 79 74 65 73 3a 62 32 2e ff 0 ]u5.bytes:b2...
# 2c 75 37 2e 69 6e 74 65 67 65 72 3a 69 32 35 ,u7.integer:i25
# 2c 75 34 2e 6e 75 6c 6c 3a 7e 2c 75 34 2e 72 ,u4.null:~,u4.r
# 65 61 6c 3a 72 31 2e 32 35 65 2d 35 2c 75 34 eal:r1.25e-5,u4
# 2e 75 74 66 38 3a 75 31 30 2e ce 95 ce bb cf .utf8:u10......
# 8d cf 84 ce b7 2c 7d .....,}
my $decoded = decode_bifcode2 $bifcode;
DESCRIPTION
Bifcode2 implements the Bifcode2 serialisation format, a mixed binary/text encoding with support for the following data types:
Primitive:
Undefined(null)
Booleans(true/false)
Integer numbers
Real numbers
UTF8 strings
Binary strings
Structured:
Arrays(lists)
Hashes(dictionaries)
The encoding is simple to construct and relatively easy to parse. There is no need to escape special characters in strings. It is not considered human readable, but as it is mostly text it can usually be visually debugged.
+---------+--------------------+---------------------+
| Type | Perl | Bifcode |
+---------+--------------------+---------------------+
| UNDEF | undef | ~, |
| TRUE | boolean::true | t, |
| FALSE | boolean::false | f, |
| INTEGER | -1 | i-1, |
| INTEGER | 0 | i0, |
| INTEGER | 1 | i1, |
| REAL | 3.1415 | r3.1415e0, |
| BYTES | \pack( 's<', 255 ) | b2.�, |
| UTF8 | 'MIXãD ìãXì' | u14.MIXãD ìãXì, |
| ARRAY | [ 'one', 'two' ] | [u3.one,u3.two,] |
| DICT | { key => 'value'} | {u3.key:u5.value,} |
+---------+--------------------+---------------------+
Bifcode2 can only be constructed canonically; i.e. there is only one possible encoding per data structure. This property makes it suitable for comparing structures (using cryptographic hashes) across networks.
In terms of size the encoding is similar to minified JSON. In terms of speed this module compares well with other pure Perl encoding modules with the same features.
MOTIVATION
Bifcode was created for a project because none of currently available serialization formats (Bencode, JSON, MsgPack, Netstrings, Sereal, YAML, etc) met the requirements of:
Support for undef
Support for binary data
Support for UTF8 strings
Universally-recognized canonical form for hashing
Trivial to construct on the fly from SQLite triggers
I have no lofty goals or intentions to promote this outside of my specific case, but would appreciate hearing about other uses or implementations.
SPECIFICATION
The encoding is defined as follows:
BIFCODE_UNDEF
A null or undefined value correspond to "~,".
BIFCODE_TRUE and BIFCODE_FALSE
Boolean values are represented by "t," and "f,".
BIFCODE_UTF8
A UTF8 string is "u" followed by the octet length of the encoded string as a base ten number followed by a "." and the encoded string followed by ",". For example the Perl string "\x{df}" (ÃÂÃÂÃÂÃÂ) corresponds to "u2.\x{c3}\x{9f},".
BIFCODE_BYTES
Opaque data is 'b' followed by the octet length of the data as a base ten number followed by a "." and then the data itself followed by ",". For example a three-byte blob 'xyz' corresponds to 'b3.xyz,'.
BIFCODE_INTEGER
Integers are represented by an 'i' followed by the number in base 10 followed by a ','. For example 'i3,' corresponds to 3 and 'i-3,' corresponds to -3. Integers have no size limitation. 'i-0,' is invalid. All encodings with a leading zero, such as 'i03,', are invalid, other than 'i0,', which of course corresponds to 0.
BIFCODE_REAL
Real numbers are represented by an 'r' followed by a decimal number in base 10 followed by a 'e' followed by an exponent followed by a ','. For example 'r3.0e-1,' corresponds to 0.3 and 'r-0.1e0,' corresponds to -0.1. Reals have no size limitation. 'r-0.0e0,' is invalid. All encodings with an extraneous leading zero, such as 'r03.0e0,', or an extraneous trailing zero, such as 'r3.10e0,', are invalid.
BIFCODE_LIST
Lists are encoded as a '[' followed by their elements (also Bifcode2 encoded) followed by a ']'. For example '[u4.spam,u4.eggs,]' corresponds to ['spam', 'eggs'].
BIFCODE_DICT
Dictionaries are encoded as a '{' followed by a list of alternating keys and their corresponding values followed by a '}'. Keys must be of type BIFCODE_UTF8 or BIFCODE_BYTES and are encoded with a ":" as the last character instead of ",".
For example, '{u3.cow:u3.moo,u4.spam:u4.eggs,}' corresponds to {'cow': 'moo', 'spam': 'eggs'} and '{u4.spam:[u1.a,u1.b,]}' corresponds to {'spam'. ['a', 'b']}. Keys must appear in sorted order (sorted as raw strings, not alphanumerics).
BIFCODE_BIFCODE
A Bifcode string is "B" followed by the octet length of the encoded string as a base ten number followed by a "." and the encoded string followed by ",". This is typically used to frame Bifcode structures over a network.
INTERFACE
encode_bifcode2( $datastructure [, $enclose ] )
The first argument (required) may be a scalar, or may be a reference to either a scalar, an array, or a hash. Arrays and hashes may in turn contain values of these same types. Returns the appropriate BIFCODE_* byte string. If $enclose is true then the result is further encoded as BIFCODE_BIFCODE.
The mapping from Perl to Bifcode2 is as follows:
'undef' maps directly to BIFCODE_UNDEF.
The
true
andfalse
values from the boolean distribution encode to BIFCODE_TRUE and BIFCODE_FALSE.A plain scalar is treated as follows:
BIFCODE_UTF8 if
utf8::is_utf8
returns true; orBIFCODE_INTEGER if it looks like a canonically represented integer; or
BIFCODE_REAL if it looks like a real number; or
BIFCODE_UTF8 if it only contains ASCII characters; or
BIFCODE_BYTES when none of the above applies.
You can force scalars to be encoded a particular way by passing a reference to them blessed as Bifcode2::BYTES, Bifcode2::INTEGER, Bifcode2::REAL or Bifcode2::UTF8. The
force_bifcode2
function below can help with creating such references.SCALAR references become BIFCODE_BYTES.
ARRAY references become BIFCODE_LIST.
HASH references become BIFCODE_DICT.
This subroutine croaks on unhandled data types.
decode_bifcode2( $string [, $max_depth ] )
Takes a byte string and returns the corresponding deserialised data structure.
If you pass an integer for the second option, it will croak when attempting to parse dictionaries nested deeper than this level, to prevent DoS attacks using maliciously crafted input.
Bifcode2 types are mapped back to Perl in the reverse way to the encode_bifcode2
function, except for:
Any scalars which were "forced" to a particular type (using blessed references) will decode as plain scalars.
BIFCODE_BIFCODE types are fully inflated into Perl structures, and not the intermediate Bifcode2 byte string.
Croaks on malformed data.
force_bifcode2( $scalar, $type )
Returns a reference to $scalar blessed as Bifcode::$TYPE. The value of $type is not checked, but the encode_bifcode2
function will only accept the resulting reference where $type is one of 'bytes', 'real', 'integer' or 'utf8'.
diff_bifcode2( $bc1, $bc2, [$diff_args] )
Returns a string representing the difference between two bifcodes. The inputs do not need to be valid Bifcode; they are only expanded with a very simple regex before the diff is done. The third argument ($diff_args
) is passed directly to Text::Diff.
Croaks if Text::Diff is not installed.
AnyEvent::Handle Support
Bifcode2 implements the AnyEvent::Handle anyevent_read_type
and anyevent_write_type
functions which allow you to do this:
$handle->push_write( 'Bifcode2' => { your => 'structure here' } );
$handle->push_read(
'Bifcode2' => sub {
my ( $hdl, $ref ) = @_;
# do stuff with $ref
},
$maxdepth # passed straight to decode_bifcode2()
);
DIAGNOSTICS
The following exceptions may be raised by Bifcode2:
- Bifcode2::Error::Decode
-
Your data is malformed in a non-identifiable way.
- Bifcode2::Error::DecodeBytes
-
Your data contains a byte string with an invalid length.
- Bifcode2::Error::DecodeBytesTrunc
-
Your data includes a byte string declared to be longer than the available data.
- Bifcode2::Error::DecodeBytesTerm
-
Your data includes a byte string that is missing a "," terminator.
- Bifcode2::Error::DecodeDepth
-
Your data contains dicts or lists that are nested deeper than the $max_depth passed to
decode_bifcode2()
. - Bifcode2::Error::DecodeTrunc
-
Your data is truncated.
- Bifcode2::Error::DecodeReal
-
Your data contained something that was supposed to be a real but didn't make sense.
- Bifcode2::Error::DecodeRealTrunc
-
Your data contains a real that is truncated.
- Bifcode2::Error::DecodeInteger
-
Your data contained something that was supposed to be an integer but didn't make sense.
- Bifcode2::Error::DecodeIntegerTrunc
-
Your data contains an integer that is truncated.
- Bifcode2::Error::DecodeKeyType
-
Your data violates the Bifcode2 format constaint that all dict keys be BIFCODE_BYTES or BIFCODE_UTF8.
- Bifcode2::Error::DecodeKeyDuplicate
-
Your data violates the Bifcode2 format constaint that all dict keys must be unique.
- Bifcode2::Error::DecodeKeyOrder
-
Your data violates the Bifcode2 format constaint that dict keys must appear in lexical sort order.
- Bifcode2::Error::DecodeKeyValue
-
Your data contains a dictionary with an odd number of elements.
- Bifcode2::Error::DecodeTrailing
-
Your data does not end after the first Bifcode2-serialised item.
- Bifcode2::Error::DecodeUTF8
-
Your data contained a UTF8 string with an invalid length.
- Bifcode2::Error::DecodeUTF8Trunc
-
Your data includes a string declared to be longer than the available data.
- Bifcode2::Error::DecodeUTF8Term
-
Your data includes a UTF8 string that is missing a "," terminator.
- Bifcode2::Error::DecodeUsage
-
You called
decode_bifcode2()
with invalid arguments. - Bifcode2::Error::DiffUsage
-
You called
diff_bifcode2()
with invalid arguments. - Bifcode2::Error::EncodeBytesUndef
-
You attempted to encode
undef
as a byte string. - Bifcode2::Error::EncodeReal
-
You attempted to encode something as a real that isn't recognised as one.
- Bifcode2::Error::EncodeRealUndef
-
You attempted to encode
undef
as a real. - Bifcode2::Error::EncodeInteger
-
You attempted to encode something as an integer that isn't recognised as one.
- Bifcode2::Error::EncodeIntegerUndef
-
You attempted to encode
undef
as an integer. - Bifcode2::Error::EncodeUTF8Undef
-
You attempted to encode
undef
as a UTF8 string. - Bifcode2::Error::EncodeUnhandled
-
You are trying to serialise a data structure that contains a data type not supported by the Bifcode2 format.
- Bifcode2::Error::EncodeUsage
-
You called
encode_bifcode2()
with invalid arguments. - Bifcode2::Error::ForceUsage
-
You called
force_bifcode2()
with invalid arguments.
BUGS AND LIMITATIONS
Strings and numbers are practically indistinguishable in Perl, so encode_bifcode2()
has to resort to a heuristic to decide how to serialise a scalar. This cannot be fixed.
SEE ALSO
This distribution includes the diff-bifcode2 command-line utility for comparing Bifcode2 in files.
Bifcode implements the original (experimental) encoding.
AUTHOR
Mark Lawrence <nomad@null.net>, heavily based on Bencode by Aristotle Pagaltzis <pagaltzis@gmx.de>
COPYRIGHT AND LICENSE
This software is copyright (c):
2015 by Aristotle Pagaltzis
2017-2022 by Mark Lawrence.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.