NAME
Qstruct - Qstruct perl interface
SYNOPSIS
use Qstruct;
Qstruct::load_schema(q{
## This is my schema
qstruct MyPkg::PhoneNumber {
number @0 string;
ext @1 uint8;
}
qstruct MyPkg::User {
id @0 uint64;
name @1 string;
is_admin @3 bool;
is_moderator @4 bool;
emails @2 string[];
account_ids @5 uint64[];
phones @7 MyPkg::PhoneNumber[];
sha256_hash @6 uint8[32];
}
});
## Build a new user message:
my $message = MyPkg::User->encode({
name => "jimmy",
id => 100,
is_admin => 1,
emails => [ 'jimmy@example.com', 'jim@jimmy.com' ],
sha256_hash => "\xFF"x32,
phones => [
{ number => '555-1212' },
{ number => '1234567', ext => 2 },
],
});
## Load a user message:
my $user = MyPkg::User->decode($message);
## Scalar accessors:
print "User id: " . $user->id . "\n";
print "User name: " . $user->name . "\n";
print "*** ADMIN ***\n" if $user->is_admin;
print "1st phone #: " . $user->phones->[0]->number . "\n";
## Zero-copy access to strings/blobs:
$user->name(my $name);
## Zero-copy array iteration:
$user->emails->foreach(sub {
print "EMAIL is ", $_[0], "\n";
});
## Zero-copy nested qstructs:
$user->phones->foreach(sub {
$_[0]->number(my $number);
print $number, "\n";
});
DESCRIPTION
Qstruct is a binary serialisation format that requires a schema. This
documentation describes the Qstruct perl module which is the reference
dynamic-language implementation for qstructs. The specification for the
qstruct format is documented here: Qstruct::Spec.
Because in qstructs the "wire" and "in-memory" formats are the same, the
"encode" and "decode" functions are somewhat mis-named. As soon as the
object is built in memory it is ready to be copied out to disk or the
network. Also, as soon as it is read or mapped into memory it is ready
for accessing. So the "encode" and "decode" operations are mostly
no-ops.
This module is designed to be particularly efficient for reading
qstructs. Numerics, strings, blobs, nested qstructs, and arrays of these
types can all be randomly-accessed or iterated over without reading or
parsing any unrelated parts of the message (qstructs are lazy).
Furthermore, all copies of message data can be avoided -- only pointers
into the message memory are recorded (qstructs are zero-copy).
The encoder in this module is not exactly slow, it just does more
memory-allocations and copying than an optimised implementation would.
The compiled static interface will probably be optimised for encoding
eventually.
ZERO-COPY
As shown in the synopsis, fields can be accessed simply by calling their
corresponding methods on the objects representing decoded messages:
## Field access (copying)
my $name = $user->name;
However, due to the semantics of return values in perl, the above line
of code allocates new memory and copies the "name" field into it. This
is inefficient for two reasons.
Firstly, the process of copying takes time. This time is proportional to
how large the data is. Often this copying is unnecessary and therefore
an inefficient use of time.
Secondly, copying is inefficient because impacts your memory system. If
you aren't copying the data, you aren't paging it in from disk, pulling
it into your filesystem/CPU caches, pushing other things out of cache,
or exercising your CPU's translation lookaside buffer (TLB).
Qstruct is always lazy when it comes to memory access: It will only
access the bare-minimum memory required to fulfill accessor requests.
If you wish to avoid copying however, you need to pass an "output
scalar" into the accessor method:
## Field access (zero-copy)
$user->name(my $name);
Passing these output scalars into methods to avoid copying is a common
theme throughtout the Qstruct perl module interface.
This module is designed to work with modules like File::Map which map
files into perl strings without actually copying them into memory, and
also with modules like LMDB_File which interact with transactional
in-process databases that support zero-copy. When combining Qstructs
with these modules you can have true zero-copy access to a filesystem or
database from your high-level perl code just as conveniently as with
copying interfaces.
For more information on zero-copy, see the Test::ZeroCopy module and the
"t/zerocopy.t" test in this distribution that uses it.
ARRAYS
When you call the accessor method on an array it returns a special
overloaded object of type "Qstruct::ArrayRef". This object can
(obviously) be accessed as an array reference:
## Array random access (copying)
my $first_email = $user->emails->[0];
Because of the lazy-loading nature of Qstructs, in the above code none
of the other emails are accessed at all. If the message is in a
memory-mapped file, the other emails might never even get paged in to
memory (although emails are generally small enough that they many of
them can be stored together on the same page).
Of course references can also be de-referenced and iterated over:
## Array iteration (copying)
foreach my $email (@{ $user->emails }) {
print "Email: ", $email, "\n";
}
The problem with the above approach is that while the elements are
lazy-loaded, they are not zero-copy. In other words, for the elements
iterated over, perl is allocating new memory for them and then they are
being copied into it.
In addition to acting as array refs, "Qstruct::ArrayRef" objects are
also special objects with additional methods. The "get" method is
similar to the random-access de-reference operation above except that
you can pass an output scalar to it to get zero-copy behaviour:
## Array random access (zero-copy)
$user->emails->get(0, my $first_email);
Because the "my $first_email" scalar is passed in, the "get" method will
populate it with a pointer into the underlying message-memory owned by
the $user object.
There is also a "len" method which of course means you can iterate over
arrays:
## Array iteration (zero-copy)
my $emails = $user->emails;
for(my $i=0; $i < $emails->len; $i++) {
$emails->get($i, my $email);
print "Email: ", $email, "\n";
}
There is a short-cut "foreach" method that simplifies the above pattern:
## Array iteration short-cut (zero-copy)
$user->emails->foreach(sub {
print "Email: ", $_[0], "\n";
});
Arrays of qstructs work essentially the same as arrays of primitive
types except that the elements are decoded objects convenient for
traversal, ie:
## Arrays of qstructs
$department->staff->employees->foreach(sub {
my $employee = shift;
print "Employee id: ", $employee->id, "\n";
print "Employee name: ", $employee->name, "\n";
});
RAW ARRAY ACCESS
For fixed arrays of numeric types there are also raw accessors. For
example, hash values are known-length values so it can make sense for
them to be fixed arrays which are inlined in the message body for
efficiency (see Qstruct::Spec for details). Such arrays are most likely
best accessed with raw accessors:
## Whole-array access (copying)
my $hash_value = $user->sha256_hash->raw;
Of course there is a corresponding zero-copy interface:
## Whole-array access (zero-copy)
$user->sha256_hash->raw(my $hash_value);
When encoding messages, you can simply pass in an appropriately sized
string and it will be treated as raw:
my $msg = MyPkg::User->encode({
sha256_hash => Digest::SHA::sha256("whatever"),
});
Numeric values are stored in little-endian format so if you use raw
accessors on arrays with elements of more than 2 byte sizes then you
will need to "pack" and "unpack" them in order for your code to be
portable.
Also, fixed arrays are more limited than dynamic arrays in that the
schema can't be evolved by converting them into arrays of nested
qstructs.
Because of the portability and schema evolution restrictions, fixed
arrays and raw array access are usually recommended against.
EXCEPTIONS
This module will throw exceptions in the following conditions:
* Schema parse errors
* Decoding or accessing truncated/malformed qstructs
* Out of memory during encoding
* You are on a 32-bit system and you attempt to access
a field that can't fit in your address space
* Trying to set an array from a raw buffer that is the
incorrect size
* Attempting to modify a Qstruct::Array
Note that if fields aren't set, accessing them will *not* throw
exceptions. Instead, accessors will return the default values of their
respective types (see Qstruct::Spec). This is so that you can still
parse old messages that were created with old versions of a schema.
PORTABILITY
This module uses the "slow" but portable accessors described in
work on any machine regardless of byte order or alignment requirements.
Despite the name, these accessors are not actually slow relative to the
overhead of making a perl function or method call so there is little
point in optimising them for the perl module.
Because the perl module uses the slow and portable accessors, no matter
what CPU you use you do not need to worry about loading messages from
aligned offsets. When using the C API, if you choose to compile with the
non-portable accessors you should be aware that depending on your CPU
you may have reliabilty or performance issues if you load messages from
non-aligned offsets. However, modern x86-64 CPUs are perfectly suited
for the "fast" interface and this interface can be used without
sacrificing reliability or performance even with non-aligned messages.
SEE ALSO
Qstruct::Spec - The Qstruct design objectives and format specification
Video: Doug Hoyte introduces Qstruct to Toronto Perl Mongers
Qstruct::Compiler - The reference compiler implementation
Test::ZeroCopy - More information on zero-copy and how it is tested for
AUTHOR
Doug Hoyte, "<doug@hcsw.org>"
COPYRIGHT & LICENSE
Copyright 2014 Doug Hoyte.
This module is licensed under the same terms as perl itself.
The bundled "libqstruct" is (C) Doug Hoyte and licensed under the
2-clause BSD license.