NAME

Qstruct - Qstruct perl interface

SYNOPSIS

use Qstruct;

Qstruct::load_schema(q{
  ## This is my schema

  qstruct MyPkg::PhoneNumber {
    number @0 string;
    ext @1 uint8;
  }

  qstruct MyPkg::User {
    id @0 uint64;
    name @1 string;

    is_admin @3 bool;
    is_moderator @4 bool;

    emails @2 string[];
    account_ids @5 uint64[];
    phones @7 MyPkg::PhoneNumber[];

    sha256_hash @6 uint8[32];
  }

});

## Build a new user message:
my $message = MyPkg::User->encode({
                name => "jimmy",
                id => 100,
                is_admin => 1,
                emails => [ 'jimmy@example.com', 'jim@jimmy.com' ],
                sha256_hash => "\xFF"x32,
                phones => [
                            { number => '555-1212' },
                            { number => '1234567', ext => 2 },
                          ],
              });

## Load a user message:
my $user = MyPkg::User->decode($message);

## Scalar accessors:
print "User id: " . $user->id . "\n";
print "User name: " . $user->name . "\n";
print "*** ADMIN ***\n" if $user->is_admin;
print "1st phone #: " . $user->phones->[0]->number . "\n";

## Zero-copy access to strings/blobs:
$user->name(my $name);

## Zero-copy array iteration:
$user->emails->foreach(sub {
  print "EMAIL is ", $_[0], "\n";
});

## Zero-copy nested qstructs:
$user->phones->foreach(sub {
  $_[0]->number(my $number);
  print $number, "\n";
});

DESCRIPTION

Qstruct is a binary serialisation format that requires a schema. This documentation describes the Qstruct perl module which is the reference dynamic-language implementation for qstructs. The specification for the qstruct format is documented here: Qstruct::Spec.

Because in qstructs the "wire" and "in-memory" formats are the same, the encode and decode functions are somewhat mis-named. As soon as the object is built in memory it is ready to be copied out to disk or the network. Also, as soon as it is read or mapped into memory it is ready for accessing. So the encode and decode operations are mostly no-ops.

This module is designed to be particularly efficient for reading qstructs. Numerics, strings, blobs, nested qstructs, and arrays of these types can all be randomly-accessed or iterated over without reading or parsing any unrelated parts of the message (qstructs are lazy). Furthermore, all copies of message data can be avoided -- only pointers into the message memory are recorded (qstructs are zero-copy).

The encoder in this module is not exactly slow, it just does more memory-allocations and copying than an optimised implementation would. The compiled static interface will probably be optimised for encoding eventually.

ZERO-COPY

As shown in the synopsis, fields can be accessed simply by calling their corresponding methods on the objects representing decoded messages:

## Field access (copying)

my $name = $user->name;

However, due to the semantics of return values in perl, the above line of code allocates new memory and copies the name field into it. This is inefficient for two reasons.

Firstly, the process of copying takes time. This time is proportional to how large the data is. Often this copying is unnecessary and therefore an inefficient use of time.

Secondly, copying is inefficient because impacts your memory system. If you aren't copying the data, you aren't paging it in from disk, pulling it into your filesystem/CPU caches, pushing other things out of cache, or exercising your CPU's translation lookaside buffer (TLB).

Qstruct is always lazy when it comes to memory access: It will only access the bare-minimum memory required to fulfill accessor requests.

If you wish to avoid copying however, you need to pass an "output scalar" into the accessor method:

## Field access (zero-copy)

$user->name(my $name);

Passing these output scalars into methods to avoid copying is a common theme throughtout the Qstruct perl module interface.

This module is designed to work with modules like File::Map which map files into perl strings without actually copying them into memory, and also with modules like LMDB_File which interact with transactional in-process databases that support zero-copy. When combining Qstructs with these modules you can have true zero-copy access to a filesystem or database from your high-level perl code just as conveniently as with copying interfaces.

For more information on zero-copy, see the Test::ZeroCopy module and the t/zerocopy.t test in this distribution that uses it.

ARRAYS

When you call the accessor method on an array it returns a special overloaded object of type Qstruct::ArrayRef. This object can (obviously) be accessed as an array reference:

## Array random access (copying)

my $first_email = $user->emails->[0];

Because of the lazy-loading nature of Qstructs, in the above code none of the other emails are accessed at all. If the message is in a memory-mapped file, the other emails might never even get paged in to memory (although emails are generally small enough that they many of them can be stored together on the same page).

Of course references can also be de-referenced and iterated over:

## Array iteration (copying)

foreach my $email (@{ $user->emails }) {
  print "Email: ", $email, "\n";
}

The problem with the above approach is that while the elements are lazy-loaded, they are not zero-copy. In other words, for the elements iterated over, perl is allocating new memory for them and then they are being copied into it.

In addition to acting as array refs, Qstruct::ArrayRef objects are also special objects with additional methods. The get method is similar to the random-access de-reference operation above except that you can pass an output scalar to it to get zero-copy behaviour:

## Array random access (zero-copy)

$user->emails->get(0, my $first_email);

Because the my $first_email scalar is passed in, the get method will populate it with a pointer into the underlying message-memory owned by the $user object.

There is also a len method which of course means you can iterate over arrays:

## Array iteration (zero-copy)

my $emails = $user->emails;

for(my $i=0; $i < $emails->len; $i++) {
  $emails->get($i, my $email);
  print "Email: ", $email, "\n";
}

There is a short-cut foreach method that simplifies the above pattern:

## Array iteration short-cut (zero-copy)

$user->emails->foreach(sub {
  print "Email: ", $_[0], "\n";
});

Arrays of qstructs work essentially the same as arrays of primitive types except that the elements are decoded objects convenient for traversal, ie:

## Arrays of qstructs
$department->staff->employees->foreach(sub {
  my $employee = shift;
  print "Employee id: ", $employee->id, "\n";
  print "Employee name: ", $employee->name, "\n";
});

RAW ARRAY ACCESS

For fixed arrays of numeric types there are also raw accessors. For example, hash values are known-length values so it can make sense for them to be fixed arrays which are inlined in the message body for efficiency (see Qstruct::Spec for details). Such arrays are most likely best accessed with raw accessors:

## Whole-array access (copying)

my $hash_value = $user->sha256_hash->raw;

Of course there is a corresponding zero-copy interface:

## Whole-array access (zero-copy)

$user->sha256_hash->raw(my $hash_value);

When encoding messages, you can simply pass in an appropriately sized string and it will be treated as raw:

my $msg = MyPkg::User->encode({
  sha256_hash => Digest::SHA::sha256("whatever"),
});

Numeric values are stored in little-endian format so if you use raw accessors on arrays with elements of more than 2 byte sizes then you will need to pack and unpack them in order for your code to be portable.

Also, fixed arrays are more limited than dynamic arrays in that the schema can't be evolved by converting them into arrays of nested qstructs.

Because of the portability and schema evolution restrictions, fixed arrays and raw array access are usually recommended against.

EXCEPTIONS

This module will throw exceptions in the following conditions:

* Schema parse errors

* Decoding or accessing truncated/malformed qstructs

* Out of memory during encoding

* You are on a 32-bit system and you attempt to access
  a field that can't fit in your address space

* Trying to set an array from a raw buffer that is the
  incorrect size

* Attempting to modify a Qstruct::Array

Note that if fields aren't set, accessing them will not throw exceptions. Instead, accessors will return the default values of their respective types (see Qstruct::Spec). This is so that you can still parse old messages that were created with old versions of a schema.

PORTABILITY

This module uses the "slow" but portable accessors described in libqstruct meaning it should work on any machine regardless of byte order or alignment requirements. Despite the name, these accessors are not actually slow relative to the overhead of making a perl function or method call so there is little point in optimising them for the perl module.

Because the perl module uses the slow and portable accessors, no matter what CPU you use you do not need to worry about loading messages from aligned offsets. When using the C API, if you choose to compile with the non-portable accessors you should be aware that depending on your CPU you may have reliabilty or performance issues if you load messages from non-aligned offsets. However, modern x86-64 CPUs are perfectly suited for the "fast" interface and this interface can be used without sacrificing reliability or performance even with non-aligned messages.

SEE ALSO

Qstruct::Spec - The Qstruct design objectives and format specification

Video: Doug Hoyte introduces Qstruct to Toronto Perl Mongers

Qstruct::Compiler - The reference compiler implementation

Test::ZeroCopy - More information on zero-copy and how it is tested for

libqstruct - Shared C library

Qstruct github repo

AUTHOR

Doug Hoyte, <doug@hcsw.org>

COPYRIGHT & LICENSE

Copyright 2014 Doug Hoyte.

This module is licensed under the same terms as perl itself.

The bundled libqstruct is (C) Doug Hoyte and licensed under the 2-clause BSD license.