Perl-BSON Type Mapping

This section describes the desired BSON/Perl mapping for both encoding and decoding.

The left column lists every Perl type or class that the BSON codec knows how to serialize to BSON. The middle column is the BSON type that each one serializes to. The right-most column is the Perl type or class that the BSON type deserializes to. Footnotes indicate variations or special behaviors.

Perl type/class ->          BSON type        -> Perl type/class
-------------------------------------------------------------------
float[1]                    0x01 DOUBLE         float[2]
BSON::Double
-------------------------------------------------------------------
string[3]                   0x02 UTF8           string[2]
BSON::String
-------------------------------------------------------------------
hashref                     0x03 DOCUMENT       hashref[4][5]
BSON::Doc
BSON::Raw
MongoDB::BSON::Raw[d]
Tie::IxHash
Hash::Ordered
-------------------------------------------------------------------
arrayref                    0x04 ARRAY          arrayref
-------------------------------------------------------------------
BSON::Bytes                 0x05 BINARY         BSON::Bytes
scalarref
BSON::Binary[d]
MongoDB::BSON::Binary[d]
-------------------------------------------------------------------
n/a                         0x06 UNDEFINED[d]   undef
-------------------------------------------------------------------
BSON::OID                   0x07 OID            BSON::OID
BSON::ObjectId[d]
MongoDB::OID[d]
-------------------------------------------------------------------
boolean                     0x08 BOOL           boolean
BSON::Bool[d]
JSON::XS::Boolean
JSON::PP::Boolean
JSON::Tiny::_Bool
Mojo::JSON::_Bool
Cpanel::JSON::XS::Boolean
Types::Serialiser::Boolean
-------------------------------------------------------------------
BSON::Time                  0x09 DATE_TIME      BSON::Time
DateTime
DateTime::Tiny
Time::Moment
-------------------------------------------------------------------
undef                       0x0a NULL           undef
-------------------------------------------------------------------
BSON::Regex                 0x0b REGEX          BSON::Regex
qr// reference
MongoDB::BSON::Regexp[d]
-------------------------------------------------------------------
n/a                         0x0c DBPOINTER[d]   BSON::DBRef
-------------------------------------------------------------------
BSON::Code[6]               0x0d CODE           BSON::Code
MongoDB::Code[6]
-------------------------------------------------------------------
n/a                         0x0e SYMBOL[d]      string
-------------------------------------------------------------------
BSON::Code[6]               0x0f CODEWSCOPE     BSON::Code
MongoDB::Code[6]
-------------------------------------------------------------------
integer[7][8]               0x10 INT32          integer[2]
BSON::Int32
-------------------------------------------------------------------
BSON::Timestamp             0x11 TIMESTAMP      BSON::Timestamp
MongoDB::Timestamp[d]
-------------------------------------------------------------------
integer[7]                  0x12 INT64          integer[2][9]
BSON::Int64
Math::BigInt
Math::Int64
-------------------------------------------------------------------
BSON::MaxKey                0x7F MAXKEY         BSON::MaxKey
MongoDB::MaxKey[d]
-------------------------------------------------------------------
BSON::MinKey                0xFF MINKEY         BSON::MinKey
MongoDB::MinKey[d]

[d] Deprecated or soon to be deprecated.
[1] Scalar with "NV" internal representation and no "PV"
    representation, or a string that looks like a float if the
    'prefer_numeric' option is true.
[2] If the 'wrap_numbers' option is true, numeric types will be wrapped
    as BSON::Double, BSON::Int32 or BSON::Int64 as appropriate to ensure
    round-tripping. If the 'wrap_strings' option is true, strings will
    be wrapped as BSON::String, likewise.
[3] Scalar with "PV" representation and not identified as a number
    by notes [1] or [7].
[4] If the 'ordered' option is set, decoding returns a tied hash that
    preserves order (the deprecated 'ixhash' option still works).
[5] If the document appears to contain a DBRef and a 'dbref_callback'
    exists, that callback is executed with the deserialized document.
[6] Code is serialized as CODE or CODEWSCOPE depending on whether a
    scope hashref exists in BSON::Code/MongoDB::Code.
[7] Scalar with "IV" internal representation and no "PV"
    representation, or a string that looks like an integer if the
    'prefer_numeric' option is true.
[8] Only if the integer fits in 32 bits.
[9] On 32-bit platforms, 64-bit integers are deserialized to
    Math::BigInt objects (even if subsequently wrapped into
    BSON::Int64 if 'wrap_numbers' is true).
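
The scalar rules in notes [1], [3], [7] and [8] can be illustrated with a minimal sketch. It is not the codec's actual implementation: the helper names guess_bson_type and int_type are hypothetical, the regular expressions used for "looks like" a number are assumptions, and the sketch assumes a 64-bit Perl. It inspects a scalar's internal flags with the core B module.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper (not part of BSON): guess the BSON type that a plain
# scalar would map to under notes [1], [3], [7] and [8].
sub guess_bson_type {
    my ( $value, %opt ) = @_;
    return 'NULL'  unless defined $value;
    return 'OTHER' if ref $value;    # references are outside this sketch

    my $flags  = B::svref_2object( \$value )->FLAGS;
    my $has_pv = $flags & B::SVf_POK();

    if ( !$has_pv ) {
        # notes [7]/[8]: integer representation and no string representation
        return int_type($value) if $flags & B::SVf_IOK();
        # note [1]: float representation and no string representation
        return 'DOUBLE' if $flags & B::SVf_NOK();
    }
    elsif ( $opt{prefer_numeric} ) {
        # assumed patterns for "looks like an integer" / "looks like a float"
        return int_type($value)
          if $value =~ /\A-?(?:0|[1-9][0-9]*)\z/;
        return 'DOUBLE'
          if $value =~ /\A-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?\z/;
    }
    return 'UTF8';    # note [3]: everything else is a string
}

# note [8]: INT32 only if the integer fits in 32 bits
sub int_type {
    my ($n) = @_;
    return ( $n >= -2**31 && $n <= 2**31 - 1 ) ? 'INT32' : 'INT64';
}

printf "%-12s %s\n", $_->[0], guess_bson_type( @{ $_->[1] } )
  for [ '42',         [42] ],
      [ '3.5',        [3.5] ],
      [ '"42"',       ["42"] ],
      [ '"42" (p_n)', [ "42", prefer_numeric => 1 ] ];
```

Because the check relies on internal flags, a number that has been used as a string elsewhere in a program may be classified differently; wrapping values in BSON::Int32, BSON::Double or BSON::String avoids that ambiguity.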

Type map hooks (Not yet implemented)

Users may need to specify hook functions to customize serialization and deserialization. This section describes a possible design for this feature.

There are three possible types of hooks for serializing and deserializing: key-specific, type-specific and generic.

Implementing key-specific hooks correctly requires maintaining a deep key representation, which currently doesn't exist. Their precedence relative to type-specific hooks is also unclear. Key-specific hooks are therefore out of scope.

Type-specific hooks are registered by type: for serializing, the result of calling ref on the value; for deserializing, the BSON type. Generic hooks run for every element encoded or decoded, except where a type-specific hook applies; they are discouraged because of the overhead this causes.

Serialization hooks

Serialization hooks fire early in the encode process, before dispatching based on a value's type. The hook receives the key and value (or array index and value). It must return a new key/value pair if it modifies either element (it must not modify an array index). It must return an empty list if it makes no changes. If the value's type changes and there is a hook for the new type, the new key/value pair is passed to that hook in turn.

Assuming a generic hook is defined as "type" of *, the logic in the BSON encode function would resemble the following:

# Given that $key, $value exist
my $type = ref($value);

HOOK: {
    my $old_type = $type;
    my ( $hook, @repl );
    if ( $hook = $E_HOOKS{$type} and @repl = $hook->( $key, $value ) ) {
        # parenthesize the right side so that $type gets ref( $repl[1] )
        ( $key, $value, $type ) = ( @repl, ref( $repl[1] ) );
        redo HOOK if $type ne $old_type and exists $E_HOOKS{$type};
    }
    elsif ( $hook = $E_HOOKS{'*'} and @repl = $hook->( $key, $value ) ) {
        # generic hook: tried only if no type-specific hook replaced the pair
        ( $key, $value, $type ) = ( @repl, ref( $repl[1] ) );
        redo HOOK if $type ne $old_type and exists $E_HOOKS{$type};
    }
}

After hooks have run, if any, the value must be one of the types that BSON knows how to serialize.
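
As an illustration only, the following self-contained sketch simulates that logic outside the codec. Since hooks are not implemented, everything here is an assumption: the %E_HOOKS table, the apply_encode_hooks helper, the My::Point class and the '_hooked' marker key are all hypothetical. It shows an object being replaced by a hashref and then re-hooked because a hook exists for the new type.

```perl
use strict;
use warnings;

# Hypothetical hook table, keyed by ref($value); '*' would be the generic hook.
my %E_HOOKS = (
    # turn a My::Point object into a plain hashref
    'My::Point' => sub {
        my ( $key, $value ) = @_;
        return ( $key, { x => $value->{x}, y => $value->{y} } );
    },
    # mark every hashref once; return an empty list when nothing changes
    'HASH' => sub {
        my ( $key, $value ) = @_;
        return if exists $value->{_hooked};
        return ( $key, { %$value, _hooked => 1 } );
    },
);

# Hypothetical helper wrapping the hook logic described above.
sub apply_encode_hooks {
    my ( $key, $value ) = @_;
    my $type = ref($value);

    HOOK: {
        my $old_type = $type;
        my ( $hook, @repl );
        if ( $hook = $E_HOOKS{$type} and @repl = $hook->( $key, $value ) ) {
            ( $key, $value, $type ) = ( @repl, ref( $repl[1] ) );
            redo HOOK if $type ne $old_type and exists $E_HOOKS{$type};
        }
        elsif ( $hook = $E_HOOKS{'*'} and @repl = $hook->( $key, $value ) ) {
            ( $key, $value, $type ) = ( @repl, ref( $repl[1] ) );
            redo HOOK if $type ne $old_type and exists $E_HOOKS{$type};
        }
    }
    return ( $key, $value );
}

my $point = bless { x => 1, y => 2 }, 'My::Point';
my ( $key, $value ) = apply_encode_hooks( 'loc', $point );
print "$key is now a ", ref($value), " with keys ",
  join( ',', sort keys %$value ), "\n";
```

The 'HASH' hook returns an empty list once the marker is present, which is what keeps the redo loop from running forever; any hook that maps a type to itself needs a similar stopping condition.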

Deserialization hooks

Deserialization hooks fire at the end of the decoding process. BSON first decodes a BSON field to its default Perl type. The hook receives the BSON type, the key and the value. It must return a new key/value pair if it modifies either element (it must not modify an array index). It must return an empty list if it makes no changes.

Assuming a generic hook is defined as "type" of *, the logic in the BSON decode function would resemble the following:

# Given that $bson_type, $key, $value exist

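# $hook is declared on its own line: a "my" variable is not visible
# within the statement that declares it
my $hook = $D_HOOKS{$bson_type} || $D_HOOKS{'*'};
if ( $hook and my @repl = $hook->( $bson_type, $key, $value ) ) {
    ( $key, $value ) = @repl;
}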

After a hook has run, the key and value are stored in the parent document in the usual fashion.
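
A minimal sketch of how such a decode hook table might behave is shown here. It is hypothetical throughout: the %D_HOOKS table, the apply_decode_hook helper, and the choice to key hooks by the numeric BSON type byte are assumptions made for illustration, not part of any implemented API.

```perl
use strict;
use warnings;

# Hypothetical hook table, keyed by BSON type byte; '*' is the generic hook.
my %D_HOOKS = (
    # 0x02 UTF8: strip trailing whitespace from decoded strings
    0x02 => sub {
        my ( $bson_type, $key, $value ) = @_;
        ( my $trimmed = $value ) =~ s/\s+\z//;
        return if $trimmed eq $value;    # no change: empty list
        return ( $key, $trimmed );
    },
    # generic hook: runs only for types without a specific hook
    '*' => sub { return },
);

# Hypothetical helper wrapping the decode-side logic described above.
sub apply_decode_hook {
    my ( $bson_type, $key, $value ) = @_;
    my $hook = $D_HOOKS{$bson_type} || $D_HOOKS{'*'};
    if ( $hook and my @repl = $hook->( $bson_type, $key, $value ) ) {
        ( $key, $value ) = @repl;
    }
    return ( $key, $value );
}

my ( $key, $value ) = apply_decode_hook( 0x02, 'name', "Ada  " );
print "$key => '$value'\n";
```

Unlike the encode side, this logic calls at most one hook per element and never re-hooks, so a decode hook cannot loop.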