Perl-BSON Type Mapping
This section describes desired BSON/Perl mapping for both encoding and decoding.
On the left are all the Perl types or classes the BSON codec knows how to serialize to BSON. The middle column is the BSON type for each class. The right-most column is the Perl type or class that the BSON type deserializes to. Footnotes indicate variations or special behaviors.
Perl type/class -> BSON type -> Perl type/class
-------------------------------------------------------------------
float[1] 0x01 DOUBLE float[2]
BSON::Double
-------------------------------------------------------------------
string[3] 0x02 UTF8 string[2]
BSON::String
-------------------------------------------------------------------
hashref 0x03 DOCUMENT hashref[4][5]
BSON::Doc
BSON::Raw
MongoDB::BSON::Raw[d]
Tie::IxHash
Hash::Ordered
-------------------------------------------------------------------
arrayref 0x04 ARRAY arrayref
-------------------------------------------------------------------
BSON::Bytes 0x05 BINARY BSON::Bytes
scalarref
BSON::Binary[d]
MongoDB::BSON::Binary[d]
-------------------------------------------------------------------
n/a 0x06 UNDEFINED[d] undef
-------------------------------------------------------------------
BSON::OID 0x07 OID BSON::OID
BSON::ObjectId[d]
MongoDB::OID[d]
-------------------------------------------------------------------
boolean 0x08 BOOL boolean
BSON::Bool[d]
JSON::XS::Boolean
JSON::PP::Boolean
JSON::Tiny::_Bool
Mojo::JSON::_Bool
Cpanel::JSON::XS::Boolean
Types::Serialiser::Boolean
-------------------------------------------------------------------
BSON::Time 0x09 DATE_TIME BSON::Time
DateTime
DateTime::Tiny
Time::Moment
-------------------------------------------------------------------
undef 0x0a NULL undef
-------------------------------------------------------------------
BSON::Regex 0x0b REGEX BSON::Regex
qr// reference
MongoDB::BSON::Regexp[d]
-------------------------------------------------------------------
n/a 0x0c DBPOINTER[d] BSON::DBRef
-------------------------------------------------------------------
BSON::Code[6] 0x0d CODE BSON::Code
MongoDB::Code[6]
-------------------------------------------------------------------
n/a 0x0e SYMBOL[d] string
-------------------------------------------------------------------
BSON::Code[6] 0x0f CODEWSCOPE BSON::Code
MongoDB::Code[6]
-------------------------------------------------------------------
integer[7][8] 0x10 INT32 integer[2]
BSON::Int32
-------------------------------------------------------------------
BSON::Timestamp 0x11 TIMESTAMP BSON::Timestamp
MongoDB::Timestamp[d]
-------------------------------------------------------------------
integer[7] 0x12 INT64 integer[2][9]
BSON::Int64
Math::BigInt
Math::Int64
-------------------------------------------------------------------
BSON::MaxKey 0x7F MAXKEY BSON::MaxKey
MongoDB::MaxKey[d]
-------------------------------------------------------------------
BSON::MinKey 0xFF MINKEY BSON::MinKey
MongoDB::MinKey[d]
[d] Deprecated or soon to be deprecated.
[1] Scalar with "NV" internal representation no "PV"
representation, or a string that looks like a float if the
'prefer_numeric' option is true.
[2] If the 'wrap_numbers' option is true, numeric types will be wrapped
as BSON::Double, BSON::Int32 or BSON::Int64 as appropriate to ensure
round-tripping. If the 'wrap_strings' option is true, strings will
be wrapped as BSON::String, likewise.
[3] Scalar with "PV" representation and not identified as a number
by notes [1] or [7].
[4] If 'ordered' option is set, will return a tied hash that preserves
order (deprecated 'ixhash' option still works).
[5] If the document appears to contain a DBRef and a 'dbref_callback'
exists, that callback is executed with the deserialized document.
[6] Code is serialized as CODE or CODEWSCOPE depending on whether a
scope hashref exists in BSON::Code/MongoDB::Code.
[7] Scalar with "IV" internal representation no "PV"
representation, or a string that looks like an integer if the
'prefer_numeric' option is true.
[8] Only if the integer fits in 32 bits.
[9] On 32-bit platforms, 64-bit integers are deserialized to
Math::BigInt objects (even if subsequently wrapped into
BSON::Int64 if 'wrap_scalars' is true).
Type map hooks (Not yet implemented)
Users may need to be able to specify hook functions to customize serialization and deserialization. This section describes a possible design for this feature.
There are three possible types of hooks for serializing and deserializing: key-specific, type-specific and generic.
Doing key-specific hooks correctly really requires maintaining a deep key representation, which currently doesn't exist. Precedence vs type-specific keys is also unclear. Therefore, this is out of scope.
Type-specific hooks are registered based on type: for serializing, the result of the ref
call; for deserializing, the BSON type. Generic hooks always run for every element encoded or decoded (unless a type-specific hook applies); they are discouraged due to the overhead this causes.
Serialization hooks
Serialization hooks fire early in the encode process, before dispatching based on a value's type. The hook receives the key and value (or array index and value). It must return a new key/value pair if it modifies either element (it must not modify an array index). It must return an empty list if it makes no changes. If a type changes and there is a hook for the new type, the new key/value are re-hooked.
Assuming a generic hook is defined as "type" of *
, the logic in the BSON encode function would resemble the following:
# Given that $key, $value exist
my $type = ref($value);
HOOK: {
my ($old_type, $hook, @repl) = $type;
if ( $hook = $E_HOOKS{$type} and @repl = $hook->( $key, $value ) ) {
my $old_type = $type;
( $key, $value, $type ) = @repl, ref( $repl[1] );
redo HOOK if $type ne $old_type and exists $E_HOOKS{$type};
}
elsif ( $hook = $E_HOOKS{'*'} and @repl = $hook->( $key, $value ) ) {
# this branch is separate so it never runs after redo HOOK
my $old_type = $type;
( $key, $value, $type ) = @repl, ref( $repl[1] );
redo HOOK if $type ne $old_type and exists $E_HOOKS{$type};
}
}
After hooks have run, if any, the value must be one of the types that BSON knows how to serialize.
Deserialization hooks
Deserialization hooks fire at the end of the decoding process. BSON first decodes a BSON field to its default Perl type. The hook receives the key, the BSON type and the value. It must return a new key/value pair if it modifies either element (it must not modify an array index). It must return an empty list if it makes no changes.
Assuming a generic hook is defined as "type" of *
, the logic in the BSON decode function would resemble the following:
# Given that $bson_type, $key, $value exist
if ( my $hook = $D_HOOKS{$bson_type} || $D_HOOKS{'*'}
and my @repl = $hook->( $bson_type, $key, $value ) )
{
( $key, $value ) = @repl;
}
After a hook has run, the key and value are stored in the parent document in the usual fashion.