NAME

BLOB - Perl extension for explicitly marking binary strings

SYNOPSIS

use BLOB;

mark_blob($jpeg_data);

if (is_blob $jpeg_data) {
    ...
} else {
    ...
}

my $bytes = is_blob($foo) ? $foo : encode_utf8($foo);

DESCRIPTION

In general it is better if text operations and binary operations are separated into different functions.

But sometimes a single function needs to support both text strings and binary strings. Because the two string types are fundamentally different, it may be necessary for the function to know what it is dealing with.

This package aims to be the single way of indicating that a string is binary, not text. Now CPAN module authors don't have to reinvent this wheel, and module users do not have to learn a plethora of different syntaxes.

The name BLOB historically stands for Binary Large OBject, but small strings are of course also supported.

BLOB supports Perl versions all the way back to 5.0 and has no external dependencies.

FUNCTION INTERFACE

The function interface provides the basic interface. The following functions are provided by this module and exported by default:

mark_blob($string)

Marks the string as a blob. The string can be used as before; it should be safe to mark strings as blobs in existing code.

Note that a copy of a blob is not marked automatically.

is_blob($string)

Returns true if the string is a blob, false if the string is not a blob.

OBJECT INTERFACE

The function interface provides basic functionality, but no means of extension. With the object interface, class inheritance can be used to provide additional functionality.

To use a BLOB as an object, use the special syntax (\$blob). Because $blob remains a normal variable that can be used like any other string, the \ is needed to indicate that you're going to use it as an object.

Before using a BLOB as an object, test that it actually is a blob with the is_blob function, because a normal non-BLOB string cannot be used as an object.

if (is_blob $foo) {
    (\$foo)->method(...);
    # Without the is_blob test, this would fail if $foo is a normal string
}
BLOB->mark($string)

This is the same as mark_blob, but extensible with subtypes. It is possible that a certain kind of BLOB has a constructor (typically called new) instead of mark.

Blesses $string and returns a reference to it.

(\$string)->isa($class)

(Inherited from UNIVERSAL.) Returns true if the BLOB belongs to a certain class.

(\$string)->can($method)

(Inherited from UNIVERSAL.) Returns a code reference to the method if it is supported, or undef otherwise.

The base BLOB implementation does not provide any object methods except those provided by Perl's UNIVERSAL class. Decorated BLOBs might provide additional methods.

USING ALIASING

There are two compelling reasons to use aliasing techniques with blobs, instead of copying the values like normal variable assignment does. One is that blobs can get very large, and copying large values imposes a big memory and performance penalty. The other is that the indication that something is a blob, as set by mark_blob or BLOB->mark, is not retained in a copy.

There are several ways of addressing the actual variable instead of copying it:

Use the alias in the @_ array.
my ($self, undef, $arg_2) = @_;
if (is_blob $_[1]) { ... }
Use Data::Alias
alias my ($self, $string, $arg_2) = @_;
if (is_blob $string) { ... }
Use a reference to the alias in @_
my ($self, undef, $arg_2) = @_;
my $string_ref = \$_[1];
if (is_blob $$string_ref) { ... }
Require that the variable is passed with an explicit reference
my ($self, $$string_ref, $arg_2) = @_;
if (is_blob $$string_ref) { ... }

In any case, the following won't work:

my ($self, $string, $arg_2) = @_;
if (is_blob $string) { ... }

This does not work because $string is a copy, and copies don't automatically get the BLOB mark. =head1 PROGRAMMING LOGIC ERRORS

Byte operations should be separated from text operations in programming, with only explicit conversion (through decoding and encoding) allowed between them.

Perl programmers who fail to do this, might end up with characters greater than 255 in their byte strings. Because a byte can only store a value in the 0..255 range, a string with a character greater than 255 cannot be used as a byte string.

Also, for efficiency and compatibility with older Perl modules, the functions provided by this module downgrade strings to ensure that the internal representation is a raw octet sequence.

DIAGNOSTICS

This module can produce the following warnings:

Wide character outside byte range in BLOB, encoding data with UTF-8

A string with at least one character greater than 255 was marked as BLOB. Because a byte cannot hold a value greater than 255, the string was changed to its UTF-8 encoding to allow further binary data processing.

Find out why this character got into this string, and repair the programming logic error.

If the warning is reported in the module you are using, set $Carp::Verbose = 1 for a stack trace.

CAVEATS

Marking as a BLOB is done by blessing the string. Do not bless the string again. Blessing existing binary strings is extremely uncommon, but not impossible.

TO DO

  • It would be nice if BLOB would intercept internal string encoding upgrades, and downgrade immediately. This would allow a warning to be emitted at the point where the source of the problem is, making debugging unintended text+binary concatenations easier.

  • Compose a document that describes the best practices for documenting modules that specifically support marked blobs.

AUTHOR

Juerd Waalboer <#####@juerd.nl>

1 POD Error

The following errors were encountered while parsing the POD:

Around line 232:

Can't have a 0 in =over 0