NAME

Convert::Base81 - Encoding and decoding to and from Base 81 strings

SYNOPSIS

   use Convert::Base81;

   my $encoded = Convert::Base81::encode($data);
   my $decoded = Convert::Base81::decode($encoded);

or

   use Convert::Base81 qw(base81_encode base81_decode);

   my $encoded = base81_encode($data);
   my $decoded = base81_decode($encoded);

DESCRIPTION

This module implements a Base81 conversion for encoding binary data as text. Each group of fifteen bytes is interpreted as a 120-bit integer, which is then converted to a nineteen-digit base 81 representation. The digits are the alphanumeric characters 0-9, A-Z, and a-z, followed by the punctuation characters !, #, $, %, (, ), *, +, -, ;, =, ?, @, ^, _, {, |, }, and ~, in that order. All of these characters are safe to use in JSON and XML formats.

This creates a string that is about 1.2667 times the size of the original data, making it more efficient than MIME::Base64's 3-to-4 ratio (1.3333) but slightly less efficient than Convert::Ascii85's 4-to-5 ratio (1.25).
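The ratios quoted above follow directly from each scheme's block sizes. A minimal sketch of that arithmetic, using a hypothetical helper named expansion_ratio() that is not part of this module:

```perl
use strict;
use warnings;

# Block sizes for each scheme: [input bytes, output characters].
my %block = (
    'Base64'  => [ 3,  4],
    'Ascii85' => [ 4,  5],
    'Base81'  => [15, 19],
);

# Hypothetical helper: output characters produced per input byte.
sub expansion_ratio {
    my ($name) = @_;
    my ($in, $out) = @{ $block{$name} };
    return $out / $in;
}

# Print the schemes from most to least compact.
for my $name (sort { expansion_ratio($a) <=> expansion_ratio($b) } keys %block) {
    printf "%-8s %.4f\n", $name, expansion_ratio($name);
}
```

This lists Ascii85 first (1.2500), then Base81 (1.2667), then Base64 (1.3333).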

It does have the advantage of a natural ternary system: if your data is composed of only three, or nine, or twenty-seven distinct values, its size can be compressed instead of expanded, and this module has functions that will do that.

use Convert::Base81 qw(b3_pack81 b3_unpack81);

my $input_string = q(rrgrbgggggrrgbrrbbbbrbrgggrggggg);
my $b81str = b3_pack81("rgb", $input_string);

The returned string will be one-fourth the size of the original. Equivalent functions exist for 9-digit and 27-digit values, which will return strings one-half and three-fourths the size of the original, respectively.
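The one-fourth figure comes from 3 ** 4 == 81: four ternary digits fit exactly into one Base81 character. The following is a self-contained sketch of that idea, not the module's own code. The names sketch_pack3() and sketch_unpack3() are hypothetical, and the digit order within a group and the padding rule are assumptions, so the output should not be expected to match b3_pack81().

```perl
use strict;
use warnings;

# The 81-character alphabet, in the order given in the DESCRIPTION.
my @b81 = ('0' .. '9', 'A' .. 'Z', 'a' .. 'z', split //, '!#$%()*+-;=?@^_{|}~');

# Hypothetical helper: pack four ternary symbols into one Base81 character.
# Assumption: short input is padded with the first alphabet character.
sub sketch_pack3 {
    my ($alphabet, $input) = @_;
    my %value;
    @value{ split //, $alphabet } = (0, 1, 2);

    $input .= substr($alphabet, 0, 1) while length($input) % 4;

    my $out = '';
    for my $group ($input =~ /(.{4})/gs) {
        my $n = 0;
        $n = $n * 3 + $value{$_} for split //, $group;   # 0 .. 80
        $out .= $b81[$n];
    }
    return $out;
}

# Hypothetical helper: expand each Base81 character back into four symbols.
sub sketch_unpack3 {
    my ($alphabet, $packed) = @_;
    my @symbol = split //, $alphabet;
    my %index;
    @index{@b81} = (0 .. $#b81);

    my $out = '';
    for my $ch (split //, $packed) {
        my $n = $index{$ch};
        my @digits;
        for (1 .. 4) {
            unshift @digits, $symbol[$n % 3];
            $n = int($n / 3);
        }
        $out .= join '', @digits;
    }
    return $out;
}

my $input  = q(rrgrbgggggrrgbrrbbbbrbrgggrggggg);
my $packed = sketch_pack3("rgb", $input);
printf "%d characters packed into %d\n", length($input), length($packed);
```

With the 32-character input above, the packed string is 8 characters long, and sketch_unpack3() restores the original.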

FUNCTIONS

base81_check

Examine a string for characters that fall outside the Base 81 character set.

Returns the first character position that fails the test, or -1 if no characters fail.

if ((my $d = base81_check($base81str)) != -1)
{
    carp "Incorrect character at position $d; cannot decode input string";
    return undef;
}
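For readers who want to see what such a check involves, here is a self-contained sketch that does not use the module. The name sketch_check() is hypothetical, and it assumes positions are counted from zero, which the text above does not specify.

```perl
use strict;
use warnings;

# Hypothetical stand-in for base81_check().
# Assumption: the returned position is 0-based.
sub sketch_check {
    my ($str) = @_;

    # Match the first character that is not one of the 81 allowed characters;
    # $-[0] holds the offset where that match began.
    return $str =~ /[^0-9A-Za-z!#\$%()*+\-;=?\@^_{|}~]/ ? $-[0] : -1;
}

print sketch_check('d$+qxW?q'), "\n";   # every character is in the set
print sketch_check('ab<c'), "\n";       # '<' is not in the set
```

The first call prints -1 and the second prints 2.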

base81_encode

Convert::Base81::encode

Converts input data to Base81 text.

This function may be exported as base81_encode into the caller's namespace.

my $datalen = length($data);
my $encoded = base81_encode($data); 

Or, if you want manageable lines, read 45 bytes at a time and write 57-character lines (remembering that encode() takes 15 bytes at a time and encodes them to 19 characters). Remember to save the original length in case the data had to be padded out to a multiple of 15.
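A minimal sketch of that line-at-a-time approach follows. The helper encode_in_lines() is hypothetical; it takes the encoder as a code reference (for example \&Convert::Base81::encode once the module is loaded) so that the chunking logic can be read and tried on its own.

```perl
use strict;
use warnings;

# Hypothetical helper. $encode is a code reference to the encoder,
# for example \&Convert::Base81::encode.
sub encode_in_lines {
    my ($encode, $data) = @_;
    my @lines;

    # 45 bytes is three 15-byte groups, so a full line is 3 * 19 = 57 characters.
    for (my $offset = 0; $offset < length($data); $offset += 45) {
        push @lines, $encode->(substr($data, $offset, 45));
    }

    # Return the original length as well, so that any padding can be
    # removed after decoding.
    return (length($data), @lines);
}
```

Only the last line can be shorter than 57 characters, since only the last chunk can hold fewer than 45 bytes.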

base81_decode

Convert::Base81::decode

Converts the Base81-encoded string back to bytes. Any spaces, linebreaks, or other whitespace are stripped from the string before decoding.

This function may be exported as base81_decode into the caller's namespace.

If the length of your original data wasn't an exact multiple of fifteen, the decoded data will end with padding null bytes ('\0'), which can be removed.

#
# Decode the string and compare its length with the length of the original data.
#
my $decoded = base81_decode($encoded);
my $padding = length($decoded) - $datalen;
chop $decoded while ($padding-- > 0);

rwsize

By default, the encode() function reads 15 bytes and writes 19 characters, resulting in an expansion ratio of about 1.2667. Calculating this requires 128-bit integers, which are simulated by a library. If your decoding destination doesn't have such a library available, the encode function can be reduced to reading 7 bytes and writing 9 characters, giving an expansion ratio of about 1.2857. This requires only 64-bit integers, which many environments can handle.

Note that this does not affect the operation of this module, which will use 128-bit integers regardless.
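The two block sizes can be checked with a little arithmetic. The sketch below uses the core Math::BigInt module; digits_needed() and bits_needed() are hypothetical helper names, not functions of this module.

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper: the smallest number of base 81 digits that can
# represent every possible value of $bytes bytes.
sub digits_needed {
    my ($bytes) = @_;
    my $limit  = Math::BigInt->new(256)->bpow($bytes);   # count of distinct inputs
    my $range  = Math::BigInt->new(1);
    my $digits = 0;
    while ($range < $limit) {
        $range->bmul(81);
        $digits++;
    }
    return $digits;
}

# Hypothetical helper: bits needed to hold the largest $digits-digit
# base 81 number.
sub bits_needed {
    my ($digits) = @_;
    my $max = Math::BigInt->new(81)->bpow($digits)->bsub(1);
    return length($max->as_bin()) - 2;    # as_bin() prefixes "0b"
}

printf "%2d bytes -> %2d digits, %3d bits\n",
    $_, digits_needed($_), bits_needed(digits_needed($_)) for (7, 15);
```

Seven bytes need 9 digits and stay within 64 bits, while fifteen bytes need 19 digits and exceed 64 bits but fit within 128.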

To set the smaller size, use:

my($readsize, $writesize) = rwsize("I64");

To set it back:

my($readsize, $writesize) = rwsize("I128");

To simply find out the current read/write sizes:

my($readsize, $writesize) = rwsize();

Obviously, if you use the smaller sized encoding, you need to send that information along with the encoded data.

the 'pack' tag

If your data falls into a domain of 3, 9, or 27 characters, then the Base81 format can compress your data to 1/4, 1/2, or 3/4 of its original size.
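Those fractions follow from 3, 9, 27, and 81 all being powers of three. A short sketch of the arithmetic, where group_sizes() is a hypothetical helper and not part of the module (whether the module groups its input in exactly this way is not stated here):

```perl
use strict;
use warnings;

# Hypothetical helper. For an alphabet of 3 ** $k symbols ($k = 1, 2, or 3),
# return how many input symbols fill a whole number of Base81 characters,
# and how many Base81 characters that is. One Base81 character holds
# four ternary digits.
sub group_sizes {
    my ($k) = @_;
    my $trits = $k;
    $trits += $k while $trits % 4;    # least common multiple of $k and 4
    return ($trits / $k, $trits / 4);
}

for my $k (1 .. 3) {
    my ($in, $out) = group_sizes($k);
    printf "%2d symbols: %d in, %d out\n", 3 ** $k, $in, $out;
}
```

The loop reports 4 in and 1 out for three symbols, 2 in and 1 out for nine, and 4 in and 3 out for twenty-seven.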

b3_pack81

$three_chars = "01-";

b3_pack81($three_chars, $inputstring);

or

b3_pack81($three_chars, \@inputarray);

Transform a string (or array) consisting of three and only three characters into a Base 81 string.

$packedstr = b3_pack81("01-", "01-0-1011000---1");

or

$packedstr = b3_pack81("01-", [qw(0 1 - 0 - 1 0 1 1 0 0 0 - - - 1)]);

b9_pack81

b9_pack81("012345678", $inputstring);

or

b9_pack81("012345678", \@inputarray);

Transform a string (or array) consisting of up to nine characters into a Base 81 string.

$packedstr = b9_pack81("012345678", "6354822345507611");

or

$packedstr = b9_pack81("012345678", [qw(6 3 5 4 8 2 2 3 4 5 5 0 7 6 1 1)]);

b27_pack81

b27_pack81($twenty7_chars, $inputstring);

or

b27_pack81($twenty7_chars, \@inputarray);

Transform a string (or array) consisting of up to twenty-seven characters into a Base 81 string.

$base27str = join("", ('a' .. 'z', '_'));
$packedstr = b27_pack81($base27str, "anxlfqunxpkswqmei_qh_zkr");

or

$packedstr = b27_pack81($base27str, [qw(a n x l f q u n x p k s w q m e i _ q h _ z k r)]);

the 'unpack' tag

Naturally, data packed must needs be unpacked, and the following three functions perform that duty.

b3_unpack81

Transform a Base81 string back into a string (or array) using only three characters.

$data = b3_unpack81("012", 'd$+qxW?q');

or

@array = b3_unpack81("012", 'd$+qxW?q');

b9_unpack81

Transform a Base81 string back into a string (or array) using only nine characters.

$nine_chars = join "", ('0' .. '8');

$data = b9_unpack81($nine_chars, 'd$+qxW?q');

or

@array = b9_unpack81($nine_chars, 'd$+qxW?q');

b27_unpack81

Transform a Base81 string back into a string (or array) using only twenty-seven characters.

$twenty7_chars = join("", ('a' .. 'z', '&'));

$data = b27_unpack81($twenty7_chars, 'd$+qxW?q');

or

@array = b27_unpack81($twenty7_chars, 'd$+qxW?q');

SEE ALSO

The Base81 Character Set

The Base81 character set is adapted from the Base85 character set described by Robert Elz in RFC 1924 of April 1st, 1996, "A Compact Representation of IPv6 Addresses". That set is made up of the 94 printable ASCII characters, minus the quote marks, comma, period, colon, slash and backslash, and the square brackets.

Despite it being an April Fool's Day RFC, the reasoning behind the choice of characters was solid, and Base81 uses the same set minus four more characters: the angle brackets, the ampersand, and the accent mark (backtick).

This reduces the character set to:

'0'..'9', 'A'..'Z', 'a'..'z', '!', '#', '$', '%', '(',
')', '*', '+', '-', ';', '=', '?', '@', '^', '_', '{',
'|', '}', and '~'.

and allows the encoded data to be used without issue in JSON or XML.
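The derivation can be written out as a short sketch. The variable names are illustrative only, and the RFC 1924 character order used here is the one listed in that RFC.

```perl
use strict;
use warnings;

# The 85 characters of RFC 1924, in that document's order.
my @rfc1924 = ('0' .. '9', 'A' .. 'Z', 'a' .. 'z',
               split //, '!#$%&()*+-;<=>?@^_`{|}~');

# Base81 drops the angle brackets, the ampersand, and the accent mark.
my %dropped = map { $_ => 1 } ('<', '>', '&', '`');
my @base81  = grep { !$dropped{$_} } @rfc1924;

print scalar(@rfc1924), " -> ", scalar(@base81), "\n";   # 85 -> 81
print join('', @base81), "\n";
```

The second line of output is the full 81-character alphabet in encoding order.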

Ascii85

Base81 is a subset of Base85, which is similar in concept to Ascii85, a format developed for the btoa program and later adopted with changes by Adobe for PostScript's ASCII85Encode filter. There are, of course, modules on CPAN that provide these formats.

Base64

Base64 encoding is an eight-bit to six-bit encoding scheme that, depending on the characters used for encoding, has been used for uuencode and MIME transfer, among many other formats. There are, of course, modules on CPAN that provide these formats.

AUTHOR

John M. Gamble <jgamble at cpan.org>

BUGS

Please report any bugs or feature requests to bug-convert-base81 at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Convert-Base81. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

This module is on GitHub at https://github.com/jgamble/Convert-Base81.

You can also look for information on MetaCPAN.

LICENSE AND COPYRIGHT

Copyright (c) 2019 John M. Gamble.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.