NAME
Convert::Base81 - Encoding and decoding to and from Base 81 strings
SYNOPSIS
use Convert::Base81;
my $encoded = Convert::Base81::encode($data);
my $decoded = Convert::Base81::decode($encoded);
or
use Convert::Base81 qw(base81_encode base81_decode);
my $encoded = base81_encode($data);
my $decoded = base81_decode($encoded);
DESCRIPTION
This module implements a Base81 conversion for encoding binary data as text. Each group of fifteen bytes is interpreted as a 120-bit integer, which is then converted to a nineteen-digit base 81 representation. The digits are the alphanumeric characters 0-9, A-Z, and a-z, followed by the punctuation characters !, #, $, %, (, ), *, +, -, ;, =, ?, @, ^, _, {, |, }, and ~, in that order. All of these characters are safe to use in JSON and XML formats.
This creates a string that is about 1.2667 times the size of the original data (a 15-to-19 ratio), making it more efficient than MIME::Base64's 3-to-4 ratio (1.3333) but slightly less efficient than Convert::Ascii85's 4-to-5 ratio (1.25).
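The digit count behind that ratio can be checked with a little arithmetic. The following is a minimal sketch using the core Math::BigInt module; the helper name digits_needed is hypothetical and is not part of Convert::Base81.

```perl
use strict;
use warnings;
use Math::BigInt;

# Smallest number of base-81 digits able to represent every
# possible block of $bytes bytes (256**$bytes distinct values).
# Hypothetical helper, not a Convert::Base81 function.
sub digits_needed {
    my ($bytes) = @_;
    my $limit = Math::BigInt->new(256)->bpow($bytes);
    my $span  = Math::BigInt->new(1);    # holds 81**$n, starting at n = 0
    my $n     = 0;
    while ($span < $limit) {
        $span->bmul(81);
        $n++;
    }
    return $n;
}

my $digits = digits_needed(15);
printf "15 bytes need %d digits, ratio %.4f\n", $digits, $digits / 15;
# prints "15 bytes need 19 digits, ratio 1.2667"
```

Eighteen digits fall short of 2**120, so nineteen is the minimum for a fifteen-byte block.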
It does have the advantage of a natural ternary system (81 is 3 to the fourth power): if your data is composed of only three, nine, or twenty-seven distinct values, its size can be compressed instead of expanded, and this module has functions that will do that.
use Convert::Base81 qw(b3_pack81 b3_unpack81);
my $input_string = q(rrgrbgggggrrgbrrbbbbrbrgggrggggg);
my $b81str = b3_pack81("rgb", $input_string);
The returned string will be one-fourth the size of the original. Equivalent functions exist for 9-digit and 27-digit values, which will return strings one-half and three-fourths the size of the original, respectively.
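Those fractions follow from counting base-3 digits: one Base81 character holds four of them, while a 3-, 9-, or 27-valued symbol carries one, two, or three. The sketch below works out the packed length under the assumption that a partial final group is rounded up to a whole character; the helper packed_length is hypothetical and the module's own padding behavior may differ.

```perl
use strict;
use warnings;

# Base-3 digits carried by one input symbol, keyed by alphabet size.
my %TRITS_PER_SYMBOL = (3 => 1, 9 => 2, 27 => 3);

# Hypothetical helper: number of Base81 characters needed for
# $count input symbols (assumes a short final group is rounded up).
sub packed_length {
    my ($alphabet_size, $count) = @_;
    my $trits = $TRITS_PER_SYMBOL{$alphabet_size}
        or die "alphabet size must be 3, 9, or 27\n";
    return int(($count * $trits + 3) / 4);    # 4 base-3 digits per character
}

print packed_length(3, 32), "\n";     # prints 8  (one-fourth of 32)
print packed_length(9, 16), "\n";     # prints 8  (one-half of 16)
print packed_length(27, 24), "\n";    # prints 18 (three-fourths of 24)
```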
FUNCTIONS
base81_check
Examine a string for characters that fall outside the Base 81 character set.
Returns the first character position that fails the test, or -1 if no characters fail.
if ((my $d = base81_check($base81str)) != -1)
{
carp "Incorrect character at position $d; cannot decode input string";
return undef;
}
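For readers who want to see what such a check involves, here is a minimal stand-alone sketch built from the character set listed in the DESCRIPTION. The function name first_invalid_position is hypothetical, positions are assumed to be zero-based, and whitespace is treated as invalid; base81_check itself may differ on any of these points.

```perl
use strict;
use warnings;

# The 81 characters listed in the DESCRIPTION, in order.
my @ALPHABET = ('0' .. '9', 'A' .. 'Z', 'a' .. 'z',
                split //, '!#$%()*+-;=?@^_{|}~');
my %IS_VALID;
@IS_VALID{@ALPHABET} = (1) x @ALPHABET;

# Hypothetical helper: return the zero-based index of the first
# character outside the set, or -1 when every character belongs.
sub first_invalid_position {
    my ($str) = @_;
    for my $i (0 .. length($str) - 1) {
        return $i unless $IS_VALID{ substr($str, $i, 1) };
    }
    return -1;
}

print first_invalid_position('d$+qxW?q'), "\n";    # prints -1
print first_invalid_position('ab&c'), "\n";        # prints 2
```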
base81_encode
Convert::Base81::encode
Converts input data to Base81 text.
This function may be exported as base81_encode into the caller's namespace.
my $datalen = length($data);
my $encoded = base81_encode($data);
Or, if you want to have manageable lines, read 45 bytes at a time and write 57-character lines (remembering that encode() takes 15 bytes at a time and encodes them to 19 characters). Remember to save the original length in case the data had to be padded out to a multiple of 15.
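The line-by-line approach might be sketched as follows. The helper encode_in_lines is hypothetical; it takes the encoder as a code reference so the slicing logic can be read on its own, and it assumes the encoder pads a final slice shorter than 45 bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: encode $data 45 bytes at a time, so that each
# slice becomes one line. $encode is a code reference to the encoder.
# The original length is returned too, for trimming padding later.
sub encode_in_lines {
    my ($encode, $data) = @_;
    my @lines = map { $encode->($_) } unpack '(a45)*', $data;
    return { length => length($data), lines => \@lines };
}
```

With the module loaded, a call would look like encode_in_lines(\&Convert::Base81::encode, $data), and the lines can then be written out joined by newlines, since decode strips whitespace.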
base81_decode
Convert::Base81::decode
Converts the Base81-encoded string back to bytes. Any spaces, linebreaks, or other whitespace are stripped from the string before decoding.
This function may be exported as base81_decode into the caller's namespace.
If the length of your original data wasn't an exact multiple of fifteen, the decoded data will end with padding of null bytes ('\0'), which can be removed.
#
# Decode the string and compare its length with the length of the original data.
#
my $decoded = base81_decode($encoded);
my $padding = length($decoded) - $datalen;
chop $decoded while ($padding-- > 0);
rwsize
By default, the encode() function reads 15 bytes and writes 19 characters, resulting in an expansion ratio of about 1.2667. Calculating this requires 128-bit integers, which are simulated by a library. If your decoding destination doesn't have such a library available, the encode function can be reduced to reading 7 bytes and writing 9 characters, giving an expansion ratio of about 1.2857. This only requires 64-bit integers, which many environments can handle.
Note that this does not affect the operation of this module, which will use 128-bit integers regardless.
To set the smaller size, use:
my($readsize, $writesize) = rwsize("I64");
To set it back:
my($readsize, $writesize) = rwsize("I128");
To simply find out the current read/write sizes:
my($readsize, $writesize) = rwsize();
Obviously, if you use the smaller sized encoding, you need to send that information along with the encoded data.
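To illustrate why the smaller size fits in 64-bit integers, here is a sketch that converts one 7-byte block to 9 digits with native integer arithmetic only. It is not the module's implementation: the function name encode_block7 is hypothetical, and the big-endian byte order and most-significant-digit-first output are assumptions.

```perl
use strict;
use warnings;
use Config;

die "this sketch needs a Perl built with 64-bit integers\n"
    unless $Config{ivsize} >= 8;

# The 81 characters listed in the DESCRIPTION, in order.
my @ALPHABET = ('0' .. '9', 'A' .. 'Z', 'a' .. 'z',
                split //, '!#$%()*+-;=?@^_{|}~');

# Hypothetical helper: 7 bytes in, 9 base-81 digits out.
sub encode_block7 {
    my ($block) = @_;
    die "expected exactly 7 bytes\n" unless length($block) == 7;
    use integer;                       # keep all arithmetic in native integers
    my $n = 0;
    $n = $n * 256 + $_ for unpack 'C7', $block;    # assumed big-endian
    my @digits;
    for (1 .. 9) {
        unshift @digits, $ALPHABET[ $n % 81 ];     # least significant digit first
        $n /= 81;
    }
    return join '', @digits;
}

print encode_block7("\0\0\0\0\0\0\x51"), "\n";    # prints 000000010
```

The largest 7-byte value is 2**56 - 1, which is below both 81**9 and the signed 64-bit limit, so no intermediate result overflows.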
the 'pack' tag
If your data falls into a domain of 3, 9, or 27 characters, then the Base81 format can compress your data to 1/4, 1/2, or 3/4, of its original size.
b3_pack81
$three_chars = "01-";
b3_pack81($three_chars, $inputstring);
or
b3_pack81($three_chars, \@inputarray);
Transform a string (or array) consisting of three and only three characters into a Base 81 string.
$packedstr = b3_pack81("01-", "01-0-1011000---1");
or
$packedstr = b3_pack81("01-", [qw(0 1 - 0 - 1 0 1 1 0 0 0 - - - 1)]);
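The idea behind the packing can be sketched without the module: four base-3 digits give 3**4 = 81 combinations, so each group of four input characters maps to one value from 0 to 80, which can then index the Base81 character set. The helper trits_to_values is hypothetical; the digit order within a group and the padding of a short final group with the first alphabet character are assumptions, so its results need not match b3_pack81's output.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a string over a three-character alphabet
# into a list of values 0..80, four input characters per value.
sub trits_to_values {
    my ($alphabet, $input) = @_;
    my %digit;
    @digit{ split //, $alphabet } = (0, 1, 2);
    my @values;
    for my $group (unpack '(a4)*', $input) {
        # Assumed padding: fill a short final group with the first character.
        $group .= substr($alphabet, 0, 1) x (4 - length $group);
        my $v = 0;
        for my $ch (split //, $group) {
            die "character '$ch' is not in the alphabet\n"
                unless exists $digit{$ch};
            $v = $v * 3 + $digit{$ch};
        }
        push @values, $v;
    }
    return @values;
}

print join(',', trits_to_values('01-', '01-0-1011000---1')), "\n";
# prints 15,64,27,79
```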
b9_pack81
b9_pack81("012345678", $inputstring);
or
b9_pack81("012345678", \@inputarray);
Transform a string (or array) consisting of up to nine characters into a Base 81 string.
$packedstr = b9_pack81("012345678", "6354822345507611");
or
$packedstr = b9_pack81("012345678", [qw(6 3 5 4 8 2 2 3 4 5 5 0 7 6 1 1)]);
b27_pack81
b27_pack81($twenty7_chars, $inputstring);
or
b27_pack81($twenty7_chars, \@inputarray);
Transform a string (or array) consisting of up to twenty-seven characters into a Base 81 string.
$base27str = join("", ('a' .. 'z', '_'));
$packedstr = b27_pack81($base27str, "anxlfqunxpkswqmei_qh_zkr");
or
$packedstr = b27_pack81($base27str, [qw(a n x l f q u n x p k s w q m e i _ q h _ z k r)]);
the 'unpack' tag
Naturally, data packed must needs be unpacked, and the following three functions perform that duty.
b3_unpack81
Transform a Base81 string back into a string (or array) using only three characters.
$data = b3_unpack81("012", 'd$+qxW?q');
or
@array = b3_unpack81("012", 'd$+qxW?q');
b9_unpack81
Transform a Base81 string back into a string (or array) using only nine characters.
$nine_chars = join "", ('0' .. '8');
$data = b9_unpack81($nine_chars, 'd$+qxW?q');
or
@array = b9_unpack81($nine_chars, 'd$+qxW?q');
b27_unpack81
Transform a Base81 string back into a string (or array) using only twenty-seven characters.
$twenty7_chars = join("", ('a' .. 'z', '&'));
$data = b27_unpack81($twenty7_chars, 'd$+qxW?q');
or
@array = b27_unpack81($twenty7_chars, 'd$+qxW?q');
SEE ALSO
The Base81 Character Set
The Base81 character set is adapted from the Base85 character set described by Robert Elz in RFC 1924 of April 1st 1996, "A Compact Representation of IPv6 Addresses". That set is made up of the 94 printable ASCII characters, minus the quote marks, comma, period, slash, colon, backslash, and square brackets.
Although it is an April Fools' Day RFC, the reasoning for the choice of characters for the set was solid, and Base81 uses them minus four more characters: the angle brackets, the ampersand, and the grave accent.
This reduces the character set to:
'0'..'9', 'A'..'Z', 'a'..'z', '!', '#', '$', '%', '(',
')', '*', '+', '-', ';', '=', '?', '@', '^', '_', '{',
'|', '}', and '~'.
and allows the encoded data to be used without issue in JSON or XML.
Ascii85
Base81 is a subset of Base85, which is similar in concept to Ascii85, a format developed for the btoa program and later adopted, with changes, by Adobe for PostScript's ASCII85Encode filter. There are, of course, modules on CPAN that provide these formats.
Base64
Base64 encoding is an eight-bit to six-bit encoding scheme that, depending on the characters used for encoding, has been used for uuencode and MIME transfer, among many other formats. There are, of course, modules on CPAN that provide these formats.
AUTHOR
John M. Gamble <jgamble at cpan.org>
BUGS
Please report any bugs or feature requests to bug-convert-base81 at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Convert-Base81. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
This module is on Github at https://github.com/jgamble/Convert-Base81.
You can also look for information on MetaCPAN.
LICENSE AND COPYRIGHT
Copyright (c) 2019 John M. Gamble.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.