NAME
ZSCII::Codec - an encoder/decoder for Z-Machine text
VERSION
version 0.002
OVERVIEW
ZSCII::Codec is a class for objects that are encoders/decoders of Z-Machine text. Right now, ZSCII::Codec only implements Version 5 (and thus 7 and 8), and even that partially. There is no abbreviation support yet.
How Z-Machine Text Works
The Z-Machine's text strings are composed of ZSCII characters. There are 1024 ZSCII codepoints, although only the bottom eight bits' worth are ever used. Codepoints 0x20 through 0x7E are identical to the same codepoints in ASCII or Unicode.
ZSCII codepoints are then encoded as strings of five-bit Z-characters. The most common ZSCII characters, the lowercase English alphabet, can be encoded with one Z-character. Uppercase letters, numbers, and common punctuation ZSCII characters require two Z-characters each. Any other ZSCII character can be encoded with four Z-characters.
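The one, two, and four Z-character cases described above can be sketched in plain Perl. This is an illustration only, not ZSCII::Codec's implementation: the function name sketch_zscii_to_zchars is hypothetical, the alphabets are assumed to be the default Version 5 ones from the Z-Machine Standard, and Z-characters are returned as a list of integers rather than as a Perl string.

```perl
use strict;
use warnings;

# Assumed default Version 5 alphabets. Each alphabet has 26 slots, which are
# addressed by Z-characters 6 through 31.
my $A0 = 'abcdefghijklmnopqrstuvwxyz';
my $A1 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';

# The first two slots of A2 are special (escape and newline), so only the
# remaining 24 characters are listed here; they begin at Z-character 8.
my $A2_TAIL = q{0123456789.,!?_#'"/\\-:()};

# Hypothetical helper: turn a string of ZSCII characters into a list of
# five-bit Z-characters (plain integers from 0 to 31).
sub sketch_zscii_to_zchars {
  my ($zscii) = @_;
  my @zchars;

  for my $char (split //, $zscii) {
    my $ord = ord $char;
    die sprintf("not a ZSCII codepoint: 0x%X\n", $ord) if $ord > 0x3FF;

    # Space is always Z-character 0.
    if ($char eq ' ') { push @zchars, 0; next }

    # Alphabet 0: one Z-character.
    if ((my $i = index $A0, $char) >= 0) { push @zchars, $i + 6; next }

    # Alphabets 1 and 2: a shift (4 or 5) and then one Z-character.
    if ((my $i = index $A1, $char) >= 0) { push @zchars, 4, $i + 6; next }
    if ($ord == 0x0D)                    { push @zchars, 5, 7;      next }
    if ((my $i = index $A2_TAIL, $char) >= 0) { push @zchars, 5, $i + 8; next }

    # Anything else: shift to A2, escape (6), then the ten-bit codepoint
    # split into its top five and bottom five bits.
    push @zchars, 5, 6, $ord >> 5, $ord & 0x1F;
  }

  return @zchars;
}
```

Under these assumptions, "hi" becomes the two Z-characters 13 and 14, "A" becomes 4 and 6, and "@" (ZSCII 0x40) becomes the four Z-characters 5, 6, 2, and 0.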
For storage on disk or in memory, the five-bit Z-characters are packed together, three in a word, and laid out in bytestrings. The last word in a string has its top bit set to mark the ending. When a bytestring would end without enough Z-characters to pack a full word, it is padded. (ZSCII::Codec pads with Z-character 0x05, a shift character.)
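The packing described above can be sketched as follows. The function name sketch_pack_zchars is hypothetical, the input is assumed to be a list of integers from 0 to 31, and words are assumed to be 16 bits wide and stored big-endian, as Z-Machine words are. Padding an empty input out to one full word is also an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: pack a list of five-bit Z-characters into a bytestring,
# three Z-characters per 16-bit big-endian word.
sub sketch_pack_zchars {
  my (@zchars) = @_;

  # Pad with Z-character 5 until there is a whole number of words (and at
  # least one word, so that there is somewhere to put the end marker).
  push @zchars, 5 while !@zchars || @zchars % 3;

  my $packed = '';
  while (my @triple = splice @zchars, 0, 3) {
    my $word = ($triple[0] << 10) | ($triple[1] << 5) | $triple[2];

    # Nothing left to pack means this is the last word: set its top bit.
    $word |= 0x8000 unless @zchars;

    $packed .= pack 'n', $word;
  }

  return $packed;
}

# The Z-characters 13 and 14 ("hi") are padded with a 5 and fit in one word.
print unpack('H*', sketch_pack_zchars(13, 14)), "\n";   # prints "b5c5"
```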
Later versions of the Z-Machine allow the mapping of ZSCII codepoints to Unicode codepoints to be customized. ZSCII::Codec does not yet support this feature.
ZSCII::Codec does allow conversion between all four relevant representations: Unicode text, ZSCII text, Z-character strings, and packed Z-character bytestrings. All four forms are represented by Perl strings.
METHODS
new
my $z = ZSCII::Codec->new;
my $z = ZSCII::Codec->new(\%arg);
my $z = ZSCII::Codec->new($version);
This returns a new codec. If the only argument is a number, it is treated as a version specification. If no arguments are given, a Version 5 codec is made.
Valid named arguments are:
- version
The version number of the targeted Z-Machine; at present, only 5, 7, and 8 are permitted values.
- extra_characters
This is a reference to an array of between 0 and 97 Unicode characters. These will be the characters to which ZSCII characters 155 through 251 map. They may not duplicate any characters represented by the default ZSCII set. No Unicode codepoint above U+FFFF is permitted, as it would not be representable in the Z-Machine Unicode substitution table.
If no extra characters are given, the default table is used.
- alphabet
This is a string of 78 characters, representing the three 26-character alphabets used to encode ZSCII compactly into Z-characters. The first 26 characters are alphabet 0, for the most common characters. The rest of the characters are alphabets 1 and 2.
No character with a ZSCII value greater than 0xFF may be included in the alphabet. Character 52 (A2's first character) should be NUL.
If no alphabet is given, the default alphabet is used.
- alphabet_is_unicode
By default, the values in the alphabet are assumed to be ZSCII characters, so that the contents of the alphabet table from the Z-Machine's memory can be used directly. The alphabet_is_unicode option specifies that the characters in the alphabet string are Unicode characters. They will be converted to ZSCII internally by the unicode_to_zscii method, and if characters appear in the alphabet that are not in the default ZSCII set or the extra characters, an exception will be raised.
encode
my $packed_zchars = $z->encode( $unicode_text );
This method takes a string of text and encodes it to a bytestring of packed Z-characters.
Internally, it converts the Unicode text to ZSCII, then to Z-characters, and then packs them. Before this processing, any native newline characters (the value of \n) are converted to U+000D to match the Z-Machine's use of character 0x00D for newline.
decode
my $text = $z->decode( $packed_zchars );
This method takes a bytestring of packed Z-characters and returns a string of text.
Internally, it unpacks the Z-characters, converts them to ZSCII, and then converts those to Unicode. Any ZSCII characters 0x00D are converted to the value of \n.
unicode_to_zscii
my $zscii_string = $z->unicode_to_zscii( $unicode_string );
This method converts a Unicode string to a ZSCII string, using the dialect of ZSCII for the ZSCII::Codec's configuration.
If the Unicode input contains any characters that cannot be mapped to ZSCII, an exception is raised.
zscii_to_unicode
my $unicode_string = $z->zscii_to_unicode( $zscii_string );
This method converts a ZSCII string to a Unicode string, using the dialect of ZSCII for the ZSCII::Codec's configuration.
If the ZSCII input contains any characters that cannot be mapped to Unicode, an exception is raised. In the future, it may be possible to request a Unicode replacement character instead.
zscii_to_zchars
my $zchars = $z->zscii_to_zchars( $zscii_string );
Given a string of ZSCII characters, this method will return an (unpacked) string of Z-characters.
It will raise an exception on ZSCII codepoints that cannot be represented as Z-characters, which should not be possible with legal ZSCII.
zchars_to_zscii
my $zscii = $z->zchars_to_zscii( $zchars_string, \%arg );
Given a string of (unpacked) Z-characters, this method will return a string of ZSCII characters.
It will raise an exception when the right thing to do can't be determined. Right now, that could mean lots of things.
Valid arguments are:
- allow_early_termination
If allow_early_termination is true, no exception is thrown if the Z-character string ends in the middle of a four-Z-character sequence. This is useful when dealing with dictionary words.
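To make the early-termination case concrete, here is a minimal decoding sketch. It is not the module's code: the function name sketch_zchars_to_zscii is hypothetical, the alphabets are assumed to be the default Version 5 ones, Z-characters are passed as a reference to an array of integers, and abbreviations (Z-characters 1 through 3) are simply rejected.

```perl
use strict;
use warnings;

# Assumed default Version 5 alphabets. Slot 0 of A2 (Z-character 6) is the
# ten-bit escape, so a NUL placeholder sits there; slot 1 is newline (0x0D).
my @ALPHABET = (
  'abcdefghijklmnopqrstuvwxyz',
  'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
  qq{\x00\x0D0123456789.,!?_#'"/\\-:()},
);

# Hypothetical helper: turn a list of Z-characters into a ZSCII string.
sub sketch_zchars_to_zscii {
  my ($zchars, $arg) = @_;
  $arg ||= {};

  my @queue    = @$zchars;
  my $zscii    = '';
  my $alphabet = 0;

  while (@queue) {
    my $zc = shift @queue;

    if ($zc == 0) { $zscii .= ' '; $alphabet = 0; next }

    # 4 and 5 shift the next Z-character into alphabet 1 or 2.
    if ($zc == 4 || $zc == 5) { $alphabet = $zc - 3; next }

    die "abbreviations are not handled in this sketch\n" if $zc < 4;

    if ($alphabet == 2 && $zc == 6) {
      # Ten-bit escape: the next two Z-characters hold the codepoint.
      if (@queue < 2) {
        last if $arg->{allow_early_termination};
        die "four-Z-character sequence terminated early\n";
      }
      my ($hi, $lo) = splice @queue, 0, 2;
      $zscii .= chr(($hi << 5) | $lo);
    } else {
      $zscii .= substr $ALPHABET[$alphabet], $zc - 6, 1;
    }

    # Shifts apply to one Z-character only.
    $alphabet = 0;
  }

  return $zscii;
}
```

With this sketch, the sequence 13, 5, 6, 2 decodes to "h" when allow_early_termination is true, because the trailing escape is cut off before its final Z-character; without the option, the same input raises an exception.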
make_dict_length
my $zchars = $z->make_dict_length( $zchars_string );
This method returns the Z-character string fit to dictionary length for the Z-machine version being handled. It will trim excess characters or pad with Z-character 5 to be the right length.
When converting such strings back to ZSCII, you should pass the allow_early_termination argument to zchars_to_zscii, as a four-Z-character sequence may have been terminated early.
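The trimming and padding can be illustrated with a short sketch. The function name sketch_make_dict_length is hypothetical, Z-characters are handled as a list of integers, and the length is passed in explicitly; the value 9 used below is the dictionary word length that the Z-Machine Standard gives for Version 4 and later, and is an assumption about what the module uses.

```perl
use strict;
use warnings;

# Hypothetical helper: fit a list of Z-characters to exactly $length entries.
sub sketch_make_dict_length {
  my ($length, @zchars) = @_;

  # Drop anything past the dictionary length...
  splice @zchars, $length if @zchars > $length;

  # ...or pad short words with Z-character 5.
  push @zchars, 5 while @zchars < $length;

  return @zchars;
}

# "hi" (13, 14) padded out to nine Z-characters.
print join(' ', sketch_make_dict_length(9, 13, 14)), "\n";
# prints "13 14 5 5 5 5 5 5 5"
```

Trimming is what can cut a four-Z-character sequence short: if the ninth Z-character is the escape (6) or one of the two that follow it, the rest of the sequence is lost.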
pack_zchars
my $packed_zchars = $z->pack_zchars( $zchars_string );
This method takes a string of unpacked Z-characters and packs them into a bytestring with three Z-characters per word. The final word will have its top bit set.
unpack_zchars
my $zchars_string = $z->unpack_zchars( $packed_zchars );
Given a bytestring of packed Z-characters, this method will unpack them into a string of unpacked Z-characters that aren't packed anymore because they're unpacked instead of packed.
Exceptions are raised if the input bytestring isn't made of an even number of octets, or if the string continues past the first word with its top bit set.
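The unpacking and the two error conditions above can be sketched like this. The function name sketch_unpack_zchars and the exception messages are hypothetical, words are assumed to be 16-bit and big-endian, and the result is a list of integers rather than a Perl string.

```perl
use strict;
use warnings;

# Hypothetical helper: unpack a bytestring of packed Z-characters into a list
# of five-bit Z-characters.
sub sketch_unpack_zchars {
  my ($packed) = @_;

  die "packed Z-characters are not an even number of octets\n"
    if length($packed) % 2;

  my @words = unpack 'n*', $packed;
  my @zchars;

  while (defined(my $word = shift @words)) {
    # Three five-bit fields sit below the top (terminator) bit.
    push @zchars, ($word >> 10) & 0x1F, ($word >> 5) & 0x1F, $word & 0x1F;

    if ($word & 0x8000) {
      die "input continues past the first word with its top bit set\n"
        if @words;
      last;
    }
  }

  return @zchars;
}

print join(' ', sketch_unpack_zchars(pack 'H*', 'b5c5')), "\n";
# prints "13 14 5"
```

Note that the padding Z-character comes back as part of the result; it is left to the Z-character decoding step to treat a trailing shift as harmless.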
AUTHOR
Ricardo SIGNES <rjbs@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2013 by Ricardo SIGNES.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.