NAME
Table::Readable - minimalistic human-editable tables of data
SYNOPSIS
use FindBin '$Bin';
use Table::Readable qw/read_table/;
my @list = read_table ("$Bin/file.txt");
for my $entry (@list) {
for my $k (keys %$entry) {
print "$k $entry->{$k}\n";
}
}
produces output
en Residual Current Device
ja 配線用遮断器
de Fehlerstrom-Schutzschalter
(This example is included as synopsis.pl in the distribution.)
VERSION
This documents Table::Readable version 0.01 corresponding to git commit 629e8eb593c41f860ceee0edccb5c241c5c4b62f released on Mon Feb 20 16:55:20 2017 +0900.
DESCRIPTION
Table::Readable provides a format for human-editable tables of information which a computer can read. By design, the format does not support any kind of nesting, and the file can only contain text in the UTF-8 encoding.
FUNCTIONS
read_table
my @table = read_table ("list_file.txt");
Read one table of information from the specified file. Each row of information is stored as an anonymous hash. The return value is a list of hash references, one per row. It dies if not called in list context.
Each row of the table consists of key/value pairs. The key/value pairs are given in the form
key: value
If the key has spaces
key with spaces: value
then it is turned into key_with_spaces in the anonymous hash.
Rows are separated by a blank line.
So, for example
row: first
data: some information

row: second
data: more information
gubbins: guff here
defines two rows. The first is a hash reference with entries row and data, and the second is a hash reference with entries row, data and gubbins, each containing the information to the right of the colon.
If the key begins with two percentage symbols,
%%key:
then it marks the beginning of a multiline value which continues until the next line which begins with two percentage symbols. Thus
%%key:
this is the value
%%
assigns "this is the value" to "key".
If the key contains spaces, these are replaced by underscores. For example,
this key: value
becomes this_key in the output. Whitespace before the colon is also converted, so
this key : value
becomes this_key_ in the output, with an underscore at the end.
Comments can be added to the table using lines with # as the first character.
The file is assumed to be in the UTF-8 encoding.
Read from a scalar
my @table = read_table ($stuff, scalar => 1);
Read the table from the text held in the scalar $stuff instead of from a file.
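As a small sketch of reading from a scalar, the two-row example from earlier in this section can be parsed from a string rather than a file. The variable names $text and @rows are illustrative only.

```perl
use strict;
use warnings;
use Table::Readable 'read_table';

# Two rows separated by a blank line.
my $text = <<'EOF';
row: first
data: some information

row: second
data: more information
gubbins: guff here
EOF

# read_table is called in list context, as the documentation requires.
my @rows = read_table ($text, scalar => 1);

for my $row (@rows) {
    # Each row is a hash reference; sort the keys for a stable order.
    for my $key (sort keys %$row) {
        print "$key = $row->{$key}\n";
    }
    print "\n";
}
```

The keys are sorted here only because Perl does not guarantee any particular order for hash keys.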
write_table
write_table (\@table, 'file.txt');
Write the table in @table to file.txt. It insists on an array reference containing hash references, each of which has simple scalars as values.
This does not convert underscores in the keys into spaces.
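The following is a minimal sketch of a round trip through write_table and read_table, assuming simple single-line values. The data, the temporary directory and the file name animals.txt are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp 'tempdir';
use Table::Readable qw/read_table write_table/;

# An array of hash references with simple scalars as values.
my @table = (
    { en => 'cat', de => 'Katze' },
    { en => 'dog', de => 'Hund' },
);

# Write to a scratch file in a temporary directory which is
# removed when the program exits.
my $dir = tempdir (CLEANUP => 1);
my $file = "$dir/animals.txt";
write_table (\@table, $file);

# Read the file back again in list context.
my @copy = read_table ($file);
for my $i (0..$#copy) {
    print "$i: $copy[$i]{en} / $copy[$i]{de}\n";
}
```

Because write_table does not convert underscores into spaces, a key such as this_key should come back from read_table unchanged.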
TABLE FORMAT
This section gives exact details of the format of the tables.
The table takes the format
key1: value
key2: value

key1: another value
key2: yet more values
where rows of the table are separated by a blank line, and the columns of each row are defined by giving the name of the column, followed by a colon, followed by the value.
Blank lines
A line which contains only whitespace (anything matching \s) also counts as a blank line.
Multiline entries
%%key1:
value goes here.
%%
Multiline entries begin and end with two percent characters at the beginning of the line. Between the opening and closing markers there may be any number of blank lines. Whitespace (anything matching \s) is stripped from the beginning and end of the value. There is no way to have double percent characters at the beginning of a line within a multiline value, so if you need double percents, you must use a different syntax and then post-process the entry to convert your syntax to double percent characters.
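One possible way to do that post-processing, purely as an illustration, is to write a backslash before the two percent characters in the file and remove it after reading. The \%% convention and the helper name unescape_percents are assumptions of this sketch, not part of Table::Readable.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "\%%" at the start of any line of a value
# back into "%%". The "\%%" convention is this sketch's own invention.
sub unescape_percents
{
    my ($value) = @_;
    # /m makes ^ match at the start of every line, not just the first.
    $value =~ s/^\\%%/%%/gm;
    return $value;
}

# A value shaped like one returned by read_table for a multiline entry.
my $value = "first line\n\\%% not a terminator\nlast line";
my $fixed = unescape_percents ($value);
print "$fixed\n";
```

A line starting with a backslash does not begin with two percent characters, so it should not end the multiline value when the file is read.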
Comments
Lines containing a hash character '#' at the beginning of the line are ignored. However, lines containing a hash character '#' within multiline entries are considered part of the entry, not comments. Hash characters at positions other than the start of a line are not considered comments, and are not ignored.
Encoding
The file must be encoded in the UTF-8 encoding.
Whitespace
Whitespace (anything matching \s) is stripped from the beginning and end of the value. To preserve whitespace, use your own syntax such as the following.
use Table::Readable 'read_table';
my $table =<<EOF;
a: b
%%c:
d
%%
%%e:
f
!
%%
EOF
my @entries = read_table ($table, scalar => 1);
for my $k (keys %{$entries[0]}) {
my $v = $entries[0]{$k};
$v =~ s/!$//;
print "'$k' = '$v'\n";
}
produces output
'e' = 'f
'
'c' = 'd'
'a' = 'b'
(This example is included as whitespace.pl in the distribution.)
Empty values
Keys without values, like
key:
are permitted within the table. A key with no value results in the value for that key being an empty string, rather than the undefined value.
Consistency of keys
There is no requirement that the keys in one entry of the table have to be the same as the keys in the subsequent entry. Each entry of the table may have completely inconsistent keys. If you need consistent keys, add a post-processor of your own.
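A post-processor of that kind might look like the following sketch, which gives every entry every key seen anywhere in the table. The function name fill_missing_keys is hypothetical, and filling the gaps with empty strings is just one possible policy, chosen here to match the treatment of empty values described above.

```perl
use strict;
use warnings;

# Hypothetical post-processor: give every entry every key seen anywhere
# in the table, filling the gaps with empty strings.
sub fill_missing_keys
{
    my (@entries) = @_;
    # Collect all the keys used by any entry.
    my %seen;
    for my $entry (@entries) {
        $seen{$_} = 1 for keys %$entry;
    }
    # Add the missing keys to each entry in place.
    for my $entry (@entries) {
        for my $key (keys %seen) {
            $entry->{$key} = '' unless exists $entry->{$key};
        }
    }
    return @entries;
}

# Entries shaped like the return value of read_table.
my @entries = (
    { row => 'first', data => 'some information' },
    { row => 'second', gubbins => 'guff here' },
);
fill_missing_keys (@entries);
for my $entry (@entries) {
    print join (', ', map { "$_=$entry->{$_}" } sort keys %$entry), "\n";
}
```

Since the entries are hash references, the function changes them in place, and the return value can be ignored.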
Design and motivation
This module and the associated format were born out of exasperation with various complicated file formats, and the associated complicated parser software. In particular I originally made this module and format as an alternative to using the TMX format for translation memory files, and also out of frustration with the AppConfig module. I currently use this to store translations, such as http://kanji.sljfaq.org/translations.txt, and files of tabular information, such as https://www.lemoda.net/unix/troff-dictionary/dictionary.txt.
This format is deliberately designed to reduce the amount of mental effort necessary to type in a machine-readable table of information. By design, it adds only the most minimal possible interpretations to characters. There are only four significant characters, the newline, the colon, the hash character #, and the percent character %. The hash character and the percent character are only significant either when they come immediately after a new line or when they are the first byte in the file. The multiline escape sequence is two percents at the beginning of a line, a sequence which rarely occurs in normal text.
The minimalism of this module is intentional; I will never, ever, add new syntax, extra escape characters, comments anywhere other than the start of a line, nested tables, or multiple tables in one file to this format, and I would gladly remove anything from it, if there were anything that could possibly be removed. The reason is that every new facility adds yet another meaning to some sequence of characters, which not only has to be remembered, but also has to be programmed around by adding yet another escape. Let's say that I added comments like this:
key: value # this is a comment
then I would have to add yet another escape for the case where I actually wanted to put a hash character inside a value, yet another annoying bit of syntax to remember like
key: value \# not a comment
The more one adds these kinds of meaningful characters, the more the complexity, the more the bugs, the more the workarounds, the more the fixes, and the more the number of things to remember, and the more the headaches. No thanks!
EXPORTS
Nothing is exported by default. All functions can be exported on request. A tag ":all" exports all the functions:
use Table::Readable ':all';
AUTHOR
Ben Bullock, <bkb@cpan.org>
COPYRIGHT & LICENCE
This package and associated files are copyright (C) 2010-2017 Ben Bullock.
You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.