NAME
Data::Unixish - Data transformation framework, inspired by Unix toolbox philosophy
VERSION
version 1.0.0
SPECIFICATION VERSION
1.0
ABSTRACT
This document specifies Data::Unixish (also referred to as dux, for brevity), a Perl framework for data processing (transformation, conversion, and the like) using the tried-and-true Unix toolbox philosophy.
STATUS
Early draft. The 1.0 series does not guarantee full backward compatibility between revisions, so caveat implementor. However, a major incompatibility will bump the version to 1.1.
PHILOSOPHY
The Unix philosophy says a program should do only one thing and do it well. Problems are solved by chaining together a sequence of small, specialized programs. From Douglas McIlroy, the original developer of Unix pipelines:
This is the Unix philosophy: Write programs that do one thing and do it well.
Write programs to work together. Write programs to handle text streams, because
that is a universal interface.
In dux, programs translate to functions. dux is essentially a set of guidelines and tools on how to write such functions.
GUIDELINES
Function should accept a hash argument
%args
This future-proofs the function when more and more arguments are added.
Arguments should be described in Rinci metadata
See Rinci and Rinci::function for more details.
There are some standard arguments: in, out
in and out are analogous to standard input and output streams, explained below.
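As a rough illustration of the two guidelines above, the following sketch declares Rinci metadata for a dux function, including the standard in and out arguments. The function name upcase, the summaries, and the 'any' schemas are assumptions made for this example; consult Rinci::function for the authoritative list of properties.

```perl
package Data::Unixish::upcase;   # hypothetical dux function, for illustration only

use strict;
use warnings;

our %SPEC;

# Rinci function metadata, keyed by function name.
# The summaries and schemas below are illustrative assumptions.
$SPEC{upcase} = {
    v       => 1.1,
    summary => 'Convert each text item to uppercase',
    args    => {
        in  => { summary => 'Input stream',  schema => 'any' },
        out => { summary => 'Output stream', schema => 'any' },
    },
};

1;
```

Keeping the metadata next to the function lets tools discover the arguments without calling the function.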
Arguments should have good defaults
Input data is given in
$args{in}
It is a "stream": in practice, usually a reference to an array or to a tied array. A function can iterate over it as follows (each on arrays requires Perl 5.12 or later):
while (my ($index, $item) = each @{ $args{in} }) { ... }
A function MUST NOT slurp it into memory like this (in Perl 5, for() is not lazy):
# WRONG!
for (@{ $args{in} }) { ... }
Output should be written to
$args{out}
It is a "stream": in practice, usually a reference to an array or to a tied array. A function can append output as follows:
while (my ($index, $item) = each @{ $args{in} }) {
    ...
    push @{ $args{out} }, $res;
}
Function MUST NOT assign to $args{out} directly, e.g.:
# WRONG!
$args{out} = [1, 2, 3];
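Putting the in and out guidelines together, a minimal dux-style function might look like the sketch below. The function name upcase is hypothetical, and the [200, "OK"] return value is an assumption borrowed from Rinci's enveloped-result convention; this specification does not itself prescribe a return value.

```perl
use 5.012;     # each() on arrays requires Perl 5.12 or later
use strict;
use warnings;

# Hypothetical dux-style function: uppercases each item of the input stream.
sub upcase {
    my %args = @_;
    my ($in, $out) = ($args{in}, $args{out});

    # Read items one at a time and append each result to the output stream.
    while (my ($index, $item) = each @$in) {
        push @$out, uc $item;
    }
    return [200, "OK"];   # assumed Rinci-style enveloped result
}

my @in = ('foo', 'bar');
my @out;
upcase(in => \@in, out => \@out);
print join(",", @out), "\n";   # prints "FOO,BAR"
```

Because the function only pushes onto the array that $args{out} refers to, the caller sees the results in its own @out. Assigning a new reference to $args{out} would change only the function's private copy of the arguments.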
Error messages can be logged with Log::Any
A standard format for error messages will be specified in the future.
When processing, undef/invalid/non-applicable value should generally be skipped (passed unchanged)
For example, the date dux function accepts either an integer (assumed to be a Unix timestamp) or a DateTime object. Other values, such as undef, an empty string, or unsupported kinds of objects, should not be processed and should simply be passed to the output stream unchanged. A warning can be logged if needed.
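The pass-through rule can be sketched as follows. This is not the actual date function: epoch_to_ymd is a hypothetical name, it handles only integer timestamps (no DateTime objects), and it formats them in UTC with the core POSIX module to keep the example dependency-free.

```perl
use 5.012;     # each() on arrays requires Perl 5.12 or later
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical dux-style function: formats integer Unix timestamps as
# YYYY-MM-DD (UTC) and passes every other value through untouched.
sub epoch_to_ymd {
    my %args = @_;
    my ($in, $out) = ($args{in}, $args{out});

    while (my ($index, $item) = each @$in) {
        if (defined($item) && !ref($item) && $item =~ /\A[0-9]+\z/) {
            push @$out, strftime("%Y-%m-%d", gmtime($item));
        } else {
            # undef, empty string, non-numeric text, references: pass unchanged
            push @$out, $item;
        }
    }
    return [200, "OK"];   # assumed Rinci-style enveloped result
}

my @in = (0, undef, "", "n/a", 86400);
my @out;
epoch_to_ymd(in => \@in, out => \@out);
# @out is now ("1970-01-01", undef, "", "n/a", "1970-01-02")
```

Passing unrecognized items through keeps the output aligned with the input, so a later function in the chain still sees one item per input item.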
A well-written dux function can be turned into an ordinary Unix command-line utility (see Data::Unixish::CmdLine).
NAMESPACE ORGANIZATION
Data::Unixish is the main module and specification.
Each dux function should have an all-lowercase name and live in the Data::Unixish::FUNCTION_NAME package. The function itself has the same name as the last component of its package. For example, the Data::Unixish::date package contains the Data::Unixish::date::date function.
Further subpackaging is allowed, for example: Data::Unixish::English::count_syllables.
Data::Unixish::CmdLine is a utility for accessing dux functions from the command line.
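The naming convention makes it possible to locate a function from its name alone. The sketch below assumes a hypothetical dux function rev and a hypothetical helper resolve_dux; it also assumes that a subpackaged name such as English::count_syllables follows the same rule, with the function named after the last package component.

```perl
use 5.012;     # each() on arrays requires Perl 5.12 or later
use strict;
use warnings;

{
    # Hypothetical dux function "rev": package and function share the name.
    package Data::Unixish::rev;

    sub rev {
        my %args = @_;
        while (my ($index, $item) = each @{ $args{in} }) {
            push @{ $args{out} }, scalar reverse($item);
        }
        return [200, "OK"];   # assumed Rinci-style enveloped result
    }
}

# Hypothetical helper: map a dux name such as "rev" to the code reference
# Data::Unixish::<name>::<last component of name>.
sub resolve_dux {
    my ($name) = @_;
    my ($short) = $name =~ /(\w+)\z/;
    return "Data::Unixish::$name"->can($short);
}

my @out;
my $func = resolve_dux('rev') or die "no such dux function";
$func->(in => ['abc', 'xyz'], out => \@out);
print "@out\n";   # prints "cba zyx"
```

In real use each function would live in its own module file and be loaded with require before the lookup; the inline package here only keeps the example self-contained.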
SEE ALSO
Rinci and Rinci::function, the specification used to describe function metadata.
AUTHOR
Steven Haryanto <stevenharyanto@gmail.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2012 by Steven Haryanto.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.