Unixish - Data transformation framework, inspired by Unix toolbox philosophy


version 1.0.1




This document specifies Unixish, a Perl framework for data processing (transformation, conversion, and the like) built on the tried-and-true Unix toolbox philosophy. For the implementation, see Data::Unixish.


Early draft. The 1.0 series does not guarantee full backward compatibility between revisions, so caveat implementor. However, a major incompatibility will bump the version to 1.1.


The Unix philosophy says a program should do only one thing and do it well. A problem is solved by chaining together a sequence of small, specialized programs. From Douglas McIlroy, the original developer of Unix pipelines:

This is the Unix philosophy: Write programs that do one thing and do it well.
Write programs to work together. Write programs to handle text streams, because
that is a universal interface.

In Unixish, programs translate to functions. Unixish is essentially a set of guidelines and tools on how to write such functions.


Data::Unixish is the Perl implementation.

dux is a short notation for Data::Unixish.


  • Function should accept a hash argument %args

    This future-proofs the function when more and more arguments are added.

  • Arguments should be described in Rinci metadata

    See Rinci and Rinci::function for more details.

  • There are some standard arguments: in, out

    in and out are analogous to standard input and output streams, explained below.

  • Arguments should have good defaults

  • Input data is given in $args{in}

    It is a "stream", usually a reference to an array or to a tied array. A function can iterate over it as follows:

    while (my ($index, $item) = each @{ $args{in} }) {
        # process $item
    }

    A function MUST NOT slurp it into memory like this (in Perl 5, for() is not lazy):

    # WRONG!
    for (@{ $args{in} }) {
        # process $_
    }

  • Output should be written to $args{out}

    It is a "stream", usually a reference to an array or to a tied array. A function can append output as follows:

    while (my ($index, $item) = each @{ $args{in} }) {
        my $res = uc $item;    # example transformation only
        push @{ $args{out} }, $res;
    }

    A function MUST NOT assign to $args{out} directly, because that replaces the caller's stream instead of writing to it, e.g.:

    # WRONG!
    $args{out} = [1, 2, 3];

  • Error messages can be logged using Log::Any

    A standard format for error messages will be specified in the future.

  • When processing, undef/invalid/non-applicable values should generally be skipped (passed through unchanged)

    For example, the date dux function accepts either an integer (assumed to be a Unix timestamp) or a DateTime object. Other values such as undef, an empty string, or unsupported kinds of objects should not be processed; they are passed to the output stream as-is. A warning can be logged if needed.
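
The guidelines above can be sketched in one small function. This is an illustration only: the function name shout, its suffix argument, and the 'any' schemas are hypothetical and are not taken from Data::Unixish. The [200, "OK"] return value follows the Rinci enveloped-result convention, which this document does not itself require.

```perl
package Data::Unixish::shout;    # hypothetical function, for illustration only

use 5.012;      # each() on arrays needs Perl 5.12 or later
use strict;
use warnings;

# Rinci metadata describing the function and its arguments.
our %SPEC;
$SPEC{shout} = {
    v       => 1.1,
    summary => 'Uppercase each string item and append a suffix',
    args    => {
        in     => { summary => 'Input stream',  schema => 'any' },
        out    => { summary => 'Output stream', schema => 'any' },
        suffix => { summary => 'Text appended to each item',
                    schema  => 'str', default => '!' },
    },
};

sub shout {
    my %args = @_;                          # hash argument
    my ($in, $out) = ($args{in}, $args{out});
    my $suffix = $args{suffix} // '!';      # good default

    # Iterate item by item instead of slurping the whole input.
    while (my ($index, $item) = each @$in) {
        # Only plain defined scalars are transformed; anything else
        # (undef, references) is passed through unchanged.
        if (defined $item && !ref $item) {
            $item = uc($item) . $suffix;
        }
        push @$out, $item;                  # append, never assign
    }
    return [200, "OK"];                     # Rinci-style envelope (assumption)
}

package main;

my @in = ('hello', undef, 'world');
my @out;
Data::Unixish::shout::shout(in => \@in, out => \@out);
# @out is now ('HELLO!', undef, 'WORLD!')
```

Because the caller owns both arrays, the same function works whether they are plain arrays or tied arrays backed by a file or socket.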

A well-written dux function can be turned into a regular Unix command-line utility.
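
One way to see why this works is to tie the output stream to a filehandle. The sketch below shows only the output half of such a wrapper. The PrintOnPush class and the upcase function are hypothetical and are not part of Data::Unixish or Data::Unixish::CmdLine. An in-memory filehandle stands in for STDOUT so the result can be inspected.

```perl
package PrintOnPush;    # hypothetical helper class, for illustration only

use 5.012;
use strict;
use warnings;

# Tie an array to a filehandle: nothing is stored, each pushed item
# is printed immediately as one line.
sub TIEARRAY {
    my ($class, $fh) = @_;
    return bless { fh => $fh }, $class;
}

sub FETCHSIZE { 0 }     # the array never holds any elements

sub PUSH {
    my ($self, @items) = @_;
    for my $item (@items) {
        my $line = defined $item ? $item : '';
        print { $self->{fh} } "$line\n";
    }
    return $self->FETCHSIZE;
}

package main;

# A dux-style function: it only pushes to the stream it was given.
sub upcase {
    my %args = @_;
    while (my ($index, $item) = each @{ $args{in} }) {
        push @{ $args{out} }, defined $item ? uc($item) : $item;
    }
    return;
}

my $captured = '';
open my $fh, '>', \$captured or die "Cannot open in-memory handle: $!";
tie my @out, 'PrintOnPush', $fh;    # a real utility would pass \*STDOUT

upcase(in => ['foo', 'bar'], out => \@out);
close $fh;
# $captured now holds "FOO\nBAR\n"
```

Had upcase assigned a new array reference to $args{out}, the tied array would never have been touched and nothing would have been printed.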


Unixish is the specification.

Data::Unixish is the implementation.

Each dux function should have an all-lowercase name and live in the Data::Unixish::FUNCTION_NAME package. The function itself is placed in that package under the same name. For example, the Data::Unixish::date package contains the Data::Unixish::date::date function.

Further subpackaging is allowed, for example: Data::Unixish::English::count_syllables.
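
The naming rule can be expressed as a small string mapping. The helper dux_location below is hypothetical. It also assumes that, for a subpackaged name, the function is named after the last component of the package, which the text above implies but does not state outright.

```perl
use strict;
use warnings;

# Hypothetical helper: map a dux function name to its package and its
# fully qualified sub name under the naming convention described above.
sub dux_location {
    my ($name) = @_;
    my $package = "Data::Unixish::$name";
    my ($leaf)  = $name =~ /(\w+)\z/;    # last component is the sub name
    return ($package, "${package}::$leaf");
}

my ($pkg, $sub) = dux_location('date');
# $pkg is 'Data::Unixish::date'
# $sub is 'Data::Unixish::date::date'

my ($pkg2, $sub2) = dux_location('English::count_syllables');
# $pkg2 is 'Data::Unixish::English::count_syllables'
# $sub2 is 'Data::Unixish::English::count_syllables::count_syllables'
```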

Data::Unixish::CmdLine is a utility for accessing dux functions from the command line.


Rinci and Rinci::function, another specification to leverage functions.



Steven Haryanto


This software is copyright (c) 2012 by Steven Haryanto.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.