NAME

File::OM - Output Multiplexer routines

SYNOPSIS

use File::OM;              # to import routines into a Perl script

$om = File::OM->new(       # make output object that creates strings in
      $format, {           # XML, Turtle, JSON, ANVL, CSV, PSV, or Plain
  outhandle => *STDOUT,    # (opt) print string instead of returning it
  verbose => 1 });         # (opt) also output record and line numbers

$om->ostream();            # open stream

$om->cstream();            # close stream

$om->orec(                 # open record
      $recnum);            # record number (normally tracked from 1)

$om->crec();               # close record

$om->elem(                 # output entire element, unless $name undefined
      $name,               # string representing element name
      $value,              # string representing element value
      $lineno,             # input line number/type (default '1:')
      $elemnum);           # element number (normally tracked from 1)

$om->elems(                # output elements; wrap ANVL/Plain/XML lines
      $name,               # string representing first element name
      $value,              # string representing first element value
      ...);                # other element names and values

$om->name_encode($s);      # encode a name
$om->value_encode($s);     # encode a value
$om->comment_encode($s);   # encode a comment or pseudo-comment

om_opt_defaults();         # get hash reference with factory defaults

DESCRIPTION

The OM (Output Multiplexer) Perl module provides a general output formatting framework for data that can be represented as a stream of records consisting of element names, values, and comments. Specific conversions are possible to XML, Turtle, JSON, ANVL, CSV, PSV (Pipe Separated Value), and "Plain" unlabeled text.

The internal element structure is currently identical to the structure returned by File::ANVL::anvl_recarray. The n-th element corresponds to three Perl array elements as follows:

INDEX   CONTENT
3n + 0  input file line number
3n + 1  n-th ANVL element name
3n + 2  n-th ANVL element value

This means, for example, that the first two ANVL element names would be found at Perl array indices 4 and 7. The first triple is special; array elements 0 and 2 are undefined unless the record begins with an unlabeled value, such as (in a quasi-ANVL record),

Smith, Jo
home: 555-1234
work: 555-9876

in which case they contain the line number and value, respectively. Array element 1 always contains a string naming the format of the input, such as, "ANVL", "JSON", "XML", etc.
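
As a rough illustration of the layout described above, the following sketch walks a hand-built array in steps of three. The array contents (including the "1:" style of the first line number) and the helper name element_names are assumptions made for this example; they are not output captured from File::ANVL.

```perl
use strict;
use warnings;

# Hand-built array in the 3n layout for the quasi-ANVL record above.
# The exact line-number strings are assumed for illustration.
my @rec = (
    '1:', 'ANVL', 'Smith, Jo',   # triple 0: lineno, input format, unlabeled value
    '2:', 'home', '555-1234',    # triple 1: element name at index 4
    '3:', 'work', '555-9876',    # triple 2: element name at index 7
);

# Hypothetical helper: return the element names, skipping the special
# first triple.
sub element_names {
    my @r = @_;
    my @names;
    for (my $i = 3; $i + 2 <= $#r; $i += 3) {
        push @names, $r[$i + 1];
    }
    return @names;
}

print join(', ', element_names(@rec)), "\n";
```

The loop starts at index 3 because the first triple carries the input format name rather than an element name.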

The remaining triples are free form except that the values will have been drawn from the original format and possibly decoded. The first item ("lineno") in each remaining triple is a number followed by a type character, such as "34:" or "6#". The number indicates the line number (or octet offset, depending on the origin format) of the start of the element. The type character is either ':' to indicate a real element or '#' to indicate a comment; if the latter, the element name has no defined meaning and the comment is contained in the value. To output an element as a comment without regard to line number, give $lineno as "#".
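
A minimal sketch of how an application might take such a "lineno" string apart is given here. The helper name parse_lineno is hypothetical and is not part of File::OM; the pattern simply follows the description above, allowing the number to be absent as in a bare "#".

```perl
use strict;
use warnings;

# Hypothetical helper: split a "lineno" string such as "34:" or "6#"
# into its number and a flag saying whether it marks a comment.
# Returns an empty list if the string has neither form.
sub parse_lineno {
    my ($lineno) = @_;
    return unless defined $lineno && $lineno =~ /\A(\d*)([:#])\z/;
    my ($num, $type) = ($1, $2);
    return ($num, $type eq '#' ? 1 : 0);
}

for my $tag ('34:', '6#', '#') {
    my ($num, $is_comment) = parse_lineno($tag);
    printf "%-4s number=%-3s %s\n", $tag,
        ($num eq '' ? '-' : $num),
        ($is_comment ? 'comment' : 'element');
}
```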

OM presents an object-oriented interface. The object constructor takes a format argument and returns undef if the format is unknown. The returned object has methods for creating format-appropriate output corresponding (currently) to seven output modes; for a complete application of these methods, see File::ANVL::anvl_om. Nonetheless, an application can simply call no method but elem(), as the necessary open (ostream() and orec()) and close (crec() and cstream()) methods will be invoked automatically before the first element is output and before the object is destroyed, respectively. Passing an undefined first argument ($name) to elem() is useful for skipping an element in a position-based format such as CSV or PSV, which indicates a missing element by outputting a separator character; when the format is not position-based, the method usually outputs nothing.
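
The following sketch shows one way the methods from the SYNOPSIS might be combined when output is returned as strings. It assumes that File::OM is installed, that an empty options hash is acceptable, and that each method returns a string that can be joined in call order; the wrapper name render_record is hypothetical. The module is loaded with require inside eval so the script still runs when File::OM is absent.

```perl
use strict;
use warnings;

# Sketch only: render name/value pairs with the methods listed in the
# SYNOPSIS, joining the strings they return.  Returns undef if File::OM
# is not installed or the constructor rejects the format.
sub render_record {
    my ($format, @pairs) = @_;
    return undef unless eval { require File::OM; 1 };

    my $om = File::OM->new($format, {}) or return undef;

    my @out = ($om->ostream(), $om->orec());
    while (@pairs) {
        my ($name, $value) = splice(@pairs, 0, 2);
        push @out, $om->elem($name, $value);
    }
    push @out, $om->crec(), $om->cstream();

    return join '', grep { defined } @out;
}

my $text = render_record('JSON', who => 'Smith, Jo', home => '555-1234');
print defined $text ? $text : "File::OM unavailable or format unknown\n";
```

Calling the open and close methods explicitly keeps every returned string in the caller's hands.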

Constructor options include 'verbose', which causes the methods to insert record and line numbers as comments or pseudo-comments (e.g., for JSON, an extra element called "#" since JSON doesn't support comments). Normally output is returned as a string, but if the 'outhandle' option (defaults to '') contains a file handle, for example,

{ outhandle => *STDOUT }

the string will be printed to the file handle and the method will return the status of the print call. Constructor options and defaults:

{
outhandle        => '',        # return string instead of printing it
indent_start     => '',        # overall starting indent
indent_step      => '  ',      # how much to increment/decrement indent

# Format specific options.
turtle_indent    => '    ',    # turtle has one indent width
turtle_predns    =>            # turtle predicate namespaces
       'http://purl.org/kernel/elements/1.1/',
turtle_nosubject => 'default', # a default subject (change this)
turtle_subjelpat => '',        # pattern for matching subject element
turtle_stream_prefix => 'erc', # symbol we use for turtle
wrap             => 72,        # wrap text to 72 cols (ANVL, Plain, XML)
wrap_indent      => '',        # Text::Wrap will insert; "\t" for ANVL
xml_stream_name  => 'recs',    # for XML output, stream tag
xml_record_name  => 'rec',     # for XML output, record tag

# Used to maintain object state.
elemnum          => 0,         # current element number
elemsref         => [],        # one array to store record elements
indent           => '',        # current indent
recnum           => 0,         # current record number
}
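
To give a feel for what the 'wrap' and 'wrap_indent' options control, here is a small sketch that uses Text::Wrap directly. How OM calls Text::Wrap internally is not shown in this document, so the helper wrap_line and the way the options are passed to it are assumptions for illustration only.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Assumed stand-ins for the 'wrap' and 'wrap_indent' options,
# using "\t" as suggested above for ANVL.
my %opt = (wrap => 72, wrap_indent => "\t");

# Hypothetical helper: wrap one "name: value" line, indenting
# continuation lines with wrap_indent.
sub wrap_line {
    my ($line, $opt) = @_;
    local $Text::Wrap::columns = $opt->{wrap};
    return wrap('', $opt->{wrap_indent}, $line);
}

my $line = 'note: ' . join(' ', ('a fairly long value') x 8);
print wrap_line($line, \%opt), "\n";
```

The first line is left unindented, and each continuation line begins with the wrap_indent string.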

In this release of the OM package, objects carry limited state information. Maintained are the current indentation level, element number, and record number, but there is no stack of "open elements". Right now there is only a "whole element at once" method (elem()) that takes name and value arguments to construct a complete element. Future releases may support methods for opening and closing elements.

The OM package automatically tracks element and record numbers, but the optional $recnum and $elemnum method arguments can be used to set them to specific values. They help with formats that put separators before every element or record except for the first one (e.g., JSON uses commas). The $lineno argument is meant to refer to input line numbers that may be useful with the 'verbose' option and creating diagnostic messages.
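
The role of the element number in separator placement can be sketched as follows. This is not OM's actual code: json_member is a hypothetical helper, and it does no string escaping, so it is only suitable for simple values like the ones shown.

```perl
use strict;
use warnings;

# Hypothetical helper, not OM's actual code: emit one JSON-style member,
# preceded by a comma for every element after the first.
# No string escaping is done, to keep the sketch short.
sub json_member {
    my ($name, $value, $elemnum) = @_;
    my $sep = $elemnum > 1 ? ",\n" : '';
    return $sep . qq{  "$name": "$value"};
}

my @pairs = (home => '555-1234', work => '555-9876');
my $elemnum = 0;
my $out = "{\n";
while (my ($name, $value) = splice(@pairs, 0, 2)) {
    $out .= json_member($name, $value, ++$elemnum);
}
$out .= "\n}\n";
print $out;
```

Because the separator is written before each element rather than after it, the emitter never needs to know whether more elements will follow.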

SEE ALSO

A Name Value Language (ANVL) http://www.cdlib.org/inside/diglib/ark/anvlspec.pdf

HISTORY

This is a beta version of the OM package. It is written in Perl.

AUTHOR

John A. Kunze jak at ucop dot edu

COPYRIGHT AND LICENSE

Copyright 2009-2011 UC Regents. Open source BSD license.

PREREQUISITES

Perl Modules: Text::Wrap

Script Categories:

UNIX : System_administration