NAME

File::OM - Output Multiplexer routines

SYNOPSIS

use File::OM;              # to import routines into a Perl script

$om = File::OM->new(       # make output object that creates strings in
      $format, {           # XML, Turtle, JSON, ANVL, or Plain formats
  outhandle => *STDOUT,    # (opt) print string instead of returning it
  verbose => 1 });         # (opt) also output record and line numbers

$om->ostream();            # open stream

$om->cstream();            # close stream

$om->orec(                 # open record
      $recnum);            # record number (normally tracked from 1)

$om->crec();               # close record

$om->elem(                 # output an entire element
      $name,               # string representing element name
      $value,              # string representing element value
      $lineno,             # input line number/type (default '1:')
      $elemnum);           # element number (normally tracked from 1)

$om->elems(                # output elements; wrap ANVL/Plain/XML lines
      $name,               # string representing first element name
      $value,              # string representing first element value
      ...);                # other element names and values

$om->name_encode($s);      # encode a name
$om->value_encode($s);     # encode a value
$om->comment_encode($s);   # encode a comment or pseudo-comment

om_opt_defaults();         # get hash reference with factory defaults

DESCRIPTION

The OM (Output Multiplexer) Perl module provides a general output formatting framework for data that can be represented as records consisting of elements, values, and comments. Specific conversions are possible to XML, Turtle, JSON, ANVL, and "Plain" unlabeled text.

The internal element structure is currently identical to the structure returned by File::ANVL::anvl_recarray. The first triple of the returned array is special in that it describes the origin of the record; its elements are

INDEX   NAME        VALUE
  0     format      original format ("ANVL", "JSON", "XML", etc)
  1     <unused>
  2     <unused>

The remaining triples are free form except that the values will have been drawn from the original format and possibly decoded. The first item ("lineno") in each remaining triple is a number followed by a type character, such as "34:" or "6#". The number indicates the line number (or octet offset, depending on the origin format) of the start of the element. The type character is either ':' to indicate a real element or '#' to indicate a comment; if the latter, the element name has no defined meaning and the comment is contained in the value. To output an element as a comment without regard to line number, give $lineno as "#".
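The "lineno" convention can be illustrated with a short sketch. The helper name parse_lineno and the hash it returns are assumptions made for this example; they are not part of File::OM, and the module may parse these tokens differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::OM): split a "lineno" token
# such as "34:" or "6#" into its number and its type character.
sub parse_lineno {
    my ($lineno) = @_;
    # Optional digits, then ':' (real element) or '#' (comment).
    my ($num, $type) = $lineno =~ /^(\d*)([:#])$/
        or return;                      # not a recognizable token
    return {
        number     => ($num eq '' ? undef : $num + 0),
        is_comment => ($type eq '#' ? 1 : 0),
    };
}

for my $token ('34:', '6#', '#') {
    my $p = parse_lineno($token);
    printf "%-4s number=%s comment=%d\n", $token,
        defined $p->{number} ? $p->{number} : 'none', $p->{is_comment};
}
```

A bare "#" parses as a comment with no number, which matches the case of a comment given without regard to line number.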

OM presents an object-oriented interface. The object constructor takes a format argument and returns undef if the format is unknown. The returned object has methods for creating format-appropriate output corresponding (currently) to five output modes; for a complete application of these methods, see File::ANVL::anvl_om. Nonetheless, an application can get by calling no method but elem(), as the necessary open (orec() and ostream()) and close (crec() and cstream()) methods will be invoked automatically before the first element is output and before the object is destroyed, respectively.
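As a rough sketch of the elem()-only style, the following wraps the calls from the SYNOPSIS in a hypothetical helper named print_record (not part of File::OM). It assumes File::OM is installed and that 'JSON' is accepted as a format name; if the module cannot be loaded, the helper does nothing and returns 0. The 'outhandle' option is used so that the automatic close output is printed rather than lost.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not part of File::OM): print one record using
# only elem(), relying on the automatic open and close calls.
sub print_record {
    my ($format, @pairs) = @_;      # flat list: name, value, ...
    return 0 unless eval { require File::OM; 1 };   # module missing
    my $om = File::OM->new($format, { outhandle => *STDOUT })
        or die "unknown format: $format\n";
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        $om->elem($name, $value);   # first call opens stream and record
    }
    undef $om;                      # destruction closes record and stream
    return 1;
}

print_record('JSON', who => 'Kunze', what => 'Output Multiplexer');
```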

Constructor options include 'verbose', which causes the methods to insert record and line numbers as comments or pseudo-comments (e.g., for JSON, an extra element called "#" since JSON doesn't support comments). Normally output is returned as a string, but if the 'outhandle' option (defaults to '') contains a file handle, for example,

{ outhandle => *STDOUT }

the string will be printed to the file handle and the method will return the status of the print call. Constructor options and defaults:

{
outhandle        => '',        # return string instead of printing it
indent_start     => '',        # overall starting indent
indent_step      => '  ',      # how much to increment/decrement indent

# Format specific options.
turtle_indent    => '    ',    # turtle has one indent width
turtle_predns    =>            # turtle predicate namespaces
       'http://purl.org/kernel/elements/1.1/',
turtle_nosubject => 'default', # a default subject (change this)
turtle_subjelpat => '',        # pattern for matching subject element
turtle_stream_prefix => 'erc', # symbol we use for turtle
wrap             => 72,        # wrap text to 72 cols (ANVL, Plain, XML)
wrap_indent      => '',        # Text::Wrap will insert; "\t" for ANVL
xml_stream_name  => 'recs',    # for XML output, stream tag
xml_record_name  => 'rec',     # for XML output, record tag

# Used to maintain object state.
elemnum          => 0,         # current element number
elemsref         => [],        # one array to store record elements
indent           => '',        # current indent
recnum           => 0,         # current record number
}

In this release of the OM package, objects carry limited state information. The current indentation level, element number, and record number are maintained, but there is no stack of "open elements". Right now there is only a "whole element at once" method (elem()) that takes name and value arguments to construct a complete element. Future releases may support methods for opening and closing elements.

The OM package automatically tracks element and record numbers, but the optional $recnum and $elemnum method arguments can be used to set them to specific values. These numbers help with formats that put separators before every element or record except the first one (e.g., JSON uses commas). The $lineno argument is meant to refer to input line numbers, which may be useful with the 'verbose' option and when creating diagnostic messages.
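To show why a running element number is useful for separator placement, here is a minimal sketch that does not use File::OM at all. The function json_like_record is hypothetical, it performs no escaping of names or values, and it is not meant to reproduce the module's actual JSON output.

```perl
use strict;
use warnings;

# Hypothetical sketch (not File::OM's code): build a JSON-like record,
# writing a comma before every element except the first.
sub json_like_record {
    my (@pairs) = @_;          # flat list: name, value, name, value, ...
    my $elemnum = 0;           # running element number, counted from 1
    my $out = "{\n";
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        $elemnum++;
        $out .= ",\n" if $elemnum > 1;   # separator for elements 2, 3, ...
        $out .= qq(  "$name": "$value");
    }
    return $out . "\n}\n";
}

print json_like_record(who => 'Kunze', what => 'OM');
```

Resetting the counter to a specific value, as the $elemnum and $recnum arguments allow, would let a caller control whether the next element or record is treated as the first.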

SEE ALSO

A Name Value Language (ANVL) http://www.cdlib.org/inside/diglib/ark/anvlspec.pdf

HISTORY

This is a beta version of the OM package. It is written in Perl.

AUTHOR

John A. Kunze jak at ucop dot edu

COPYRIGHT AND LICENSE

Copyright 2009-2010 UC Regents. Open source BSD license.

PREREQUISITES

Perl Modules: Text::Wrap

Script Categories:

UNIX : System_administration