NAME

IO::Util - A selection of general-utility IO functions

VERSION 1.43

The latest version's changes are reported in the Changes file in this distribution.

INSTALLATION

CPAN
perl -MCPAN -e 'install IO::Util'
Standard installation

From the directory where this file is located, type:

perl Makefile.PL
make
make test
make install

SYNOPSIS

use IO::Util qw(capture slurp Tid Lid Uid load_mml);

capture()

# captures the selected filehandle
$output_ref = capture { any_printing_code() } ;
# now $$output_ref eq 'something'

sub any_printing_code {
    print 'something'
}


# captures FILEHANDLE
$output_ref = capture { any_special_printing_code() } FILEHANDLE ;
# now $$output_ref eq 'something'

sub any_special_printing_code {
    print 'to STDOUT';
    print FILEHANDLE 'something'
}

slurp()

$_ = '/path/to/file' ;
$content_ref = slurp ;

$content_ref = slurp '/path/to/file' ;
$content_ref = slurp \*FILEHANDLE ;

Tid(), Lid(), Uid()

$temporarily_unique_id = Tid ; # 'Q9MU1N_NVRM'
$locally_unique_id     = Lid ; # '2MS_Q9MU1N_P5F6'
$universally_unique_id = Uid ; # 'MGJFSBTK_2MS_Q9MU1N_PWES'

An MML file (Minimal Markup Language)

<opt>
 <!-- a multi line
  comment-->
    <parA>
        <optA>01</optA>
        <optA>02</optA>
        <optA>03</optA>
    </parA>
    <parB>
        <optA>04</optA>
        <optA>05</optA>
        <optA>06</optA>
        <optB>
           <key>any key</key>
        </optB>
    </parB>
</opt>

load_mml()

$struct = load_mml 'path/to/mml_file' ;
$struct = load_mml \ $mml_string ;
$struct = load_mml \ *MMLFILE ;
$struct = load_mml ..., %options ;

# $struct = {
#             'parA' => {
#                         'optA' => [
#                                     '01',
#                                     '02',
#                                     '03'
#                                   ]
#                       },
#             'parB' => {
#                         'optA' => [
#                                     '04',
#                                     '05',
#                                     '06'
#                                   ],
#                         'optB' => {
#                                     'key' => 'any key'
#                                   }
#                       }
#           }

DESCRIPTION

This is a micro-weight module that exports a few functions of general utility in IO operations.

CAPTURING OUTPUT

capture { code } [ FILEHANDLE ]

The capture function expects a code block as the first argument and an optional FILEHANDLE as the second argument. If FILEHANDLE is omitted, the selected filehandle (usually STDOUT) is used by default. The function returns a reference to the captured output.

It executes the code block passed as the first argument and captures the output it sends to the selected filehandle (or to the specified FILEHANDLE). It "hijacks" all the print, printf and syswrite statements addressed to the captured filehandle and returns a scalar reference to the output: a sort of "print to scalar" function.

Note: This function ties the FILEHANDLE to the IO::Util::Handle class (a subclass of Tie::StdHandle) and unties it after the code has run. If FILEHANDLE is already tied to another class, it just temporarily re-blesses the tied object into the IO::Util::Handle class, then re-blesses it into its original class after the code has run, thus preserving the original FILEHANDLE configuration.
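The tie-based capturing technique can be illustrated with core Perl alone. This is a minimal sketch, not IO::Util's code: the class CaptureSketch and the function capture_sketch are hypothetical names, the sketch assumes the handle is not already tied (it does not do the re-blessing described above), and it defaults to STDOUT rather than to whatever handle is currently selected.

```perl
use strict;
use warnings;

# Hypothetical tie class (NOT IO::Util::Handle): every print, printf
# and syswrite addressed to the tied handle is appended to a scalar.
package CaptureSketch;

sub TIEHANDLE { my ( $class, $buf_ref ) = @_; return bless { buf => $buf_ref }, $class }

sub PRINT {
    my $self = shift;
    ${ $self->{buf} } .= join( defined $, ? $, : '', @_ );    # honor $,
    ${ $self->{buf} } .= $\ if defined $\;                    # honor $\
    return 1;
}

sub PRINTF {
    my ( $self, $fmt, @args ) = @_;
    ${ $self->{buf} } .= sprintf( $fmt, @args );
    return 1;
}

sub WRITE {    # invoked by syswrite
    my ( $self, $data, $len, $offset ) = @_;
    $offset //= 0;
    $len    //= length($data) - $offset;
    ${ $self->{buf} } .= substr( $data, $offset, $len );
    return $len;
}

package main;

# capture_sketch { code } [ \*FH ]: ties FH (default STDOUT) while the
# block runs, always unties it, and returns a ref to the captured output.
sub capture_sketch (&;$) {
    my ( $code, $fh ) = @_;
    $fh ||= \*STDOUT;
    my $output = '';
    tie *$fh, 'CaptureSketch', \$output;
    my $ok  = eval { $code->(); 1 };
    my $err = $@;
    untie *$fh;    # restore the handle even if the block died
    die $err unless $ok;
    return \$output;
}

my $out = capture_sketch { print 'some', 'thing' };
print "captured: $$out\n";    # prints "captured: something"
```

The `&` prototype is what lets the block be written without the `sub` keyword or a trailing comma, matching the `capture { ... } FILEHANDLE` calling style.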

SLURPING FILES

slurp [ file|FILEHANDLE ]

The slurp function expects a path to a file or an open FILEHANDLE, and returns a reference to the whole content of the file or FILEHANDLE. If no argument is passed, it uses $_ as the argument.
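Slurping usually relies on the standard Perl idiom of localizing the input record separator $/ to undef, so that a single readline returns everything. The helper below is a hypothetical sketch of that idiom (slurp_sketch is not part of IO::Util) that mirrors the documented argument handling: a path, an open filehandle, or $_ by default.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# slurp_sketch [ PATH | FILEHANDLE ]: hypothetical helper returning a
# reference to the whole content; falls back to $_ when called without args.
sub slurp_sketch {
    my $src = @_ ? shift : $_;
    my $fh;
    if ( ref $src ) {    # a glob ref or IO handle: read it directly
        $fh = $src;
    }
    else {               # a path: open it for reading
        open $fh, '<', $src or die "Cannot open '$src': $!";
    }
    local $/;            # undef record separator: read everything at once
    my $content = <$fh>;
    return \$content;
}

# Usage: create a temporary file, then slurp it three ways.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print $tmp "line 1\nline 2\n";
close $tmp;

my $by_path    = slurp_sketch $path;
my $by_default = do { local $_ = $path; slurp_sketch() };
open my $in, '<', $path or die "Cannot open '$path': $!";
my $by_handle  = slurp_sketch $in;
print $$by_path;    # prints the two lines
```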

GENERATING UNIQUE IDs

The Tid, Lid and Uid functions return a unique ID string, useful for naming temporary files or for other purposes.

Tid ( [options] )

This function returns a temporary ID valid for the current process only: temporarily-unique strings are guaranteed to be unique only within the current process ($$).

Lid ( [options] )

This function returns a local ID valid for the local host only: locally-unique strings are guaranteed to be unique when generated by the same local host.

Uid ( [options] )

This function returns a universal ID: universally-unique strings are guaranteed to be unique even when generated by different hosts. Use this function if you have more than one machine generating IDs for the same context. This function includes the host IP address in the ID algorithm.

*id options

The above functions accept an optional hash of named arguments:

chars

You can specify the set of characters used to generate the unique ID string. You have the following options:

chars => 'base34'

Uses [1..9, 'A'..'N', 'P'..'Z']: no lowercase characters, no digit 0 and no capital 'O'. Useful to avoid human mistakes when the id may be conveyed by non-electronic means (e.g. communicated by voice or read from paper). This is the default (used if you don't specify any chars option).

chars => 'base62'

Uses [0..9, 'a'..'z', 'A'..'Z']. The larger character set produces shorter ids.

chars => \@chars

Any reference to an array of arbitrary characters.

separator

The character used to separate groups of characters in the id. Default '_'.

IP

Applies to Uid only. This option allows you to pass the IP address used to generate the universally-unique id. Use this option only if you know what you are doing.

$ui = Tid ;                          # Q9MU1N_NVRM
$ui = Lid ;                          # 2MS_Q9MU1N_P5F6
$ui = Uid ;                          # MGJFSBTK_2MS_Q9MU1N_PWES
$ui = Uid separator=>'-' ;           # MGJFSBTK-2DH-Q9MU6H-7Z1Y
$ui = Tid chars=>'base62' ;          # 1czScD_2h0v
$ui = Lid chars=>'base62' ;          # rq_1czScD_2jC1
$ui = Uid chars=>'base62' ;          # jQaB98R_rq_1czScD_2rqA
$ui = Lid chars=>[ 0..9, 'A'..'F'] ; # 9F4_41AF2B34_62E76
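The effect of the chars and separator options can be illustrated with a small positional encoder that writes integers using one of the documented character sets and joins the groups with the separator. This is only a sketch: encode_int and encode_groups are hypothetical names, and the numbers IO::Util actually encodes into each group are not described here, so no attempt is made to reproduce its ids.

```perl
use strict;
use warnings;

# The two named character sets, exactly as listed above.
my %CHARSETS = (
    base34 => [ 1 .. 9, 'A' .. 'N', 'P' .. 'Z' ],    # no 0, no 'O', no lowercase
    base62 => [ 0 .. 9, 'a' .. 'z', 'A' .. 'Z' ],
);

# encode_int: write a non-negative integer in the base given by @$chars.
sub encode_int {
    my ( $n, $chars ) = @_;
    my $base = @$chars;
    my $str  = '';
    do {
        $str = $chars->[ $n % $base ] . $str;
        $n   = int( $n / $base );
    } while ( $n > 0 );
    return $str;
}

# encode_groups: hypothetical helper taking the same option names as *id.
sub encode_groups {
    my ( $numbers, %opt ) = @_;
    my $chars = $opt{chars} // 'base34';
    unless ( ref $chars ) {    # a named character set
        my $name = $chars;
        $chars = $CHARSETS{$name} or die "Unknown charset '$name'";
    }
    my $sep = $opt{separator} // '_';
    return join $sep, map { encode_int( $_, $chars ) } @$numbers;
}

print encode_groups( [ 255, 4096 ], chars => [ 0 .. 9, 'A' .. 'F' ] ), "\n";    # FF_1000
```

Note that with base34 the value zero is written as '1', since the set has no '0'.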

IMPORT NOTE: If you want to call any IO::Util::*id function by its fully qualified name without importing any symbol (and only in that case), you must explicitly load Time::HiRes. You must also load Sys::Hostname if you use IO::Util::Uid:

use IO::Util ()   ; # no symbol imported
use Time::HiRes   ; # used by any IO::Util::*id
use Sys::Hostname ; # used only by IO::Util::Uid

$uniqid = IO::Util::Uid() ;

Minimal Markup Language (MML)

Many programmers use (de facto) a subset of canonical XML characterized by:

No Attributes
No mixed Data and Element content
No Processing Instructions (PI)
No Document Type Declaration (DTD)
No non-character entity-references
No CDATA marked sections
Support for only UTF-8 character encoding
No optional features

That subset has no official standard, so in this description we will generically refer to it as 'Minimal Markup Language' or MML. Please note that MML is just an unofficial, generic name for that minimal XML subset, chosen to avoid any possible MXML, SML, MinML, /.+ML$/ specificity.

MML advantages

If you just need to store configuration parameters and build any perl data structure, MML is all you need. Using it instead of full-featured XML gives you a few very interesting advantages:

  • it is really simple to use, edit and understand, even for non-technical people

  • you can parse it with a very light, fast and simple RE, thus avoiding loading and executing the several thousand lines of perl code needed to parse full-featured XML

  • any canonical XML parser will still be able to parse it

About XML parsing and structure reduction

The load_mml function produces perl structures exactly like other CPAN modules (e.g. XML::Simple, XML::Smart) but uses the opposite approach. Those modules usually require a canonical XML parser to build a full XML tree, then prune all the unwanted branches. That means thousands of lines of code loaded and executed, and a potentially big structure to reduce, which is probably a waste of resources when you just have to deal with simple MML.

load_mml uses just a few lines of recursive code, parsing MML with a simple RE. It builds only the branches it needs, optionally ignoring all the unwanted nodes. That is exactly what you need for MML, but it is obviously inappropriate for full XML documents (e.g. XHTML) which use attributes and other features unsupported by MML.
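The recursive, regex-driven approach can be illustrated with a deliberately tiny sketch. It is not load_mml's code: parse_sketch and load_sketch are hypothetical names, the sketch ignores escapes, strict checking and every option except keep_root, and it assumes an element never contains a descendant with the same name (the non-greedy match would stop at the inner closing tag).

```perl
use strict;
use warnings;

# parse_sketch: returns a hash ref for one MML chunk. Leaf elements become
# strings, nested elements become hashes, repeated sibling ids become arrays.
# Text outside child elements is silently skipped (like strict => 0).
sub parse_sketch {
    my ($mml) = @_;
    $mml =~ s/<!--.*?-->//gs;                          # drop XML comments
    my %branch;
    while ( $mml =~ m{<(\w+)>(.*?)</\1>}gs ) {         # each child element
        my ( $id, $content ) = ( $1, $2 );
        my $value = $content =~ /</ ? parse_sketch($content) : $content;
        if    ( !exists $branch{$id} )        { $branch{$id} = $value }
        elsif ( ref $branch{$id} eq 'ARRAY' ) { push @{ $branch{$id} }, $value }
        else                                  { $branch{$id} = [ $branch{$id}, $value ] }
    }
    return \%branch;
}

# load_sketch: strips the single root element unless keep_root is true.
sub load_sketch {
    my ( $mml, %opt ) = @_;
    my $tree = parse_sketch($mml);
    return $opt{keep_root} ? $tree : ( values %$tree )[0];
}

my $mml = <<'EOS';
<opt>
 <!-- a multi line
  comment-->
    <parA><optA>01</optA><optA>02</optA></parA>
    <parB><optB><key>any key</key></optB></parB>
</opt>
EOS

my $struct = load_sketch($mml);
print $struct->{parB}{optB}{key}, "\n";    # prints "any key"
```

Building values bottom-up as the recursion returns is what avoids constructing a full tree and pruning it afterwards.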

load_mml ( MML [, options] )

This function parses the MML, optionally applying the options, and returns a perl structure reflecting the MML structure and any custom logic you may need (see "options"). It accepts one MML parameter, which can be a reference to a SCALAR holding the MML content, a path to a file, or a reference to a filehandle.

This function also accepts a few options, which can be passed as plain name=>value pairs or as a HASH reference.
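Accepting both calling styles takes only one line of argument handling. The helper below, normalize_options, is a hypothetical sketch (not IO::Util's code) that merges either form over the defaults documented in the options list that follows: strict => 1, cache => 1, keep_root => 0.

```perl
use strict;
use warnings;

# normalize_options: accept either name=>value pairs or a single HASH ref,
# and let the caller's values override the documented defaults.
sub normalize_options {
    my %opt = ( @_ == 1 && ref $_[0] eq 'HASH' ) ? %{ $_[0] } : @_;
    return { strict => 1, cache => 1, keep_root => 0, %opt };
}

my $from_pairs = normalize_options( keep_root => 1 );
my $from_ref   = normalize_options( { keep_root => 1 } );
```

Because later keys win in a hash list assignment, placing %opt last is what makes user values override the defaults.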

options

You can customize the process by setting a few options, which give you full control over the process and the resulting structure (see also the t/05_load_mml.t test file for a few examples):

strict => 1|0

Boolean. A true value will croak when any unsupported syntax is found, while a false value will quietly ignore unsupported syntax. Default true (strict).

$strict_mml = '<opt><a>01</a></opt>';
$non_strict_mml = <<'EOS';
<opt>
    mixed content ignored
    <elem attr="ignored">01</elem>
</opt>
EOS

$structS = load_mml \$strict_mml ;                 # ok
$structA = load_mml \$non_strict_mml ;             # would croak (strict is the default)
$structB = load_mml \$non_strict_mml, strict=>0 ;  # ok

cache => 1|0

Boolean. If MML is a path, a true value will cache the MML structure in a global variable (persistent under mod_perl): load_mml will open and parse the file only the first time, or again if the file has been modified. If for any reason you don't want to cache the structure, set this option to a false value. Default true (cached).
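The caching behavior described above can be sketched with a package-level hash keyed by path that remembers each file's modification time. This is an illustration under stated assumptions, not IO::Util's code: load_cached, fake_parse and %MML_CACHE are hypothetical names, and fake_parse is a stand-in that only counts how often parsing happens.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

our %MML_CACHE;          # global, so it would persist across requests under mod_perl
my $parse_count = 0;

sub fake_parse { $parse_count++; return { parsed => $_[0] } }    # stand-in parser

sub load_cached {
    my ( $path, %opt ) = @_;
    my $use_cache = $opt{cache} // 1;    # documented default: true
    my $mtime     = ( stat $path )[9];
    die "Cannot stat '$path': $!" unless defined $mtime;
    my $entry = $MML_CACHE{$path};
    return $entry->{struct}              # unchanged file: reuse the structure
        if $use_cache && $entry && $entry->{mtime} == $mtime;
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    my $struct = fake_parse( do { local $/; <$fh> } );
    $MML_CACHE{$path} = { mtime => $mtime, struct => $struct } if $use_cache;
    return $struct;
}

my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print $tmp '<opt><a>01</a></opt>';
close $tmp;

load_cached($path) for 1 .. 3;              # parsed once, then served from the cache
utime( time() + 10, time() + 10, $path );   # simulate a later modification
load_cached($path);                         # mtime changed: parsed again
load_cached( $path, cache => 0 );           # caching disabled: always parsed
print "parse count: $parse_count\n";        # prints "parse count: 3"
```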

keep_root => 0|1

Boolean. A true value will keep the root element, while a false value will strip the root. Default false (root stripped).

$mml = '<opt><a>01</a></opt>';
$structA = load_mml \$mml ;

$$structA{a} eq '01'; # true

# $structA = {
#              'a' => '01'
#            };

$structB = load_mml \$mml, keep_root=>1 ;

$$structB{opt}{a} eq '01'; # true

# $structB = {
#              'opt' => {
#                         'a' => '01'
#                       }
#            };

filter => { id|re => CODE|'TRIM_BLANKS'|'ONE_LINE' }

This option allows you to filter data on its way from the MML to the structure. You must set it to a hash of id/filter pairs. The id key can be the literal id of the element whose content you want to filter, or any compiled RE to match against the element ids; the filter can be a CODE reference or the name of one of the two built-in filters: 'TRIM_BLANKS' and 'ONE_LINE'.

The referenced code will receive id, data_reference and active_options_reference as its arguments; for regex convenience, the data is also aliased to $_.
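The dispatch rules just described can be sketched as follows. This is not IO::Util's code: apply_filter is a hypothetical name, and the exact behavior of the two built-in filters is an assumption (TRIM_BLANKS trims each line and drops blank lines, ONE_LINE collapses all whitespace into single spaces). One real Perl detail matters here: a qr// used as a hash key is stringified (e.g. "(?^:^b)"), and that string still works as a pattern.

```perl
use strict;
use warnings;

# Assumed semantics for the two built-in filters (not taken from IO::Util).
my %BUILTIN = (
    TRIM_BLANKS => sub {    # trim each line, drop blank lines
        my @lines = grep { /\S/ } split /\n/, $_;
        s/^\s+|\s+$//g for @lines;
        return join "\n", @lines;
    },
    ONE_LINE => sub {       # collapse every whitespace run into one space
        ( my $s = $_ ) =~ s/\s+/ /g;
        $s =~ s/^ | $//g;
        return $s;
    },
);

# apply_filter: a literal id wins; otherwise any key that is a stringified
# qr// (it starts with "(?") is tried as a pattern against the id.
sub apply_filter {
    my ( $id, $data, $filters ) = @_;
    my $filter = $filters->{$id};
    unless ($filter) {
        for my $key ( sort keys %$filters ) {
            if ( $key =~ /^\(\?/ && $id =~ $key ) { $filter = $filters->{$key}; last }
        }
    }
    return $data unless $filter;    # no filter: data unchanged
    unless ( ref $filter ) {        # a built-in filter name
        my $name = $filter;
        $filter = $BUILTIN{$name} or die "Unknown filter '$name'";
    }
    for ($data) {                   # alias $_ to the data
        return $filter->( $id, \$data, {} );    # {} stands in for the active options
    }
}

my %filters = (
    foo        => sub { uc },
    qr/^b/     => sub { lc },
    multi_line => 'TRIM_BLANKS',
);
print apply_filter( 'baz', 'ZZz', \%filters ), "\n";    # prints "zzz"
```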

$mml = << 'EOS';
<opt>
   <foo>aaa</foo>
   <bar>bBB</bar>
   <baz>ZZz</baz>
   <multi_line>
     other
     data
   </multi_line>
   <other_stuff>something</other_stuff>
   <anything_else>not filtered</anything_else>
</opt>
EOS

$struct = load_mml \$mml, filter=>{ foo         => sub{uc},
                                    qr/^b/      => sub{lc},
                                    multi_line  => 'TRIM_BLANKS',
                                    other_stuff => \&my_filter
                                  };

sub my_filter {
    my ($id, $data_ref, $opt) = @_ ;
    # $_ contains the actual data
    # so you could use it instead of $$data_ref
    # ... any processing of $_ (or $$data_ref) ...
    # return $_ (if you modified it with any s///)
    # or any arbitrarily modified data
    return 'something else';
}
    
# $struct = {
#             'foo' => 'AAA', # it was 'aaa'
#             'bar' => 'bbb', # it was 'bBB'
#             'baz' => 'zzz', # it was 'ZZz'
#             'multi_line' => "other\ndata",  # it was "\n  other\n  data\n"
#             'other_stuff' => 'something else', # it was 'something'
#             'anything_else' => 'not filtered'  # the same
#           }

handler => { id|re => CODE }

This option allows you to execute any code during the parsing of the MML in order to change the returned structure or do any other task. It allows you to implement your own syntax, checks and executions, skip any branch, change the options of any child node, generate nodes or even objects to add to the returned structure.

You must set it to a hash of id/handler pairs. The id key can be the literal id of the element whose content you want to handle, or any compiled RE to match against the element ids; the handler must be a CODE reference.

The referenced CODE will be called instead of the standard IO::Util::parse_mml handler, and will receive id, data_reference and active_options_reference as its arguments.

It is expected to return the branch to add to the returned structure. If the referenced CODE needs to refer to the original branch structure, it can retrieve it by calling IO::Util::parse_mml().

A few examples using this same MML string:

$mml = << 'EOS';
<opt>
  <a>
     <b>Foo</b>
     <b>Bar</b>
  </a>
  <c>something</c>
</opt>
EOS

Regular parsing and structure:

$struct = load_mml \$mml ; # no options

# $struct = {
#             'a' => {
#                      'b' => [
#                               'Foo',
#                               'Bar'
#                             ]
#                    },
#             'c' => 'something'
#           } ;

Skip all the 'a' elements:

$struct = load_mml \$mml, handler=>{ a => sub{} } ; # just for 'a' elements
                  
# $struct = { 'c' => 'something' } ;

Folding an array:

$struct = load_mml \$mml, handler => { a => \&a_handler } ; # just for 'a'

  
sub a_handler {
    # get the original branch
    my $branch = IO::Util::parse_mml( @_ );
    $$branch{b} # ['Foo','Bar']
}

# $struct = {
#             'a' => [
#                      'Foo',
#                      'Bar'
#                    ],
#             'c' => 'something'
#           } ;

IO::Util::parse_mml (id, MML [, options])

Used internally, and optionally by any handler, to parse any MML chunk and return its branch structure. It requires the element id and a reference to the MML chunk, and optionally accepts the options hash reference to use for the branch.

Note: You can escape any character (especially < and >) by using the backslash '\'. XML comments can be added to the MML and will be ignored by the parser.
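Comment removal and backslash unescaping can each be expressed as a single substitution. The two helpers below are a hypothetical sketch (strip_comments and unescape are not IO::Util functions), and a real parser would additionally have to avoid treating an escaped "\<" as the start of a tag.

```perl
use strict;
use warnings;

# strip_comments: remove <!-- ... --> blocks, including multi-line ones.
sub strip_comments { my ($mml)  = @_; $mml  =~ s/<!--.*?-->//gs; return $mml }

# unescape: a backslash makes the following character literal.
sub unescape       { my ($text) = @_; $text =~ s/\\(.)/$1/gs;    return $text }

my $raw = strip_comments('<a>1 \< 2<!-- note --></a>');
my ($content) = $raw =~ m{^<a>(.*)</a>$}s;
print unescape($content), "\n";    # prints "1 < 2"
```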

SUPPORT and FEEDBACK

If you need support, or if you just want to send me some feedback or a request, please use this link: http://perl.4pro.net/?IO::Util.

AUTHOR and COPYRIGHT

© 2004-2005 by Domizio Demichelis.

All Rights Reserved. This module is free software. It may be used, redistributed and/or modified under the same terms as perl itself.
