IO::Util - A selection of general-utility IO functions
The latest version's changes are reported in the Changes file in this distribution.
perl -MCPAN -e 'install IO::Util'
- Standard installation
From the directory where this file is located, type:
perl Makefile.PL
make
make test
make install
use IO::Util qw(capture slurp Tid Lid Uid load_mml);
# captures the selected file handler
$output_ref = capture { any_printing_code() } ;
# now $$output_ref eq 'something'
sub any_printing_code {
    print 'something'
}
# captures FILEHANDLER
$output_ref = capture { any_special_printing_code() } FILEHANDLER ;
# now $$output_ref eq 'something'
sub any_special_printing_code {
    print 'to STDOUT';
    print FILEHANDLER 'something'
}
$_ = '/path/to/file' ;
$content_ref = slurp ;
$content_ref = slurp '/path/to/file' ;
$content_ref = slurp \*FILEHANDLER ;
Tid(), Lid(), Uid()
$temporarily_unique_id = Tid() ; # 'Q9MU1N_NVRM'
$locally_unique_id = Lid() ; # '2MS_Q9MU1N_P5F6'
$universally_unique_id = Uid() ; # 'MGJFSBTK_2MS_Q9MU1N_PWES'
A MML file (Minimal Markup Language)
<!-- a multi line
<key>any key</key>
$struct = load_mml('path/to/mml_file') ;
$struct = load_mml(\ $mml_string) ;
$struct = load_mml(\ *MMLFILE) ;
$struct = load_mml(..., \%options) ;
# $struct dump
# $struct = {
# 'parA' => {
# 'optA' => [
# '01',
# '02',
# '03'
# ]
# },
# 'parB' => {
# 'optA' => [
# '04',
# '05',
# '06'
# ],
# 'optB' => {
# 'key' => 'any key'
# }
# }
# }
This is a micro-weight module that exports a few functions of general utility in IO operations.
capture { code } [ FILEHANDLER ]
The capture function expects a code block as the first argument and an optional FILEHANDLER as the second argument. If FILEHANDLER is omitted, the selected file handler (usually STDOUT) is used by default. The function returns a reference to the captured output.
It executes the code inside the first argument block and captures the output it sends to the selected file handler (or to a specific file handler). It "hijacks" all the print and printf statements addressed to the captured file handler and returns a scalar reference to the output: a sort of "print to scalar" function.
Note: This function ties the FILEHANDLER to the IO::Util class and unties it after the execution of the code. If FILEHANDLER is already tied to another class, it just temporarily re-blesses the tied object into the IO::Util class and re-blesses it into its original class after the execution of the code, thus preserving the original FILEHANDLER configuration.
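The tie mechanism described in the note can be pictured with a minimal sketch. This is not IO::Util's code: the class My::CaptureSketch and the function capture_stdout_sketch are hypothetical names, the sketch only handles STDOUT, and it does not cover the re-blessing of an already tied handle.

```perl
use strict;
use warnings;

# Hypothetical helper class for this sketch; not part of IO::Util.
package My::CaptureSketch;

# The tied object is just a reference to the buffer string.
sub TIEHANDLE {
    my ($class) = @_;
    my $buffer = '';
    return bless \$buffer, $class;
}

# Called for every print() addressed to the tied handle.
sub PRINT {
    my $self = shift;
    my $sep = defined($,) ? $, : '';
    my $end = defined($\) ? $\ : '';
    $$self .= join($sep, @_) . $end;
    return 1;
}

# Called for every printf() addressed to the tied handle.
sub PRINTF {
    my ($self, $format, @args) = @_;
    $$self .= sprintf($format, @args);
    return 1;
}

package main;

# Hypothetical function: tie STDOUT, run the code, untie, return the buffer.
sub capture_stdout_sketch {
    my ($code) = @_;
    tie *STDOUT, 'My::CaptureSketch';
    my $ok     = eval { $code->(); 1 };
    my $error  = $@;
    my $output = ${ tied *STDOUT };   # copy the buffer before untying
    untie *STDOUT;
    die $error unless $ok;            # re-throw only after STDOUT is restored
    return \$output;
}

my $output_ref = capture_stdout_sketch(sub {
    print 'some';
    printf '%s', 'thing';
});
```

Wrapping the call in eval is a design choice of the sketch: it makes sure STDOUT is untied even when the captured code dies.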
slurp [ file|FILEHANDLER ]
The slurp function expects a path to a file or an open FILEHANDLER, and returns a reference to the whole file|FILEHANDLER content. If no argument is passed, it uses $_ as the argument.
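The usual Perl idiom behind this kind of function is to undefine the input record separator $/ locally, so that a single read returns the whole content. The following is a minimal sketch of that idiom and not IO::Util's own code; slurp_sketch is a hypothetical name, and it only recognizes plain glob references as file handlers.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical function: returns a reference to the whole content
# of a path or of an open filehandle (defaults to $_).
sub slurp_sketch {
    my ($source) = @_ ? @_ : ($_);
    my $fh;
    if (ref($source) eq 'GLOB') {
        $fh = $source;                      # already an open handle
    }
    else {
        open($fh, '<', $source) or die "Cannot open '$source': $!";
    }
    local $/;                               # slurp mode: no record separator
    my $content = <$fh>;
    return \$content;
}

# Build a small temporary file to read back.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "line 1\nline 2\n";
close $tmp_fh;

my $by_path = slurp_sketch($tmp_path);

open(my $in, '<', $tmp_path) or die "Cannot open '$tmp_path': $!";
my $by_handle = slurp_sketch($in);
close $in;

my $by_default;
for ($tmp_path) { $by_default = slurp_sketch() }   # path taken from $_
```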
The Tid, Lid and Uid functions return a unique ID string useful for naming temporary files, or for other purposes.
Tid ( [options] )
This function returns a temporary ID valid for the current process only. Different temporarily-unique strings are guaranteed to be unique within the current process only ($$).
Lid ( [options] )
This function returns a local ID valid for the local host only. Different locally-unique strings are guaranteed to be unique when generated by the same local host.
Uid ( [options] )
This function returns a universal ID. Different universally-unique strings are guaranteed to be unique even when generated by different hosts. Use this function if you have more than one machine generating the IDs for the same context. This function includes the host IP number in the id algorithm.
*id options
The above functions accept an optional hash of named arguments:
- chars
You can specify the set of characters used to generate the unique id string. You have the following options:
- chars => 'base34'
uses [1..9, 'A'..'N', 'P'..'Z']. No lowercase chars, no digit 0, no capital 'O'. Useful to avoid human mistakes when the id may be represented by non-electronic means (e.g. communicated by voice or read from paper). This is the default (used if you don't specify any chars option).
- chars => 'base62'
uses [0..9, 'a'..'z', 'A'..'Z']. This option tries to generate shorter ids.
- chars => \@chars
Any reference to an array of arbitrary characters.
- separator
The character used to separate groups of characters in the id. Default '_'.
- IP
Applies to Uid only. This option lets you pass the IP number used to generate the universally-unique id. Use this option only if you know what you are doing.
$ui = Tid() # Q9MU1N_NVRM
$ui = Lid() # 2MS_Q9MU1N_P5F6
$ui = Uid() # MGJFSBTK_2MS_Q9MU1N_PWES
$ui = Uid(separator=>'-') # MGJFSBTK-2DH-Q9MU6H-7Z1Y
$ui = Tid(chars=>'base62') # 1czScD_2h0v
$ui = Lid(chars=>'base62') # rq_1czScD_2jC1
$ui = Uid(chars=>'base62') # jQaB98R_rq_1czScD_2rqA
$ui = Lid(chars=>[ 0..9, 'A'..'F']) # 9F4_41AF2B34_62E76
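The effect of the chars and separator options can be pictured as writing integers in a custom alphabet and joining the groups. The sketch below only illustrates that idea: encode_in_chars and id_like_sketch are hypothetical names, and using the epoch time and the process id as the groups is an assumption of this example, not the algorithm used by Tid, Lid or Uid.

```perl
use strict;
use warnings;

# The two named character sets described above.
my @base34 = (1 .. 9, 'A' .. 'N', 'P' .. 'Z');
my @base62 = (0 .. 9, 'a' .. 'z', 'A' .. 'Z');

# Hypothetical helper: positional encoding of a non-negative integer,
# using the given array of characters as digits.
sub encode_in_chars {
    my ($number, $chars) = @_;
    my $base    = @$chars;
    my $encoded = '';
    do {
        $encoded = $chars->[ $number % $base ] . $encoded;
        $number  = int($number / $base);
    } while ($number > 0);
    return $encoded;
}

# Hypothetical id builder: encodes the epoch time and the process id
# (an assumption of this sketch) and joins them with the separator.
sub id_like_sketch {
    my (%opt) = @_;
    my $chars = $opt{chars} || 'base34';
    $chars = $chars eq 'base62' ? \@base62 : \@base34 unless ref $chars;
    my $separator = defined $opt{separator} ? $opt{separator} : '_';
    return join $separator, map { encode_in_chars($_, $chars) } time, $$;
}

my $default_id = id_like_sketch();
my $dashed_id  = id_like_sketch(separator => '-');
my $hex_id     = id_like_sketch(chars => [ 0 .. 9, 'A' .. 'F' ]);
```

A larger character set needs fewer characters for the same number, which is why 'base62' tends to produce shorter ids than the default.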
IMPORT NOTE: If you really want to use any IO::Util::*id function from its package without importing any symbol (and only in that case), you must explicitly load Time::HiRes. You must also load Sys::Hostname if you use IO::Util::Uid.
use IO::Util () ; # no symbol imported
use Time::HiRes ; # used by any IO::Util::*id
use Sys::Hostname ; # used only by IO::Util::Uid
$uniqid = IO::Util::Uid()
Minimal Markup Language (MML)
A lot of programmers use (de facto) a subset of canonical XML which is characterized by:
No Attributes
No mixed Data and Element content
No Processing Instructions (PI)
No Document Type Declaration (DTD)
No non-character entity-references
No CDATA marked sections
Support for only UTF-8 character encoding
No optional features
That subset has no official standard, so in this description we will generically refer to it as 'Minimal Markup Language' or MML. Please, note that MML is just an unofficial and generic way to name that minimal XML subset, avoiding any possible MXML, SML, MinML, /.+ML$/ specificity.
MML advantages
If you just need to store configuration parameters and construct any perl data structure, MML is all you need. Using it instead of full-featured XML gives you a few very interesting advantages:
it is really simple to use, edit and understand, even for unskilled people
you can parse it with a very light, fast and simple RE, thus avoiding loading and executing the several thousand lines of perl code needed to parse full-featured XML
any canonical XML parser will still be able to parse it as well
About XML parsing and structure reduction
The load_mml function produces perl structures exactly like other CPAN modules (e.g. XML::Simple, XML::Smart) but uses the opposite approach. Those modules usually require a canonical XML parser to build a full XML tree, and then prune all the unwanted branches. That means thousands of lines of code loaded and executed, and a potentially big structure to reduce, which is probably a waste of resources when you just have to deal with simple MML.
The load_mml function uses just a few lines of recursive code, parsing MML with a simple RE. It builds up only the branches it needs, optionally ignoring all the unwanted nodes. That is exactly what you need for MML, but it is obviously completely inappropriate for full XML files (e.g. HTML) which use attributes and other features unsupported by MML.
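To give an idea of how little code a recursive RE-based approach needs, here is a minimal sketch. It is not the load_mml implementation: parse_mml_sketch is a hypothetical name, the root element is kept, and the sketch assumes no attributes, no backslash escapes and no element nested inside an element with the same id.

```perl
use strict;
use warnings;

# Hypothetical function: turns an MML string into a perl structure.
# Leaves become strings, elements become hashes, and repeated sibling
# ids are folded into an array.
sub parse_mml_sketch {
    my ($mml) = @_;
    $mml =~ s/<!--.*?-->//gs;                  # drop XML comments
    my %node;
    my $found = 0;
    while ($mml =~ m{<(\w+)>(.*?)</\1>}gs) {
        my ($id, $inner) = ($1, $2);
        $found = 1;
        my $value = parse_mml_sketch($inner);  # recurse into the content
        if (!exists $node{$id}) {
            $node{$id} = $value;
        }
        elsif (ref($node{$id}) eq 'ARRAY') {
            push @{ $node{$id} }, $value;
        }
        else {
            $node{$id} = [ $node{$id}, $value ];
        }
    }
    return $found ? \%node : $mml;             # no child element: plain data
}

my $struct = parse_mml_sketch(<<'EOS');
<opt>
  <!-- a comment -->
  <parA>
    <optA>01</optA>
    <optA>02</optA>
  </parA>
  <c>something</c>
</opt>
EOS
```

Only the branches matched by the RE are ever built, which is the opposite of building a full tree first and pruning it afterwards.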
load_mml ( MML [, options] )
This function parses the MML, optionally using the options, and returns a perl structure reflecting the MML structure and any custom logic you may need (see "options"). It accepts one MML parameter that can be a reference to a SCALAR content, a path to a file or a reference to a filehandle. It also accepts one options parameter, which must be a hash reference.
You can customize the process by setting a few options, which give you full control over the process and the resulting structure (see also the t/05_load_mml.t test file for a few examples):
- strict => 1|0
Boolean. A true value will croak when any unsupported syntax is found, while a false value will quietly ignore unsupported syntax. Default true (strict).
$strict_mml = '<opt><a>01</a></opt>';
$non_strict_mml = << 'EOS';
<opt>
    mixed content ignored
    <elem attr="ignored">01</elem>
</opt>
EOS
$structA = load_mml( \$non_strict_mml ); # would croak
$structB = load_mml( \$non_strict_mml, {strict => 0} ); # ok
- keep_root => 0|1
Boolean. A true value will keep the root element, while a false value will strip the root. Default false (root stripped).
$mml = '<opt><a>01</a></opt>';
$structA = load_mml( \$mml );
$$structA{a} eq '01'; # true
# $structA = {
#   'a' => '01'
# };
$structB = load_mml( \$mml, {keep_root => 1} );
$$structB{opt}{a} eq '01'; # true
# $structB = {
#   'opt' => {
#     'a' => '01'
#   }
# };
- filter => { id|re => CODE|'TRIM_BLANKS'|'ONE_LINE' }
This option allows you to filter data on its way from the MML to the structure. You must set it to a hash of id/filter pairs. The key id can be the literal id of the element whose content you want to filter, or any compiled RE you want to match against the element ids; the filter can be a CODE reference or the name of one of the two literal built-in filters: 'TRIM_BLANKS', 'ONE_LINE'.
The referenced code will receive id, data_reference and active_options_reference as the arguments; besides, for regexing convenience the data is aliased in $_.
$mml = << 'EOS';
<opt>
    <foo>aaa</foo>
    <bar>bBB</bar>
    <baz>ZZz</baz>
    <multi_line>
 other
 data
    </multi_line>
    <other_stuff>something</other_stuff>
    <anything_else>not filtered</anything_else>
</opt>
EOS
$struct = load_mml( \$mml,
                    { filter => { foo         => sub{uc},
                                  qr/^b/      => sub{lc},
                                  multi_line  => 'TRIM_BLANKS',
                                  other_stuff => \&my_filter
                                }
                    }
                  );
sub my_filter {
    my ($id, $data_ref, $opt) = @_ ;
    # $_ contains the actual data
    # so you could use it instead of $$data_ref
    ....
    # return $_ (if you modified it with any s///)
    # or any arbitrarily modified data
    return 'something else';
}
# $struct = {
#   'foo'           => 'AAA',            # it was 'aaa'
#   'bar'           => 'bbb',            # it was 'bBB'
#   'baz'           => 'zzz',            # it was 'ZZz'
#   'multi_line'    => "other\ndata",    # it was "\n other\n data\n"
#   'other_stuff'   => 'something else', # it was 'something'
#   'anything_else' => 'not filtered'    # the same
# }
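The calling convention for filters can be pictured with a small sketch. None of this is IO::Util's code: run_filter_sketch and trim_blanks_sketch are hypothetical names, and the behaviour given to 'TRIM_BLANKS' (strip blanks around each line and drop empty lines) is an assumption inferred from the 'multi_line' result shown above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the 'TRIM_BLANKS' built-in filter.
sub trim_blanks_sketch {
    my ($data) = @_;
    my @lines;
    for my $line (split /\n/, $data) {
        $line =~ s/^\s+//;                  # strip leading blanks
        $line =~ s/\s+$//;                  # strip trailing blanks
        push @lines, $line if length $line; # drop empty lines
    }
    return join "\n", @lines;
}

# Hypothetical dispatcher: runs a literal built-in or a CODE filter.
sub run_filter_sketch {
    my ($filter, $id, $data_ref, $options) = @_;
    return trim_blanks_sketch($$data_ref)
        if !ref($filter) && $filter eq 'TRIM_BLANKS';
    # 'for' aliases $_ to the data, so filters like sub{uc} work on it
    for ($$data_ref) {
        return $filter->($id, $data_ref, $options);
    }
    return;
}

my $upper   = run_filter_sketch(sub { uc }, 'foo', \'aaa', {});
my $trimmed = run_filter_sketch('TRIM_BLANKS', 'multi_line',
                                \"\n other\n data\n", {});
```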
- handler => { id|re => CODE }
This option allows you to execute any code during the parsing of the MML in order to change the returned structure or do any other task. It allows you to implement your own syntax, checks and executions, skip any branch, change the options of any child node, generate nodes or even objects to add to the returned structure.
You must set it to a hash of id/handler pairs. The key id can be the literal id of the element whose content you want to handle, or any compiled RE you want to match against the element ids; the handler must be a CODE reference.
The referenced CODE will be called instead of the standard handler, and will receive id, data_reference and active_options_reference as the arguments. It is expected to return the branch to add to the returned structure. If the referenced CODE needs to refer to the original branch structure, it can retrieve it by using IO::Util::parse_mml().
A few examples using this same MML string:
$mml = << 'EOS';
<opt>
    <a>
        <b>Foo</b>
        <b>Bar</b>
    </a>
    <c>something</c>
</opt>
EOS
Regular parsing and structure:
$struct = load_mml( \$mml ) # no options
# $struct = {
#   'a' => {
#     'b' => [
#       'Foo',
#       'Bar'
#     ]
#   },
#   'c' => 'something'
# } ;
Skip all the 'a' elements:
$struct = load_mml( \$mml ,
                    { handler => { a => sub{} } # just for 'a' elements
                    }
                  ) ;
# $struct = {
#   'c' => 'something'
# } ;
Folding an array:
$struct = load_mml( \$mml ,
                    { handler => { a => \&a_handler } # just for 'a'
                    }
                  ) ;
sub a_handler {
    # get the original branch
    my $branch = IO::Util::parse_mml( @_ );
    $$branch{b} # ['Foo','Bar']
}
# $struct = {
#   'a' => [
#     'Foo',
#     'Bar'
#   ],
#   'c' => 'something'
# } ;
IO::Util::parse_mml (id, MML [, options])
Used internally, and optionally by any handler, in order to parse any MML chunk and return its branch structure. It requires the element id and the reference to the MML chunk, and optionally accepts the options hash reference to use for the branch.
Note: You can escape any character (especially < and >) by using the backslash '\'. XML comments can be added to the MML and will be ignored by the parser.
If you need support or if you want just to send me some feedback or request, please use this link:
© 2004 by Domizio Demichelis.
All Rights Reserved. This module is free software. It may be used, redistributed and/or modified under the same terms as perl itself.