NAME
IO::ReadPreProcess - Macro processing built into IO::File replacement
SYNOPSIS
use IO::ReadPreProcess;
my $fh = new IO::ReadPreProcess(File => './input.file') or
die "Startup error: $IO::ReadPreProcess::errstr\n";
while(<$fh>) {
print $_; # Or other processing of input
}
die($IO::ReadPreProcess::errstr . "\n")
if($fh->error);
The input file may contain:
This line will be returned by getline
.# This is a comment
.let this := 'that'
Another line
.if this eq 'that'
Another line returned
.print The variable this has the value \v{this}
.else
This line will not be seen
.fi
This line returned
.include another.file
Line returned after the contents of another.file
DESCRIPTION
Provides an 'intelligent' bottom-end read function for scripts: what is read is pre-processed before the script sees it. Your program does not need code to conditionally discard some input, include files or substitute values.
It is an easy way of reading input where some lines are read conditionally and other files are included. Directives cover conditionals (.if, .elseif, .else, .fi), actions (.include, .let, .print), loops (.while, .until, .for), subroutine definition and call, writing to other streams, and more.
It provides IO::Handle-like methods and works with the input diamond operator, so it is easy to slot into existing scripts.
The preprocessing layer has variables that can be set and read by your perl code. In the input files they are set via .let directives, and can be made part of your script's input with .echo and \v{xxx}.
IO::ReadPreProcess returns lines from the input stream. This may contain directives that:
set variables to arithmetic or string expressions
conditionally return lines
include other files
print to stdout or stderr
Conditions are evaluated by Math::Expression.
CONSTRUCTOR
new returns an IO::ReadPreProcess object, undef on error.
Arguments to new
File and Fd
-
Arguments File and Fd, see method open. If one of these is not given, method open must be called.
Trim
-
If this is true (default) then input lines will be trimmed of spaces.
Math
-
A Math::Expression object that will be used for expression evaluation. If this is not given a new object will be instantiated with PermitLoops => 1, EnablePrintf => 1.
If you share a Math::Expression object between different IO::ReadPreProcess objects then the different files being read will see the same variables.
DirStart and DirStartRE
-
DirStart is the string at the start of a line that introduces a directive, the default is a full stop (.). If you wish to change this, provide this option. So to use directives like #if go:
new IO::ReadPreProcess(File => 'fred', DirStart => '#')
Before use, the characters that are special in Regular Expressions will have a backslash \ prepended; this string is stored in DirStartRE. If the option DirStartRE is provided this transformation will not be done and the provided string will be used directly, thus more complex start sequences can be used.
Eg: allow the start sequence to be either . or %:
new IO::ReadPreProcess(File => 'fred', DirStartRE => '[.%]')
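The relationship between DirStart and DirStartRE can be sketched in plain Perl. This is an illustration only, not the module's code: the helper directive_re is hypothetical, and using quotemeta for the escaping step is an assumption (it may escape more characters than the module does).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of IO::ReadPreProcess): build the pattern
# that recognises a directive line from either option.
sub directive_re {
    my (%opt) = @_;
    # DirStartRE is used verbatim; DirStart is escaped first.
    # Assumption: quotemeta stands in for the module's own escaping.
    my $start = defined $opt{DirStartRE}
        ? $opt{DirStartRE}
        : quotemeta($opt{DirStart} // '.');
    return qr/^$start(\S+)/;
}

my $default = directive_re();                        # full stop only
my $hash    = directive_re(DirStart   => '#');       # lines like #if
my $either  = directive_re(DirStartRE => '[.%]');    # . or %

# Show which patterns accept each sample line.
for my $line ('.if x', '%if x', '#if x', 'xif x') {
    my @hits = grep { $line =~ $_->[1] }
        (['default', $default], ['hash', $hash], ['either', $either]);
    printf "%-6s matched by: %s\n", $line,
        join(', ', map { $_->[0] } @hits) || 'none';
}
```

Because the default full stop is escaped, a line such as xif x is not mistaken for a directive; an unescaped . would match any character.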
Raw
-
If this is given and true then processing of directives does not happen, they are returned by getline. You may change this property as input is read but take care to avoid errors, eg: a .if is read in Raw mode but its .fi in Cooked mode; a complaint will result as the .fi did not have an .if.
Raw might be set when in an .include. When the end of that file is reached the previous file (that had the .include directive) will be returned to and lines read from there.
Default: 0
OnError
-
What should happen when an error happens. Values:
warn
-
Print a message to STDERR with warn, this is the default.
die
-
Print a message to STDERR with die which terminates the program.
-
Do nothing. The application should check the method error and look at $IO::ReadPreProcess::errstr.
PipeOK
-
Pipes are only allowed with .include if the property PipeOK is true (default false).
MaxLoopCount
-
Loops (while, until and for) will abort after this number of iterations. The count restarts if the loop is restarted. A value of 0 disables this test.
This may be overridden on an individual loop with the -i option.
Default 50.
OutStreams
-
This defines output streams that may be written to by .out and .print -o. The streams can either be IO::File or a reference to a function (when the line will be passed as the only argument).
The members STDOUT and STDERR are added if not passed, given values *STDOUT{IO} and *STDERR{IO}. Names must match the RE /\w+/.
Eg:
my $lf = IO::File->new('logFile', 'w+');
sub func { say "func called '$_[0]'"; }
OutStreams => { fun => \&func, log => $lf }
This provides the ability to write to multiple places, however the file (or function) must be opened by the Perl script. IO::ReadPreProcess does not provide the ability to open new files.
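The two kinds of stream (a handle and a function reference) can be pictured with a small stand-alone sketch. The table %streams and the dispatcher write_stream are hypothetical names used only for illustration; they show the idea of the name check and the per-type dispatch, not how the module itself is written.

```perl
use strict;
use warnings;
use IO::Handle;

my @captured;    # lines collected by the function stream

# Hypothetical stream table in the shape that OutStreams takes.
my %streams = (
    fun    => sub { push @captured, $_[0] },    # function reference
    STDOUT => *STDOUT{IO},                      # IO handle
);

# Hypothetical dispatcher: returns false for an unknown or badly named stream.
sub write_stream {
    my ($name, $line) = @_;
    return 0 unless $name =~ /^\w+$/ && exists $streams{$name};
    my $dest = $streams{$name};
    if (ref $dest eq 'CODE') {
        $dest->($line);              # the line is the only argument
    } else {
        $dest->print("$line\n");     # an IO handle: just print the line
    }
    return 1;
}

write_stream('fun',    'first line');
write_stream('STDOUT', 'shown on standard output');
my $ok = write_stream('no-such', 'dropped');    # hyphen fails the name check
```

A function stream is handy when output should go somewhere that is not a file, for example into an array or a logging routine owned by the script.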
PUBLIC PROPERTIES
The properties Trim, OnError, MaxLoopCount, OutStreams, PipeOK and Raw (see new) may be directly assigned to at any time.
Eg:
$fh->{Raw} = 1;
Also the following:
Math
-
Note that there are many useful values that you can get here, some set by IO::ReadPreProcess (see below), others by .let directives. You can thus communicate with the preprocessing layer.
Eg:
You can set Math variables like this:
$fh->{Math}->VarSetScalar('FirstName', 'Henry');
You can get Math variable values like this:
$name = $fh->{Math}->ParseToScalar('FirstName');
$fileName = $fh->{Math}->ParseToScalar('_FileName');
Place
-
A string that can be used in messages to the user about the current input place. The value will be like:
line 201 of slides/regular-expressions.mod
Eg:
warn "Something wrong at $fh->{Place}\n";
METHODS
new
-
This has been discussed above.
open
-
The argument is the name of the file to be opened and read from. This method need not be used if the information is given to new.
open returns an IO::ReadPreProcess object, undef on error.
File
-
Gives the name of the file to be opened. This is mandatory.
Fd
-
If this is given it provides a file descriptor (from IO::File) that is already open for reading. In which case File (which must still be given) is a name that is used in error messages. This is useful if you want to read from stdin or a pipe.
If there is an error in opening a file look at $IO::ReadPreProcess::errstr.
Example:
$fh->open(Fd => \*STDIN, File => 'Standard input', OutStreams => { log => $lf });
close
-
Closes the current input file. If the current file was opened by a .include, the next line that is read will be the one after the .include directive.
This will not normally be used by applications.
close returns an IO::ReadPreProcess object, undef on error.
close is also used to end a block.
getline
-
Returns a line from input. This line is not necessarily the next one in the input file since directives (see below) may specify that some lines are not returned or that input is taken from another file.
As an alternative, the object (what is returned by new) may be used in the diamond operator which really calls getline.
After all input has been read this returns undef.
while(defined(my $line = $fh->getline)) { ... }
Returns undef on error.
getlines
-
Returns the rest of input as an array.
This must be called in a list context.
Returns undef on error.
putline
-
The argument list will be put as input on the current frame and these will be 'read' as the very next input. Useful for running a .sub. Eg:
$fh->putline('.show Frodo 35');
binmode
-
This package is intended to read text files, thus setting binary data is probably not a good idea. binmode also allows different (layer) encoding to be supported, eg:
$fh->binmode(':utf8');
Any binmode settings will be applied to all files subsequently opened, eg: because of .include.
Returns true on success, undef on error.
See perl's binmode function.
eof
-
Returns 1 if the next read will return End Of File or the file is not open.
error
-
Returns true if there has been an error. See clearerr.
clearerr
-
Clears any error indicator.
DIRECTIVES
Input files may contain directives. These all start with a full stop (.) at the start of line, this may be changed with DirStart. There may not be spaces before the full stop.
Lines starting with directives other than the ones below will be returned to the application.
Conditions are evaluated by Math::Expression.
.#
-
These lines are treated as comment and are removed from input.
.let
-
The argument is an expression as understood by Math::Expression. The result is ignored. This may be used to set one or more variables.
Eg:
.let count := 0; page := 1
.let ++count
.let if(count > 10) { ++page; count := 0 }
.if
.elseif
.elsif
.else
.fi
.unless
-
The rest of the .if line is evaluated by Math::Expression and if the result is true the following lines will be returned. An optional .else reverses the sense of the .if as regards (not) returning lines.
.if may be nested. A .if must have a matching and ending .fi.
.elseif may be used where a .else can be found and must be followed by a condition. .elsif is a synonym for .elseif.
.unless is the same as .if except that the truth of the result is considered inverted.
Text following .fi or .else will be ignored - you may use it as a comment.
The condition may be a defined subroutine which will be run and the value set by .return used as the boolean. The arguments are processed as if by .print.
.if .someSub arg1 \v{someVariable}
Conditional text
.fi
The condition may also be one of the directives:
.include
.read
.test
.print
-
The rest of the line will be printed to stdout.
If the line starts -e it will be written to output stream STDERR.
If the line starts -o strm it will be written to output stream strm.
Eg:
.print -o log Something interesting has happened!
Escapes will be recognised and substitutions performed; for example \v{this} is replaced by the value of the variable this.
.echo
-
Escape substitution is performed as with .print and the line returned by getline. This allows variables to be used in the input the application reads without it being aware of what is going on.
.echo Index=\v{i} person=\v{names[i]}
.include
-
The first argument is a file path that is opened and lines from this returned to the application.
Paths that start / are absolute and are just accepted.
Paths that start # are taken to be with respect to the current working directory of the process. The # is removed and the path accepted.
.# Include a file 'header' from a generic 'snippets' directory:
.include #snippets/header
Other paths are relative to the file being processed, the directory path is prepended and the result used. If such a path is used in a file opened by Fd an error results.
.# Include a file in the same directory as the current file:
.include common_module
If the path starts $ the next word is a variable name. The value is prepended to the rest of the path and the file tested for existence as above (eg test starts /, # and others). If the variable is an array the paths are tried until one is found. Eg:
.let dirs := split(':', '.:mydir:#builddir:/home/you/yourdir:/usr/local/ourdir')
.include $dirs/good_file.txt
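The three basic path rules above (leading /, leading #, and everything else) can be sketched as a small function. This is a reading of the rules as described, not the module's implementation: resolve_include is a hypothetical name, and the $ variable form and pipes are deliberately left out.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical resolver, not the module's code. $current is the path of the
# file containing the .include, or undef when input was opened via Fd.
sub resolve_include {
    my ($path, $current) = @_;
    return $path if $path =~ m{^/};            # absolute: accepted as is
    return $1    if $path =~ m{^#(.*)};        # relative to the process cwd
    die "relative include '$path' needs a file name\n"
        unless defined $current;               # Fd input has no directory
    return dirname($current) . '/' . $path;    # relative to the current file
}

print resolve_include('/etc/motd',        'slides/intro.mod'), "\n";
print resolve_include('#snippets/header', 'slides/intro.mod'), "\n";
print resolve_include('common_module',    'slides/intro.mod'), "\n";
```

The die in the sketch mirrors the documented error for a relative path inside input opened by Fd, where there is no directory to prepend.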
Words that follow are deemed arguments and made available within the include via the array _ARGS. See .sub.
.# header can generate different headers, ask for one suitable for a report:
.include #snippets/header report
The file path and arguments are processed for escapes as .print.
The file path and arguments may contain spaces if they are surrounded by quotes ('").
If the path starts | the rest of the line is a pipe that will be run and read from. Pipes are only allowed if the property PipeOK is true (default false). WARNING this will run an arbitrary command, you must be confident of the source and contents of the files being processed.
If the first arguments are -s name the file is opened on a named stream that may be used by .read and should be closed with .close.
If the first argument is -pn the file stream is put n frames below the current one. A new frame is created for every file opened, if, while, sub executed, ... (n is an optional number, default: 1)
.close
-
This is only needed to close named streams. The -s name option is needed.
.out
-
This diverts output to the output stream (see: OutStreams) mentioned. Lines generated will be sent there until a .out directive without an argument.
Eg:
.out index
Meals in London
Times of the last tube trains
.out
.local
-
Marks the arguments as variable names that are local to the current block (.include, .while, .sub, ...). When the block returns the previous value will be restored. The values restored are the values of the variables at the time the .local is seen. Note that variable scope is dynamic, not lexical.
This happens automatically for _ARGS on an .include and .sub, and for named .sub arguments.
.return
-
This ends reading a file early, the previous file is picked up on the line after the .include. At the top level (ie first file) end of file is returned to the application.
Within a .sub this may be used to return a value. The value of the last expression in a .sub is not automatically used as a return value.
.return may be followed by an expression; this will be assigned to the variable _ (underscore). Default undef:
.return count + 1
.exit
-
The application will be terminated.
Any text that follows on the line will be processed by Math::Expression and if it is a number it is used as an exit code. If none is specified the exit code will be 2.
.eval
-
The rest of the line is processed for escapes as .print. It is then treated as if it had just been read. The processed line might even start with a command that is recognised by this module, eg this ends up setting variable a to the value 3:
.let a := 1; b := 2; var := 'a'
.print a=\v{a} b=\v{b}
.eval .let \v{var} := 3
.print a=\v{a} b=\v{b}
Do not use .eval to generate a conditional or loop, eg: .if; .while.
.read
-
Read the next line of input into a variable. The line is trimmed of the trailing newline (chomped). It is trimmed of white space if Trim is true. The variable _EOF is assigned 0.
At end of file the variable is assigned the empty string and the variable _EOF is assigned 1. The variable _ is set to 1 on read success.
This will be of most use with a stream opened with a -p or -s option:
.include -p | hostname
.read host
.echo This machine is called \v{host}
.include -s who | whoami
.read -s who me
.echo Logged in as \v{me}
.close -s who
.sub
-
Defines a subroutine on the following lines, ending with .done. The subroutine is called by invoking it as .name.
Arguments may be passed to the subroutine and are available in the array _ARGS. Following the name optional names may be given, these are variables as .local and, when called, any arguments are copied there. Beware: these are copies, ie separate from what is in _ARGS.
.sub show name age
Hobbits live in the Shire
.echo \v{name} is \v{age} years old
.echo That name again: \v{_ARGS[0]}
.done
.show 'Bilbo Baggins' 50
You can get the original argument string with join, but beware: this will not give the exact argument string, since if two words are separated by more than one space the extra spaces will be lost.
.sub manyArgs
.let allArg := join(' ', _ARGS); na := count(_ARGS)
.echo All \v{na} arguments as a string '\v{allArg}'
.done
.manyArgs all cats have whiskers
.noop
-
This is a no-operation and does nothing.
.while
-
This starts a loop that continues as long as the expression (see .if) is true. The loop is terminated by the line .done.
If the option -inn is given, the loop limit is set to nn for this loop. See default MaxLoopCount. There may be spaces between -i and nn.
.let i := 0
.while -i 100 i++ < 100
Part of a .while loop
.echo i has the value \v{i}
.done
Loops are buffered in memory. .include within a loop is not buffered, ie read on every iteration.
.until
-
This is the same as .while except that the loop stops when the expression becomes true.
.for
-
This starts a loop. The loop is terminated by the line .done.
This has the form:
.for init ;; condition ;; incr
Note that the ;; will be seen even if inside a quoted string.
As with .while and .until you may use the -i option.
init is run once before the loop starts; condition is as .while; incr is run after every iteration. init and incr are processed by Math::Expression, ie no subs allowed.
Eg:
Count down begins
.for i := 10 ;; i > 0 ;; i--
.echo \v{i}
.done
Blast off!
.sub foo num
.return num > 2
.done
.for i := 5 ;; .foo \v{i} ;; i--
something ...
.done
.break
.last
-
Terminate the current loop. These directives are synonyms.
These may be followed by the number of loops to terminate, default 1.
.continue
.next
-
Abandon the rest of the current loop, start the next iteration. These directives are synonyms.
These may be followed by a number (default 1): the inner loops are terminated and the next iteration of the loop with that number is started.
.done
-
Ends blocks: .while .until .for .sub. It may be followed by the type of block that it ends, if so a consistency check is made.
Eg:
.for i := 0 ;; i < 5 ;; i++
Text output
.done for
.test
-
Various tests. This will set _ to 0 or 1.
-f
-
This returns true if the argument file exists. The file path is as for .include except that pipes are not allowed. This also sets the array _STAT with information about the file (see below) and _TestFile will be the path found - ie after the #, $, ... is resolved.
Eg:
.if .test -f $dirs/good_file.txt
.print -e Including \v{_TestFile}, size \v{_STAT[7]} bytes.
.include $dirs/good_file.txt
.fi
.error
-
An error is returned to the application, ie undef is returned. The remaining text on the line is processed, see OnError above.
.set
-
Permits the setting of run time options. These may also be given as arguments to new.
.case
.do
.endswitch
.function
.switch
-
These are reserved directives that may be used in the future.
Math::Expression variables
Any variable names starting with _ are reserved for future use.
The following variables will be assigned to:
_FileName
-
The name of the current File.
_LineNumber
-
The number of the line just read.
_FileNames
-
Array of files being read. The file last .included is in _FileNames[-1].
_LineNumbers
-
Array of line numbers as _FileNames.
_IncludeDepth
-
The number of files that are open for reading. The file passed to new or open is number 1.
_
-
Value of the last .return.
_ARGS
-
Arguments provided to a .sub or .include.
_TIME
-
The current time (seconds), supplied by Math::Expression.
_EOF
-
Set to 1 if .read finds End Of File, else set to 0.
_CountGen
-
Count of lines generated.
_CountSkip
-
Count of lines skipped.
_CountDirect
-
Count of directives processed.
_CountFrames
-
Count of frames opened. For every: sub, if, loop.
_CountOpen
-
Count of files opened.
EmptyArray
EmptyList
-
Empty arrays supplied by Math::Expression.
_STAT
-
Array of information about the last file found by .test -f. Members are as for perl's stat function:
0 device number of filesystem
1 inode number
2 file mode (type and permissions)
3 number of (hard) links to the file
4 numeric user ID of file's owner
5 numeric group ID of file's owner
6 the device identifier (special files only)
7 total size of file, in bytes
8 last access time in seconds since the epoch
9 last modify time in seconds since the epoch
10 inode change time in seconds since the epoch
11 preferred block size for file system I/O
12 actual number of blocks allocated
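Since the members of _STAT follow perl's stat, the same indices can be tried directly in Perl. The sketch below uses a scratch file made with File::Temp so that it is self-contained; the variable names are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file so the example is self-contained.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} 'hello';
close $tmp or die "close: $!";

# stat returns a 13-element list; an empty list means failure.
my @st = stat $path or die "stat: $!";
my ($size, $mtime) = @st[7, 9];    # same indices as _STAT[7] and _STAT[9]

print "size $size bytes, modified ", scalar localtime($mtime), "\n";
```

Within an input file the equivalent values would be read as \v{_STAT[7]} and \v{_STAT[9]} after a successful .test -f.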
_TestFile
-
The name of the last file found by .test -f.
_Initialised
-
Internal use, prevent double initialisation of variables.
ERRORS
Most methods return undef if there is an error. There will be a reason in $IO::ReadPreProcess::errstr. The error could be from IO::Handle (where $! might be helpful) or an error in the file format in which case $! will be set to EINVAL.
Beware: getline returns undef on end of file as well as error. Checking the method error will distinguish the two cases.
Note also the property OnError (see above).
EXAMPLES
The script below sets some variables that are passed on the command line, sets more from include files and then reads stdin. The variables that are set can be used to control what it reads.
use feature 'say';
use IO::ReadPreProcess;
use Getopt::Long;
use Math::Expression;
# One arithmetic instance so that variables are visible in all files:
my $ArithEnv = new Math::Expression( PermitLoops => 1, EnablePrintf => 1 );
my @let = ();
my @includes = ();
my $verbose = 0;
my $help = 0;
# Look at command line options ... add other options here:
GetOptions(help => \$help, 'include=s' => \@includes, 'let=s' => \@let, verbose => \$verbose);
Usage if $help;
# Evaluate all --let
# Look like: --let='advanced := 1'
for (@let) {
say "Evaluating: $_" if $verbose;
die "Invalid --let='$_'\n"
unless(defined $ArithEnv->ParseToScalar($_));
}
# Read all --include
# These must not yield anything other than blank lines
# The point is that we evaluate .let, etc.
for my $file (@includes) {
say "Including: $file" if $verbose;
my $inc = IO::ReadPreProcess->new(File => $file, Math => $ArithEnv, OnError => 'die', PipeOK => 1) or
die "$0: Opening include '$file': $IO::ReadPreProcess::errstr\n";
# All that is next should be empty lines:
while (<$inc>) {
die "Non empty line found via '--include $file' at $inc->{Place}\n"
if /\S/;
}
}
# If not stdin, maybe loop over @ARGV:
my $fh = new IO::ReadPreProcess(Fd => \*STDIN, File => 'Standard input', Math => $ArithEnv) or
die "Startup error: $IO::ReadPreProcess::errstr\n";
while(<$fh>) {
...
die "Error ... at: $fh->{Place}\n"
if(...);
}
# Use pre-processor variable
print "Sum output " . $ArithEnv->ParseToScalar('sum') . "\n";
Most of the interest lies in the input:
.let sum := 0
A line of input
.# Check to see if this is advanced
.if advanced
Complicated stuff
.let level := 'advanced'
.if advanced > 1
.# Bring in an extra file:
.include extra_files/very_complex
.let sum = sum + 2
.fi advanced > 1
.else
Simple stuff
.let level := 'simple'
.fi
.# Bring in an extra file where _ARGS[0] is either 'advanced' or 'simple':
.include extra_files/extra_module \v{level}
.print Showing material that is \v{level}
For more examples see the test suite.
At the end of the run you might want to do this:
# Some stats, for fun:
say STDERR $ArithEnv->ParseToScalar('printf("Preprocessing: lines generated %d, skipped %d. Directives %d, frames opened %d, files opened %d", _CountGen, _CountSkip, _CountDirect, _CountFrames, _CountOpen)');
SECURITY
Do be aware that a .include will open any file for which the process has permissions. So there is scope for an input file to pass the contents of arbitrary files into your program; this also applies to any files that the initial input file may, directly or indirectly, .include.
If a pipe is created: read this section twice.
Summary: be aware of the provenance of all input files.
BUGS
When used in the diamond operator in a list context only one line will be returned. This is due to a problem in the perl module overload.
Please report any bugs or feature requests to bug-io-readpreprocess at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=IO-ReadPreProcess. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc IO::ReadPreProcess
You can also look for information at:
RT: CPAN's request tracker (report bugs here)
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
AUTHOR
Alain Williams, <addw@phcomp.co.uk>
April 2015, 2017.
COPYRIGHT
Copyright (C) 2015, 2017 Alain Williams. All Rights Reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
See http://dev.perl.org/licenses/ for more information.
ABSTRACT
Provide an 'intelligent' bottom end read function for scripts.