NAME

IO::ReadPreProcess - Macro processing built into IO::File replacement

SYNOPSIS

use IO::ReadPreProcess;

my $fh = new IO::ReadPreProcess(File => './input.file') or
    die "Startup error: $IO::ReadPreProcess::errstr\n";

while(<$fh>) {
    print $_;    # Or other processing of input
}

die($IO::ReadPreProcess::errstr . "\n")
    if($fh->error);

The input file may contain:

This line will be returned by getline
.# This is a comment
.let this := 'that'
Another line
.if this eq 'that'
Another line returned
.print The variable this has the value \v{this}
.else
This line will not be seen
.fi
This line returned
.include another.file
Line returned after the contents of another.file

DESCRIPTION

Provides an 'intelligent' bottom-end read function for scripts: what is read is preprocessed before the script sees it. Your program does not need code to conditionally discard some input, include files or substitute values.

It offers an easy way of reading input in which some lines are read conditionally and other files are included. There are conditionals (.if/.else/.elseif/.fi), actions (.include, .let, .print), loops (.while, .for), subroutine definition and call, writing to other streams, and more.

It provides IO::Handle-like methods and supports the diamond operator, so it is easy to slot into existing scripts.

The preprocessing layer has variables that can be set and read by your perl code. In the input files they are set via .let directives, and can be made part of your script's input with .echo and \v{xxx}.

IO::ReadPreProcess returns lines from the input stream, which may contain directives that:

  • set variables to arithmetic or string expressions

  • conditionally return lines

  • include other files

  • print to stdout or stderr

Conditions are done by Math::Expression.

CONSTRUCTOR

new returns an IO::ReadPreProcess object, undef on error.

Arguments to new

File and Fd

For the arguments File and Fd, see the method open. If they are not given here, the method open must be called before reading.

Trim

If this is true (default) then input lines will be trimmed of spaces.

Math

A Math::Expression object that will be used for expression evaluation. If this is not given a new object will be instantiated with PermitLoops => 1, EnablePrintf => 1.

If you share a Math::Expression object between different IO::ReadPreProcess objects then the different files being read will see the same variables.

DirStart and DirStartRE

DirStart is the string at the start of a line that introduces a directive; the default is a full stop (.). To change it, provide this option. So to use directives like #if:

new IO::ReadPreProcess(File => 'fred', DirStart => '#')

Before use, the characters in DirStart that are special in regular expressions have a backslash \ prepended; the result is stored in DirStartRE. If the option DirStartRE is provided, this transformation is not done and the provided string is used directly, so more complex start sequences can be used.

Eg: allow the start sequence to be either . or %:

new IO::ReadPreProcess(File => 'fred', DirStartRE => '[.%]')
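
The DirStart to DirStartRE step works like Perl's built-in quotemeta. The sketch below assumes exactly that; dir_start_re and is_directive are hypothetical helpers for illustration, not part of IO::ReadPreProcess.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a literal DirStart string into a regex fragment,
# mirroring the documented "backslash the special characters" step.
sub dir_start_re {
    my ($dir_start) = @_;
    return quotemeta($dir_start);
}

# Hypothetical helper: does $line begin with the directive start sequence?
# $start_re is either dir_start_re() output or a hand-written DirStartRE.
sub is_directive {
    my ($line, $start_re) = @_;
    return $line =~ /^$start_re/ ? 1 : 0;
}

my $dot = dir_start_re('.');               # '\.'
my @checks = (
    is_directive('.if x', $dot),           # 1
    is_directive('xif',   $dot),           # 0
    is_directive('xif',   '.'),            # 1: an unescaped '.' matches any character
    is_directive('%if x', '[.%]'),         # 1: a DirStartRE is used as is
);
```

Without the escaping, a DirStart of . would match any first character, which is why a hand-written DirStartRE must be a valid regular expression fragment.
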

Raw

If this is given and true, directives are not processed; they are returned by getline like any other line. You may change this property as input is read, but take care to avoid errors: eg if an .if is read in Raw mode but its .fi in Cooked mode, a complaint will result because the .fi has no matching .if.

Raw might be set while reading a file opened by .include. When the end of that file is reached, the previous file (the one that had the .include directive) is returned to and lines are read from there.

Default: 0

OnError

What should happen when an error occurs. Values:

warn

Print a message to STDERR with warn, this is the default.

die

Print a message to STDERR with die which terminates the program.

Do nothing. The application should check the method error and look at $IO::ReadPreProcess::errstr.

PipeOK

Pipes are only allowed with .include if the property PipeOK is true (default false).

MaxLoopCount

Loops (while, until and for) will abort after this number of iterations. The count restarts if the loop is restarted. A value of 0 disables this test.

This may be overridden on an individual loop with the -i option.

Default 50.
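
One plausible reading of the MaxLoopCount rule, as a minimal sketch: the guard counts iterations from zero each time a loop starts and aborts when the count passes the limit. run_guarded_loop is a hypothetical name, and the exact iteration at which the module aborts is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch of an iteration guard like MaxLoopCount.
# $cond and $body are code refs; $max == 0 disables the check.
# Assumption: the loop aborts when it is about to run iteration $max + 1.
sub run_guarded_loop {
    my ($cond, $body, $max) = @_;
    my $count = 0;                  # restarts every time the loop is started
    while ($cond->()) {
        $count++;
        die "Loop exceeded $max iterations\n" if $max && $count > $max;
        $body->();
    }
    return $count;                  # number of iterations actually run
}

my $i = 0;
my $runs = run_guarded_loop(sub { $i < 10 }, sub { $i++ }, 50);   # 10 iterations

my $ok = eval { run_guarded_loop(sub { 1 }, sub { }, 50); 1 };    # runaway loop
warn "aborted: $@" unless $ok;      # aborted: Loop exceeded 50 iterations
```
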

OutStreams

This defines output streams that may be written to by .out and .print -o. Each stream can be either an IO::File object or a reference to a function (which is then called with the line as its only argument).

The members STDOUT and STDERR are added if not passed, with the values *STDOUT{IO} and *STDERR{IO}. Names must match the RE /\w+/.

Eg:

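use IO::File;
use feature 'say';

my $lf = IO::File->new('logFile', 'w+') or die "Cannot open logFile: $!\n";
sub func {
    say "func called '$_[0]'";
}

my $fh = new IO::ReadPreProcess(File => 'fred', OutStreams => { fun => \&func, log => $lf });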

This provides the ability to write to multiple places; however, the file must be opened (or the function defined) by the Perl script. IO::ReadPreProcess does not provide the ability to open new files.
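
A minimal sketch of how a line might be dispatched to an OutStreams entry, assuming the stream is either a code reference or something print accepts as a filehandle. write_to_stream is a hypothetical helper, and an in-memory filehandle stands in for the IO::File object.

```perl
use strict;
use warnings;

# Hypothetical sketch: send one line to an output stream that is either a
# filehandle (eg an IO::File object) or a code reference.
sub write_to_stream {
    my ($streams, $name, $line) = @_;
    die "Invalid stream name '$name'\n" unless $name =~ /^\w+$/;
    my $stream = $streams->{$name} or die "Unknown output stream '$name'\n";
    if (ref $stream eq 'CODE') {
        $stream->($line);           # the function gets the line as its only argument
    } else {
        print {$stream} $line;      # anything print accepts as a handle
    }
    return;
}

# An in-memory filehandle stands in for the 'log' IO::File object.
open my $log, '>', \my $log_text or die "Cannot open in-memory log: $!\n";
my @seen;
my %streams = (
    log    => $log,
    fun    => sub { push @seen, $_[0] },
    STDERR => *STDERR{IO},          # the documented default entry
);

write_to_stream(\%streams, 'log', "Something interesting has happened!\n");
write_to_stream(\%streams, 'fun', "passed to a function\n");
```
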

PUBLIC PROPERTIES

The properties Trim, OnError, MaxLoopCount, OutStreams, PipeOK and Raw (see new) may be directly assigned to at any time.

Eg:

$fh->{Raw} = 1;

Also the following:

Math

Note that there are many useful values that you can get here, some set by IO::ReadPreProcess (see below), others by .let directives. You can thus communicate with the preprocessing layer.

Eg:

You can set Math variables like this:

$fh->{Math}->VarSetScalar('FirstName', 'Henry');

You can get Math variable values like this:

$name = $fh->{Math}->ParseToScalar('FirstName');

$fileName = $fh->{Math}->ParseToScalar('_FileName');

Place

A string that can be used in messages to the user about the current input place. The value will be like:

line 201 of slides/regular-expressions.mod

Eg:

warn "Something wrong at $fh->{Place}\n";

METHODS

new

This has been discussed above.

open

Opens the file to be read from; it takes the named arguments below. This method need not be used if the information is given to new. open returns an IO::ReadPreProcess object, undef on error.

File

Gives the name of the file to be opened. This is mandatory.

Fd

If this is given it provides a file descriptor (from IO::File) that is already open for reading. In that case File (which must still be given) is only used as a name in error messages. This is useful if you want to read from stdin or a pipe.

If there is an error in opening a file look at $IO::ReadPreProcess::errstr;

Example:

$fh->open(Fd => \*STDIN, File => 'Standard input', OutStreams => { log => $lf });

close

Closes the current input file. If the current file was opened by a .include, the next line that is read will be the one after the .include directive.

This will not normally be used by applications.

close returns an IO::ReadPreProcess object, undef on error.

Note: close is also used internally to end a block.

getline

Returns a line from input. This is not necessarily the next line in the input file, since directives (see below) may specify that some lines are not returned or that input is taken from another file.

As an alternative, the object (what is returned by new) may be used in the diamond operator, which calls getline.

After all input has been read this returns undef.

while(defined(my $line = $fh->getline)) {
    ...
}

Returns undef on error.

getlines

Returns the rest of input as an array.

This must be called in a list context.

Returns undef on error.

putline

The arguments are put back as input on the current frame and will be 'read' as the very next input. Useful for running a .sub. Eg:

$fh->putline('.show Frodo 35');
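
To picture what putline does, here is a hypothetical pushback reader (PushbackReader is not part of the module): pushed lines are handed out before anything more is read from the file. The order in which several lines passed in one call are read back is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch of a reader with pushback, illustrating putline.
# Assumption: several lines passed in one call are read back in argument order.
package PushbackReader {
    sub new {
        my ($class, $fh) = @_;
        return bless { fh => $fh, pending => [] }, $class;
    }
    sub putline {
        my ($self, @lines) = @_;
        unshift @{ $self->{pending} }, @lines;   # these are read next
        return $self;
    }
    sub getline {
        my ($self) = @_;
        return shift @{ $self->{pending} } if @{ $self->{pending} };
        my $fh = $self->{fh};
        return scalar <$fh>;                      # undef at end of file
    }
}

open my $in, '<', \"first\nsecond\n" or die "Cannot open in-memory input: $!\n";
my $reader = PushbackReader->new($in);
my @got;
push @got, $reader->getline;                      # "first\n"
$reader->putline(".show Frodo 35\n");
push @got, $reader->getline, $reader->getline;    # ".show Frodo 35\n", "second\n"
```
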

binmode

This package is intended to read text files, so setting binary mode is probably not a good idea. binmode also allows a different encoding (layer) to be used, eg:

$fh->binmode(':utf8');

Any binmode settings will be applied to all files subsequently opened, eg: because of .include.

Returns true on success, undef on error.

See perl's binmode function.

eof

Returns 1 if the next read will return End Of File or the file is not open.

error

Returns true if there has been an error. See clearerr.

clearerr

Clears any error indicator.

DIRECTIVES

Input files may contain directives. These all start with a full stop (.) at the start of a line; this may be changed with DirStart. There must be no spaces before the full stop.

Lines starting with directives other than the ones below will be returned to the application.

Conditions are done by Math::Expression.

.#

These lines are treated as comment and are removed from input.

.let

The argument is an expression as understood by Math::Expression. The result is ignored. This may be used to set one or more variables.

Eg:

    .let count := 0; page := 1
    .let ++count
    .let if(count > 10) { ++page; count := 0 }

.if .elseif .elsif .else .fi .unless

The rest of the .if line is evaluated by Math::Expression and if the result is true the following lines are returned. An optional .else reverses the sense of the .if as regards (not) returning lines. .if may be nested. Every .if must be closed by a matching .fi. .elseif may be used wherever an .else could be, and must be followed by a condition. .elsif is a synonym for .elseif.

.unless is the same as .if except that the truth of the result is inverted.

Text following .fi or .else is ignored, so it may be used as a comment.
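
The nesting rules above can be modelled with a stack holding one entry per open .if. The sketch below is one way to do it, not the module's implementation; filter_lines is hypothetical, and a caller-supplied code ref stands in for Math::Expression when evaluating conditions.

```perl
use strict;
use warnings;

# Hypothetical sketch of nested .if/.unless/.elseif/.elsif/.else/.fi handling.
# $eval is a code ref that decides whether a condition string is true.
sub filter_lines {
    my ($eval, @lines) = @_;
    my @stack;    # per open .if: parent => enclosing block live?,
                  # taken => a branch already chosen, on => current branch live?
    my @out;
    for my $line (@lines) {
        my $live = !@stack || $stack[-1]{on};
        if ($line =~ /^\.(if|unless)\s+(.*)/) {
            my ($kw, $cond) = ($1, $2);
            my $truth = 0;
            if ($live) {                     # conditions in dead branches are not evaluated
                $truth = $eval->($cond) ? 1 : 0;
                $truth = 1 - $truth if $kw eq 'unless';
            }
            push @stack, { parent => $live, taken => $truth, on => $truth };
        } elsif ($line =~ /^\.(?:elseif|elsif)\s+(.*)/) {
            my $cond = $1;
            die ".elseif without .if\n" unless @stack;
            my $top = $stack[-1];
            $top->{on} = ($top->{parent} && !$top->{taken} && $eval->($cond)) ? 1 : 0;
            $top->{taken} ||= $top->{on};
        } elsif ($line =~ /^\.else\b/) {
            die ".else without .if\n" unless @stack;
            my $top = $stack[-1];
            $top->{on} = ($top->{parent} && !$top->{taken}) ? 1 : 0;
            $top->{taken} = 1;
        } elsif ($line =~ /^\.fi\b/) {       # any text after .fi is ignored
            die ".fi without .if\n" unless @stack;
            pop @stack;
        } elsif ($live) {
            push @out, $line;
        }
    }
    die "Missing .fi\n" if @stack;
    return @out;
}

# Condition text '1' counts as true and '0' as false in this sketch.
my @kept = filter_lines(sub { $_[0] },
    'a', '.if 0', 'b', '.elseif 1', 'c', '.else', 'd', '.fi', 'e');
# @kept is ('a', 'c', 'e')
```
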

The condition may be a defined subroutine which will be run and the value set by .return used as the boolean. The arguments are processed as if by .print.

.if .someSub arg1 \v{someVariable}
Conditional text
.fi

The condition may also be one of the directives: .include .read .test

.print

The rest of the line will be printed to stdout.

If the rest of the line starts with -e it is written to the output stream STDERR.

If it starts with -o strm it is written to the output stream strm.

Eg:

.print -o log Something interesting has happened!

The following escapes will be recognised and substitutions performed:

\e

generates the escape character \.

\0

generates the empty string. You might use this if you wanted to .print a line starting with -e.

\v{var}

interpolates variable var or array member array[index] from Math::Expression. var must match the regular expression: /\w+|\w+\[\w+\]/i
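
A minimal sketch of the escape rules, assuming Math::Expression variables are represented by a plain hash (arrays as array references) and that a non-numeric index such as i in names[i] names another variable. expand_escapes and lookup_var are hypothetical helpers.

```perl
use strict;
use warnings;

# Hypothetical variable lookup; a hash stands in for Math::Expression.
# Assumption: an index that is not all digits is itself a variable name.
sub lookup_var {
    my ($vars, $name, $index) = @_;
    my $val = $vars->{$name};
    if (defined $index) {
        $index = $vars->{$index} unless $index =~ /^\d+$/;
        return '' unless ref $val eq 'ARRAY' && defined $index;
        $val = $val->[$index];
    }
    return defined $val ? $val : '';
}

# One left-to-right pass, so the backslash produced by \e is not rescanned.
sub expand_escapes {
    my ($text, $vars) = @_;
    $text =~ s/\\(?:(e)|(0)|v\{(\w+)(?:\[(\w+)\])?\})/
        defined $1 ? '\\' : defined $2 ? '' : lookup_var($vars, $3, $4)/ge;
    return $text;
}

my %vars = (i => 1, names => ['Bilbo', 'Frodo']);
my $echoed = expand_escapes('Index=\v{i} person=\v{names[i]}', \%vars);
# $echoed is 'Index=1 person=Frodo'
my $printed = expand_escapes('\0-e is not an option here', \%vars);
# $printed is '-e is not an option here'
```

Doing the substitution in a single pass is what lets \e produce a literal backslash that is not itself treated as the start of another escape: \ev{i} yields the text \v{i}.
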

.echo

Escape substitution is performed as with .print and the line returned by getline. This allows variables to be used in the input the application reads without it being aware of what is going on.

.echo Index=\v{i} person=\v{names[i]}

.include

The first argument is a file path that is opened and lines from this returned to the application.

Paths that start with / are absolute and are used as they are.

Paths that start with # are taken to be relative to the current working directory of the process. The # is removed and the rest of the path used.

.# Include a file 'header' from a generic 'snippets' directory:
.include #snippets/header

Other paths are relative to the directory of the file being processed: that directory path is prepended and the result used. If such a path is used in a file opened by Fd, an error results.

.# Include a file in the same directory as the current file:
.include common_module

If the path starts with $, the following word is a variable name. Its value is prepended to the rest of the path, and the result is resolved as above (ie checked for a leading /, # and so on) and tested for existence. If the variable is an array, each value is tried in turn until a file is found. Eg:

.let dirs := split(':', '.:mydir:#builddir:/home/you/yourdir:/usr/local/ourdir')
.include $dirs/good_file.txt
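
The path rules can be sketched as follows, under the assumption that only the leading character decides how a path is treated. resolve_include and find_include are hypothetical helpers; pipes, quoting and arguments are left out.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical sketch of the .include path rules:
#   /path  absolute, used as is
#   #path  relative to the process's current directory (the # is dropped)
#   other  relative to the directory of the file doing the .include
sub resolve_include {
    my ($path, $current_file) = @_;
    return $path if $path =~ m{^/};
    return substr($path, 1) if $path =~ /^#/;
    return File::Spec->catfile(dirname($current_file), $path);
}

# The $var form: try each directory prefix in turn; the first existing file wins.
sub find_include {
    my ($dirs, $rest, $current_file) = @_;
    for my $dir (@$dirs) {
        my $candidate = resolve_include($dir . $rest, $current_file);
        return $candidate if -e $candidate;
    }
    return;                          # nothing found
}

my $from_cwd  = resolve_include('#snippets/header', 'slides/intro.mod');  # 'snippets/header'
my $alongside = resolve_include('common_module', 'slides/intro.mod');     # 'slides/common_module'
```
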

Words that follow are deemed arguments and made available within the include via the array _ARGS. See .sub.

.# header can generate different headers, ask for one suitable for a report:
.include #snippets/header report

The file path and arguments are processed for escapes as for .print.

The file path and arguments may contain spaces if they are surrounded by quotes ('").

If the path starts with |, the rest of the line is a command that is run and whose output is read. Pipes are only allowed if the property PipeOK is true (default false). WARNING: this will run an arbitrary command; you must be confident of the source and contents of the files being processed.

If the first arguments are -s name the file is opened on a named stream that may be used by .read and should be closed with .close.

If the first argument is -pn the file stream is put n frames below the current one. A new frame is created for every file opened, if, while, sub executed, ... (n is an optional number, default: 1)

.close

This is only needed to close named streams, and the -s name option is required.

.out

This diverts output to the named output stream (see OutStreams). Generated lines are sent there until a .out directive without an argument is seen.

Eg:

.out index
Meals in London
Times of the last tube trains
.out

.local

Marks the arguments as variable names that are local to the current block (.include, .while, .sub, ...). When the block returns the previous value will be restored. The values restored are the values of the variables at the time the .local is seen. Note that variable scope is dynamic, not lexical.

This happens automatically for _ARGS on an .include and .sub, and for named .sub arguments.
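
Perl's own local behaves the same way, which may help when reasoning about .local. This is a Perl analogy, not preprocessor code: a sub called from inside the block sees the localised value, and the old value comes back when the block is left.

```perl
use strict;
use warnings;

# Perl analogy: package variables with 'local' are dynamically scoped.
our $name = 'outer';

sub show_name { return $name }      # sees whatever value is current at call time

sub with_local {
    local $name = 'inner';          # like: .local name, then .let name := 'inner'
    return show_name();             # code called from here sees 'inner'
}                                   # the old value is restored on return

my $during = with_local();          # 'inner'
my $after  = show_name();           # 'outer'
```
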

.return

This ends reading a file early, the previous file is picked up on the line after the .include. At the top level (ie first file) end of file is returned to the application.

Within a .sub this may be used to return a value. The value of the last expression in a .sub is not automatically used as a return value.

.return may be followed by an expression; this will be assigned to the variable _ (underscore). Default undef:

.return count + 1

.exit

The application will be terminated.

Any remaining text on the line is evaluated by Math::Expression and, if it is a number, used as the exit code. If none is specified the exit code will be 2.

.eval

The rest of the line is processed for escapes as for .print. It is then treated as if it had just been read. The processed line may even start with a directive that is recognised by this module; eg this ends up setting variable a to the value 3:

.let a := 1; b := 2; var := 'a'
.print a=\v{a} b=\v{b}
.eval .let \v{var} := 3
.print a=\v{a} b=\v{b}

Do not use .eval to generate a conditional or loop, eg: .if; .while.

.read

Read the next line of input into a variable. The trailing newline is removed (chomped) and, if Trim is set, the line is also trimmed of white space. The variable _EOF is assigned 0.

At end of file the variable is assigned the empty string and the variable _EOF is assigned 1. The variable _ is set to 1 on read success.

This will be of most use with a stream opened with a -p or -s option:

.include -p | hostname
.read host
.echo This machine is called \v{host}

.include -s who | whoami
.read -s who me
.echo Logged in as \v{me}
.close -s who
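
The .read rules can be summarised in a short sketch, assuming a hash stands in for the Math::Expression variables. read_into is hypothetical; what happens to _ at end of file is not stated above, so the sketch leaves it alone.

```perl
use strict;
use warnings;

# Hypothetical sketch of the .read rules; a hash stands in for the variables.
sub read_into {
    my ($fh, $vars, $name, $trim) = @_;
    my $line = <$fh>;
    if (!defined $line) {
        $vars->{$name} = '';            # end of file: empty string ...
        $vars->{_EOF}  = 1;             # ... and _EOF set to 1
        return 0;                       # _ is not touched here (unspecified)
    }
    chomp $line;                        # trailing newline removed
    $line =~ s/^\s+|\s+$//g if $trim;   # white space trimmed when Trim is on
    $vars->{$name}  = $line;
    $vars->{_EOF}   = 0;
    $vars->{'_'}    = 1;                # success
    return 1;
}

open my $who, '<', \"  alice  \n" or die "Cannot open in-memory input: $!\n";
my %vars;
my $ok1 = read_into($who, \%vars, 'me', 1);
my $me  = $vars{me};                    # 'alice'
my $ok2 = read_into($who, \%vars, 'me', 1);   # end of file
```
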

.sub

Defines a subroutine on the following lines, ending with .done. The subroutine is called by using its name as a directive: .name.

Arguments may be passed to the subroutine and are available in the array _ARGS. Optional parameter names may follow the subroutine name; these variables are made .local and, when the subroutine is called, the arguments are copied into them. Beware: these are copies, ie separate from what is in _ARGS.

.sub show name age
Hobbits live in the Shire
.echo \v{name} is \v{age} years old
.echo That name again: \v{_ARGS[0]}
.done
.show 'Bilbo Baggins' 50

You can rebuild the argument string with join; beware that this will not give the exact original string, since if two words are separated by more than one space the extra spaces are lost.

.sub manyArgs
.let allArg := join(' ', _ARGS); na := count(_ARGS)
.echo All \v{na} arguments as a string '\v{allArg}'
.done
.manyArgs all cats have whiskers

.noop

This is a no-operation and does nothing.

.while

This starts a loop that continues as long as the expression (see .if) is true. The loop is terminated by the line .done.

If the option -inn is given, the loop limit is set to nn for this loop; see MaxLoopCount for the default. There may be spaces between -i and nn.

.let i := 0
.while -i 100 i++ < 100
Part of a .while loop
.echo i has the value \v{i}
.done

Loops are buffered in memory. A file read by .include within a loop is not buffered, ie it is read again on every iteration.

.until

This is the same as .while except that the loop stops when the expression becomes true.

.for

This starts a loop. The loop is terminated by the line .done.

This has the form:

.for init ;; condition ;; incr

Note that the ;; will be seen even if inside a quoted string.

As with .while and .until you may use the -i option. init is run once before the loop starts; condition is as .while; incr is run after every iteration. init and incr are processed by Math::Expression, ie no subs allowed.
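
A hypothetical parser for loop directive arguments, as a sketch under the rules above: an optional -i nn, then either a single condition or three ;;-separated parts. parse_loop_header is an illustrative name, and the plain split deliberately ignores quotes, as the note about ;; describes.

```perl
use strict;
use warnings;

# Hypothetical sketch of splitting a loop directive's arguments:
#   .while/.until [-i nn] condition
#   .for [-i nn] init ;; condition ;; incr
sub parse_loop_header {
    my ($kind, $args, $default_limit) = @_;
    my %loop = (limit => $default_limit);       # eg MaxLoopCount
    $loop{limit} = $1 if $args =~ s/^\s*-i\s*(\d+)\s*//;
    if ($kind eq 'for') {
        my @parts = split /;;/, $args, -1;      # quotes are not respected
        die "Expected: init ;; condition ;; incr\n" unless @parts == 3;
        s/^\s+|\s+$//g for @parts;              # trim each part in place
        @loop{qw(init cond incr)} = @parts;
    } else {
        ($loop{cond} = $args) =~ s/^\s+|\s+$//g;
    }
    return \%loop;
}

my $w = parse_loop_header('while', '-i 100 i++ < 100', 50);
# limit 100, cond 'i++ < 100'
my $f = parse_loop_header('for', 'i := 10 ;; i > 0 ;; i--', 50);
# limit 50, init 'i := 10', cond 'i > 0', incr 'i--'
```
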

Eg:

Count down begins
.for i := 10 ;; i > 0 ;; i--
.echo \v{i}
.done
Blast off!

.sub foo num
.return num > 2
.done

.for i := 5 ;; .foo \v{i} ;; i--
something ...
.done

.break .last

Terminate the current loop. These directives are synonyms.

These may be followed by the number of loops to terminate, default 1.

.continue .next

Abandon the rest of the current loop, start the next iteration. These directives are synonyms.

These may be followed by a number n (default 1): the n-1 inner loops are terminated and the next iteration of loop n, counting outwards from the current loop, is started.
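
The numeric argument works like a labelled next or last in Perl. This analogy uses plain Perl loops, not preprocessor directives.

```perl
use strict;
use warnings;

# Perl analogy: a count after .continue/.break targets an outer loop,
# like a labelled next/last.
my @continued;
OUTER: for my $i (1 .. 3) {
    for my $j (1 .. 3) {
        next OUTER if $j == 2;                # like .continue 2
        push @continued, "$i$j";
    }
}
# @continued is ('11', '21', '31')

my @broken;
OUTER2: for my $i (1 .. 3) {
    for my $j (1 .. 3) {
        last OUTER2 if $i == 2 && $j == 2;    # like .break 2
        push @broken, "$i$j";
    }
}
# @broken is ('11', '12', '13', '21')
```
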

.done

Ends blocks: .while .until .for .sub. It may be followed by the type of block that it ends; if so, a consistency check is made.

Eg:

.for i := 0 ;; i < 5 ;; i++
Text output
.done for

.test

Various tests. This will set _ to 0 or 1.

-f

This returns true if the argument file exists. The file path is as for .include except that pipes are not allowed. This also sets the array _STAT with information about the file (see below) and _TestFile will be the path found - ie after the #, $, ... is resolved.

Eg:

.if .test -f $dirs/good_file.txt
.print -e Including \v{_TestFile}, size \v{_STAT[7]} bytes.
.include $dirs/good_file.txt
.fi

.error

An error is returned to the application, ie undef is returned. The remaining text on the line is processed, see OnError above.

.set

Permits the setting of run time options. These may also be given as arguments to new:

trace=n

Set the trace level to n. 1 traces directives, 2 traces directives and generated input.

.case .do .endswitch .function .switch

These are reserved directives that may be used in the future.

Math::Expression variables

All variables whose names start with _ are reserved for future use.

The following variables will be assigned to:

_FileName

The name of the current File.

_LineNumber

The number of the line just read.

_FileNames

Array of files being read. The file last .included is in _FileNames[-1].

_LineNumbers

Array of line numbers, corresponding to _FileNames.

_IncludeDepth

The number of files that are open for reading. The file passed to new or open is number 1.

_

Value of the last .return.

_ARGS

Arguments provided to a .sub or .include.

_TIME

The current time (seconds), supplied by Math::Expression.

_EOF

Set to 1 if .read finds End Of File, else set to 0.

_CountGen

Count of lines generated.

_CountSkip

Count of lines skipped.

_CountDirect

Count of directives processed.

_CountFrames

Count of frames opened: one for every sub, if and loop.

_CountOpen

Count of files opened.

EmptyArray EmptyList

Empty arrays supplied by Math::Expression.

_STAT

Array of information about the last file found by .test -f. Members are as for perl's stat function:

 0 device number of filesystem
 1 inode number
 2 file mode  (type and permissions)
 3 number of (hard) links to the file
 4 numeric user ID of file's owner
 5 numeric group ID of file's owner
 6 the device identifier (special files only)
 7 total size of file, in bytes
 8 last access time in seconds since the epoch
 9 last modify time in seconds since the epoch
10 inode change time in seconds since the epoch
11 preferred block size for file system I/O
12 actual number of blocks allocated
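
These are the indices returned by Perl's stat, so _STAT[7] corresponds to element 7 of stat's result. The sketch below uses File::Temp only to create a small file to examine; it is plain Perl, not the module's code.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small temporary file, removed automatically at exit.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "hello";                      # 5 bytes
close $tmp_fh or die "close: $!\n";

my @stat = stat($path) or die "stat $path: $!\n";
my $size  = $stat[7];                         # like _STAT[7]: size in bytes
my $mtime = $stat[9];                         # like _STAT[9]: last modify time
```
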

_TestFile

The name of the last file found by .test -f.

_Initialised

Internal use, prevent double initialisation of variables.

ERRORS

Most methods return undef if there is an error. There will be a reason in $IO::ReadPreProcess::errstr. The error could be from IO::Handle (where $! might be helpful) or an error in the file format in which case $! will be set to EINVAL.

Beware: getline returns undef on end of file as well as error. Checking the method error will distinguish the two cases.

Note also the property OnError (see above).

EXAMPLES

The script below sets some variables from the command line, sets more from include files, and then reads stdin. The variables that are set can be used to control what it reads.

    use strict;
    use warnings;
    use feature 'say';
    use IO::ReadPreProcess;
    use Getopt::Long;
    use Math::Expression;

    # One arithmetic instance so that variables are visible in all files:
    my $ArithEnv = new Math::Expression( PermitLoops => 1, EnablePrintf => 1 );

    my @let = ();
    my @includes = ();
    my $verbose = 0;
    my $help = 0;

    sub Usage {
        print STDERR "Usage: $0 [--help] [--verbose] [--let='expr'] [--include=file] < input\n";
        exit 1;
    }

    # Look at command line options ... add other options here:
    GetOptions(help => \$help, 'include=s' => \@includes, 'let=s' => \@let, verbose => \$verbose)
        or Usage();

    Usage() if $help;

    # Evaluate all --let
    # Look like: --let='advanced := 1'
    for (@let) {
        say "Evaluating: $_" if $verbose;
        die "Invalid --let='$_'\n"
            unless(defined $ArithEnv->ParseToScalar($_));
    }

    # Read all --include
    # These must not yield anything other than blank lines
    # The point is that we evaluate .let, etc.
    for my $file (@includes) {
        say "Including: $file" if $verbose;

        my $inc = IO::ReadPreProcess->new(File => $file, Math => $ArithEnv, OnError => 'die', PipeOK => 1) or
            die "$0: Opening include '$file': $IO::ReadPreProcess::errstr\n";

        # All that is next should be empty lines:
        while (<$inc>) {
            die "Non empty line found via '--include $file' at $inc->{Place}\n"
                if /\S/;
        }
    }

    # If not stdin, maybe loop over @ARGV:
    my $fh = new IO::ReadPreProcess(Fd => \*STDIN, File => 'Standard input', Math => $ArithEnv) or
        die "Startup error: $IO::ReadPreProcess::errstr\n";

    while(<$fh>) {
        print;    # Or other processing of input

        # Report problems using the current input position (hypothetical check):
        die "Unexpected line at: $fh->{Place}\n"
            if /^ERROR\b/;
    }

    die "$IO::ReadPreProcess::errstr\n"
        if $fh->error;

    # Use pre-processor variable
    print "Sum output " . $ArithEnv->ParseToScalar('sum') . "\n";

Most of the interest lies in the input:

.let sum := 0
A line of input

.# Check to see if this is advanced
.if advanced

Complicated stuff
.let level := 'advanced'

.if advanced > 1

.# Bring in an extra file:
.include extra_files/very_complex
.let sum := sum + 2

.fi advanced > 1
.else

Simple stuff
.let level := 'simple'

.fi

.# Bring in an extra file where _ARGS[0] is either 'advanced' or 'simple':
.include extra_files/extra_module \v{level}

.print Showing material that is \v{level}

For more examples see the test suite.

At the end of the run you might want to do this:

# Some stats, for fun:
say STDERR $ArithEnv->ParseToScalar('printf("Preprocessing: lines generated %d, skipped %d. Directives %d, frames opened %d, files opened %d", _CountGen, _CountSkip, _CountDirect, _CountFrames, _CountOpen)');

SECURITY

Do be aware that a .include will open any file for which the process has permissions. So there is scope for an input file to pass the contents of arbitrary files into your program; this also applies to any files that the initial input file may, directly or indirectly, .include.

If a pipe is created: read this section twice.

Summary: be aware of the provenance of all input files.

BUGS

When used in the diamond operator in a list context only one line will be returned. This is due to a problem in the perl module overload.

Please report any bugs or feature requests to bug-io-readpreprocess at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=IO-ReadPreProcess. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc IO::ReadPreProcess


AUTHOR

Alain Williams, <addw@phcomp.co.uk> April 2015, 2017.

COPYRIGHT

Copyright (C) 2015, 2017 Alain Williams. All Rights Reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://dev.perl.org/licenses/ for more information.

ABSTRACT

Provide an 'intelligent' bottom end read function for scripts.