NAME
File::FilterFuncs - specify filter functions for files
SYNOPSIS
use File::FilterFuncs qw(filters);
filters('source.txt',
sub { $_ = uc $_; 1 },
'dest.txt'
);
INTRODUCTION
File::FilterFuncs makes it easy to perform transformations on files. When you use this module, you specify a group of filter functions that perform transformations on the lines in a source file. Those transformed lines are written to the destination file that you specify. For example, this code converts an entire file to upper case, line by line:
use File::FilterFuncs qw(filters);
filters('source.txt',
sub { $_ = uc $_; 1 },
'dest.txt'
);
The "1" at the end of the filter subroutine tells filters to keep all the lines. The filter subroutine should return 1 for any line that should be kept, and it should return 0 for any line that should be ignored. This program copies only lines that contain something besides whitespace:
use File::FilterFuncs qw(filters);
filters('source.txt',
sub { /\S/ },
'dest.txt'
);
The entire source file is not read into memory. Instead it is read one line at a time, and the destination file is written one line at a time.
Just as Perl's concept of a line can be changed by setting $/, the filters function's idea of a line can be changed by specifying a value for $/ in the call to filters:
my $pad = "\0" x 2;
filters('source.dat',
'$/' => \1022,    # reference to an integer: read fixed-size 1022-byte records
sub { $_ .= $pad; 1 },
'dest.dat'
);
Filter functions are invoked in the order in which they appear in the argument list. This code upper-cases every line in 'source.txt', then wraps it in parentheses, and writes the output to 'dest.txt':
filters ('source.txt',
sub { $_ = uc $_; 1 },
sub { chomp $_; $_ = "($_)\n"; 1 },
'dest.txt'
);
The line currently being processed is available in $_.
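To make the ordering and the keep/ignore return value concrete, the following is a minimal sketch in core Perl of how a chain of filter subroutines could be applied to a stream of lines. It is not the module's actual implementation: run_filters is a hypothetical helper, and it assumes that a false return from any filter drops the line and skips the remaining filters. In-memory filehandles stand in for 'source.txt' and 'dest.txt'.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of File::FilterFuncs: applies each filter
# sub in order to every line read from $in and prints kept lines to $out.
sub run_filters {
    my ($in, $out, @subs) = @_;
    LINE: while (my $line = <$in>) {
        local $_ = $line;               # filters read and modify the line via $_
        for my $sub (@subs) {
            next LINE unless $sub->();  # assumption: a false return drops the line
        }
        print {$out} $_;
    }
    return;
}

# In-memory filehandles stand in for the source and destination files.
my $source = "hello\n\nworld\n";
my $dest   = '';
open my $in,  '<', \$source or die "Cannot open input: $!";
open my $out, '>', \$dest   or die "Cannot open output: $!";

run_filters($in, $out,
    sub { /\S/ },                        # ignore whitespace-only lines
    sub { $_ = uc $_; 1 },               # upper-case the line
    sub { chomp $_; $_ = "($_)\n"; 1 },  # wrap it in parentheses
);
close $out or die "Cannot close output: $!";

print $dest;    # prints "(HELLO)" and "(WORLD)" on separate lines
```

Only one line is held in memory at a time, which mirrors the line-at-a-time behaviour described above.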
The filters subroutine expects its first argument to be the name of the source file and its last argument to be the name of the destination file. The filters function will die if either file name is missing or if either file is inaccessible.
OPTIONS
A few options determine how the filters subroutine works.
- binmode
Binmode lets you specify a layer to be used for the input data. For example, this will read a UTF-8 file and write the data using the default output layer:
filters('source.txt',
binmode => ':utf8',
'dest.txt',
);
- boutmode
Boutmode lets you specify a layer to be used for writing the output data. For example, this code on a Linux platform should read text data using the Linux end-of-line format and write it using the DOS (CRLF) end-of-line format:
filters('source.txt',
boutmode => ':crlf',
'dest.txt',
);
- $/
Setting $/ lets you determine how an end-of-line is recognized. Set this option to the same value that you would set the $/ variable to in a program. For example, suppose a file contains this:
ABCDEFGHIJKL
The following program should write three letters at a time to the output file:
filters('source.txt',
'$/' => \3,
sub { $_ = "$_\n"; 1 },
'dest.txt',
);
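Because the $/ option takes the same values as Perl's own $/ variable, the record-splitting behaviour can be previewed in core Perl without the module. The sketch below reads the sample data from an in-memory filehandle (an assumption made so the example is self-contained) with $/ set to a reference to the integer 3, which tells Perl to read fixed 3-byte records.

```perl
use strict;
use warnings;

my $data = 'ABCDEFGHIJKL';
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = \3;        # reference to an integer: fixed 3-byte records
    @records = <$fh>;     # each element is one 3-byte record
}
close $fh or die "Cannot close in-memory file: $!";

print "$_\n" for @records;    # ABC, DEF, GHI, JKL on separate lines
```

A plain number such as 3, without the backslash, would instead make Perl treat the string "3" as the line terminator.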
NOTES
- Alternate function name
If you consider the function name filters to be too generic, you can import the name filter_funcs instead.
- Convenience return values
For the programmer's convenience and to facilitate self-documenting code, the values $KEEP_LINE and $IGNORE_LINE can be exported. As an example, this is another program to filter out lines containing only whitespace:
use File::FilterFuncs qw(filters $IGNORE_LINE);
filters('source.txt',
sub { return $IGNORE_LINE unless /\S/ },
'dest.txt'
);
BUGS
The source and destination files cannot be the same. If the source and destination files have the same name and path, filters dies with an appropriate error message. If symbolic or hard links are used to give the same file two different names, the results are undefined.
E-mail bug reports to mumia.w.18.spam+nospam [at] earthlink.net .
CREDITS
Thanks go to Uri Guttman <uri [at] stemsystems.com> for several helpful suggestions including enabling the slurp and paragraph modes and dealing with filtering a file onto itself.
Andy <anedza [at] infotek-consulting.com> also commented on the need to explain or simplify the use of the callback filter functions.
TODO
Allow file handles to be used for input and output.
AUTHOR
Copyright 2007 Mumia Wotse
Mumia Wotse <mumia.w.18.spam+nospam [at] earthlink.net>
This program is under the General Public License (GPL).