NAME

File::FilterFuncs - specify filter functions for files

SYNOPSIS

use File::FilterFuncs qw(filters);

filters('source.txt',
    sub { $_ = uc $_; 1 },
    'dest.txt'
);

INTRODUCTION

File::FilterFuncs makes it easy to perform transformations on files. When you use this module, you specify a group of filter functions that perform transformations on the lines in a source file. Those transformed lines are written to the destination file that you specify. For example, this code converts an entire file to upper-case, line-by-line:

use File::FilterFuncs qw(filters);

filters('source.txt',
    sub { $_ = uc $_; 1 },
    'dest.txt'
);

The "1" at the end of the filter subroutine tells filters to keep every line. A filter subroutine should return a true value (such as 1) for any line that should be kept and a false value (such as 0) for any line that should be ignored. This program copies only lines that contain something besides whitespace:

use File::FilterFuncs qw(filters);

filters('source.txt',
   sub { /\S/ },
   'dest.txt'
);

The entire source file is not read into memory. Instead it is read one line at a time, and the destination file is written one line at a time.

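Just as Perl's concept of a line can be changed by setting $/, the filters function's idea of a line can be changed by passing a '$/' option in the call to filters. As with the $/ variable itself, a reference to an integer requests fixed-size records. This example reads 1022-byte records and appends two null bytes to each one:

my $pad = "\0" x 2;
filters('source.dat',
   '$/' => \1022,
   sub { $_ .= $pad; 1 },
   'dest.dat'
);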

Filter functions are invoked in the order in which they appear in the argument list. This code upper-cases every line in 'source.txt', wraps it in parentheses, and writes the result to 'dest.txt':

filters ('source.txt',
   sub { $_ = uc $_; 1 },
   sub { chomp $_; $_ = "($_)\n"; 1 },
   'dest.txt'
);

As the examples show, the line currently being worked on is in $_, and a filter function changes the line by modifying $_.
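
The flow described above can be pictured with a small loop in plain Perl. This is only a sketch of the idea, not the module's actual implementation: run_filters is a hypothetical helper, in-memory file handles stand in for real files, and the choice to stop processing a line at the first filter that returns false is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of File::FilterFuncs: read lines from $in,
# pass each one through the filters in order, and print the survivors to $out.
sub run_filters {
    my ($in, $out, @filters) = @_;
    LINE: while (defined(my $line = <$in>)) {
        local $_ = $line;                  # filters see and modify the line via $_
        for my $filter (@filters) {
            next LINE unless $filter->();  # assumed: a false return drops the line
        }
        print {$out} $_;
    }
    return;
}

# In-memory file handles keep the example self-contained.
my $source = "hello\n   \nworld\n";
my $dest   = '';
open my $in,  '<', \$source or die "Cannot read source: $!";
open my $out, '>', \$dest   or die "Cannot write dest: $!";

run_filters($in, $out,
    sub { /\S/ },                      # ignore whitespace-only lines
    sub { $_ = uc $_; 1 },             # upper-case the line
    sub { chomp; $_ = "($_)\n"; 1 },   # wrap the line in parentheses
);
close $in;
close $out;

print $dest;   # prints "(HELLO)" and "(WORLD)" on separate lines
```

Only one line is held in memory at a time, and each filter sees the changes made by the filters before it.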

The filters subroutine expects its first argument to be the name of the source file and its last argument to be the name of the destination file. It dies if either file name is missing or if either file is inaccessible.

OPTIONS

A few options determine how the filters subroutine works.

binmode

The binmode option lets you specify a layer to be used for the input data. For example, this will read a UTF-8 file and write the data using the default output layer:

filters (
   'source.txt',
   binmode => ':utf8',
   'dest.txt',
);

boutmode

Boutmode lets the programmer specify a layer to be used for writing the output data. For example, this code on a Linux platform should read text data using the Linux end-of-line format and write it using the DOS (CRLF) end-of-line format:

filters (
   'source.txt',
   boutmode => ':crlf',
   'dest.txt',
);
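
The effect of the :crlf layer on output can be checked in plain Perl without the module. The sketch below is an illustration of the layer itself, not of how filters applies boutmode internally; the temporary file and variable names are chosen only for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file that is removed when the program exits.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Write two lines through the :crlf layer, as boutmode => ':crlf' requests.
open my $out, '>:crlf', $path or die "Cannot write $path: $!";
print {$out} "one\ntwo\n";
close $out or die "Cannot close $path: $!";

# Read the raw bytes back to see what actually reached the file.
open my $in, '<:raw', $path or die "Cannot read $path: $!";
my $bytes = do { local $/; <$in> };
close $in;

# Make the line endings visible before printing them.
(my $visible = $bytes) =~ s/\r/\\r/g;
$visible =~ s/\n/\\n/g;
print "$visible\n";   # prints one\r\ntwo\r\n
```

Each "\n" written by the program is stored as a carriage return followed by a line feed.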

$/

Setting $/ lets you determine how an end-of-line is recognized. Set this option to the same value that you would set the $/ variable to in a program. For example, suppose a file contains this:

ABCDEFGHIJKL

The following program should write three letters at a time to the output file:

filters (
   'source.txt',
   '$/' => \3,
   sub { $_ = "$_\n"; 1 },
   'dest.txt',
);
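
To see what a reference to an integer does to $/ without involving the module, the following plain-Perl sketch reads the same twelve letters from an in-memory file handle three bytes at a time. It demonstrates Perl's own record-reading behavior only; the variable names are chosen for the example.

```perl
use strict;
use warnings;

# An in-memory file handle stands in for source.txt.
my $data = "ABCDEFGHIJKL";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
{
    # A reference to an integer tells Perl to read fixed-size records.
    local $/ = \3;
    @records = <$fh>;
}
close $fh;

print "$_\n" for @records;   # prints ABC, DEF, GHI and JKL on separate lines
```

Because the records carry no line ending of their own, the filter in the example above appends "\n" to each one.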

NOTES

Alternate function name

If you consider the function name filters to be too generic, you can import the name filter_funcs instead.

Convenience return values

For the programmer's convenience and to facilitate self-documenting code, the values $KEEP_LINE and $IGNORE_LINE can be exported. As an example, this is another program to filter out lines containing only whitespace:

use File::FilterFuncs qw(filters $KEEP_LINE $IGNORE_LINE);

filters('source.txt',
      sub { /\S/ ? $KEEP_LINE : $IGNORE_LINE },
      'dest.txt'
);

BUGS

The source and destination files cannot be the same. If the source and destination files have the same name and path, filters dies with an appropriate error message. If a symbolic link or hard link gives the same file two different names, the results are undefined.
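
A caller who wants to guard against the linked-file case could compare device and inode numbers before calling filters. The sketch below is one possible approach under the assumption of a POSIX-style file system; same_file is a hypothetical helper and is not provided by File::FilterFuncs.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: true if both paths resolve to the same underlying file.
sub same_file {
    my ($path_a, $path_b) = @_;
    my @a = stat($path_a);
    my @b = stat($path_b);
    return 0 unless @a && @b;                  # a missing file cannot match
    return $a[0] == $b[0] && $a[1] == $b[1];   # same device and same inode
}

# Build a scratch directory with a file, a symbolic link to it, and another file.
my $dir   = tempdir(CLEANUP => 1);
my $file  = File::Spec->catfile($dir, 'source.txt');
my $alias = File::Spec->catfile($dir, 'alias.txt');
my $other = File::Spec->catfile($dir, 'other.txt');

for my $path ($file, $other) {
    open my $fh, '>', $path or die "Cannot create $path: $!";
    close $fh;
}
symlink($file, $alias) or die "Cannot create symbolic link: $!";

my $alias_result = same_file($file, $alias) ? 'same' : 'different';
my $other_result = same_file($file, $other) ? 'same' : 'different';
print "alias: $alias_result\n";   # prints "alias: same"
print "other: $other_result\n";   # prints "other: different"
```

Because stat follows symbolic links, the check covers both kinds of link mentioned above.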

E-mail bug reports to mumia.w.18.spam+nospam [at] earthlink.net .

CREDITS

Thanks go to Uri Guttman <uri [at] stemsystems.com> for several helpful suggestions including enabling the slurp and paragraph modes and dealing with filtering a file onto itself.

Andy <anedza [at] infotek-consulting.com> also commented on the need to explain or simplify the use of the callback filter functions.

TODO

  • Allow file handles to be used for input and output.

AUTHOR

Copyright 2007 Mumia Wotse
Mumia Wotse <mumia.w.18.spam+nospam [at] earthlink.net>

This program is under the General Public License (GPL).