NAME

File::FilterFuncs - specify filter functions for files

SYNOPSIS

use File::FilterFuncs qw(filters);

filters('source.txt',
    sub { uc $_ },
    'dest.txt'
);

INTRODUCTION

File::FilterFuncs makes it easy to perform transformations on files. You specify a group of filter functions that transform the lines of a source file, and the transformed lines are written to the destination file that you specify. For example, this code converts an entire file to uppercase, line by line:

use File::FilterFuncs qw(filters);

filters('source.txt',
    sub { uc $_ },
    'dest.txt'
);

The entire source file is not read into memory. Instead it is read one line at a time, and the destination file is written one line at a time.
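
The streaming behavior can be pictured with core Perl alone. The following is a minimal sketch, not the module's actual implementation: stream_lines and stream_string are hypothetical helpers, and in-memory file handles stand in for real files.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::FilterFuncs): copy $in to $out
# one line at a time, so only the current line is held in memory.
sub stream_lines {
    my ($in, $out, $code) = @_;
    while (defined(my $line = <$in>)) {
        local $_ = $line;          # the callback sees the line in $_
        print {$out} $code->();
    }
    return;
}

# Convenience wrapper for the demonstration: run stream_lines over a
# string using in-memory file handles and return the output as a string.
sub stream_string {
    my ($source, $code) = @_;
    my $dest = '';
    open my $in,  '<', \$source or die "Cannot open input: $!";
    open my $out, '>', \$dest   or die "Cannot open output: $!";
    stream_lines($in, $out, $code);
    close $out or die "Cannot close output: $!";
    return $dest;
}

print stream_string("alpha\nbeta\n", sub { uc $_ });
```

Because each line is written before the next one is read, memory use in a loop of this shape depends on the longest line rather than on the size of the file.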

Just as Perl's idea of a line changes when you set $/, the filters function's idea of a line changes when you pass a value for $/ in the call to filters. This example reads fixed-size records of 1022 bytes and pads each one with two null bytes:

my $pad = "\0" x 2;
filters('source.dat',
   '$/' => \1022,
   sub { $_ . $pad },
   'dest.dat'
);

Filter functions are invoked in the order in which they appear. This code upper-cases every line in 'source.txt', wraps it in parentheses, and writes the result to 'dest.txt':

filters ('source.txt',
   sub { uc $_ },
   sub { chomp $_; "($_)\n" },
   'dest.txt'
);

The current line that is being worked on is in $_. The return value of each filter function is reassigned to $_ for the benefit of the next filter function.
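
The hand-off through $_ can be sketched in a few lines of plain Perl. This is an illustration of the idea only; apply_filters is a hypothetical name and nothing here is taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical helper: run one line through a chain of filter functions.
# Each filter reads the line from $_, and its return value becomes the
# new $_ for the next filter in the chain.
sub apply_filters {
    my ($line, @filters) = @_;
    local $_ = $line;
    for my $filter (@filters) {
        $_ = $filter->();
    }
    return $_;
}

print apply_filters(
    "hello\n",
    sub { uc $_ },
    sub { chomp $_; "($_)\n" },
);
```

Since the second filter receives the already upper-cased line, reversing the order of the two filters would upper-case the parenthesized text instead, which gives the same result here but not for filters that depend on each other's output.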

The filters subroutine expects its first argument to be the name of the source file and its last argument to be the name of the destination file. It dies if either file name is missing or if either file is inaccessible.

If you need to actually filter using File::FilterFuncs :-), you can call ignore_line() from within your filter function to drop the current line from the output. This program ignores any line that is empty or contains only whitespace:

use File::FilterFuncs qw(filters ignore_line);

my $func = sub {
    # Drop the line unless it contains a non-whitespace character.
    unless (/\S/) {
        ignore_line();
        return;
    }
    $_;
};

filters('source.txt', $func, 'dest.txt');
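
One way such a mechanism could work is with a package-level flag that the driver loop checks after each filter. The sketch below is an assumption about the general technique, not the module's code; $skip, skip_line and filter_lines are hypothetical names.

```perl
use strict;
use warnings;

# Assumption: a package-level flag records that the current line should
# be dropped. All names in this sketch are hypothetical.
our $skip = 0;

sub skip_line { $skip = 1; return }

# Run every line through the filters, omitting lines that were skipped.
sub filter_lines {
    my ($lines, @filters) = @_;
    my @kept;
    LINE: for my $line (@$lines) {
        local $_ = $line;
        $skip = 0;
        for my $filter (@filters) {
            $_ = $filter->();
            next LINE if $skip;    # drop the line, skip later filters
        }
        push @kept, $_;
    }
    return @kept;
}

my $drop_blank = sub {
    unless (/\S/) { skip_line(); return }
    $_;
};

print filter_lines(["one\n", "   \n", "two\n"], $drop_blank, sub { uc $_ });
```

A shared flag of this kind is simple, but it is also the sort of global state that makes code unsafe to use from several threads at once.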

OPTIONS

A few options determine how the filters subroutine works.

binmode

The binmode option specifies a layer to be used for the input data. For example, this reads a UTF-8 file and writes the data using the default output layer:

filters (
   'source.txt',
   binmode => ':utf8',
   'dest.txt',
);

boutmode

The boutmode option specifies a layer to be used for writing the output data. For example, on a Linux platform this code should read text with Linux (LF) line endings and write it with DOS (CRLF) line endings:

filters (
   'source.txt',
   boutmode => ':crlf',
   'dest.txt',
);
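
To see what an output layer does to the bytes on disk, the same layer name can be given to Perl's built-in open. This sketch uses only core Perl and File::Temp; bytes_written is a hypothetical helper, and the assumption is a Unix-like platform where the default layer leaves newlines untouched.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text through the given output layer and
# return the raw bytes that ended up in the file.
sub bytes_written {
    my ($layer, $text) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp or die "Cannot close temp file: $!";

    open my $out, ">$layer", $path or die "Cannot write $path: $!";
    print {$out} $text;
    close $out or die "Cannot close $path: $!";

    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    local $/;                      # slurp the whole file
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

my $bytes = bytes_written(':crlf', "one\ntwo\n");
printf "%d bytes written\n", length $bytes;
```

The text passed in is 8 characters long; with the :crlf layer each newline is written as a carriage return followed by a line feed, so the file is longer than the string.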

$/

The $/ option determines how the end of a line is recognized. Give it the same value that you would assign to the $/ variable in a program. For example, suppose a file contains this:

ABCDEFGHIJKL

The following program should write three letters at a time to the output file:

filters (
   'source.txt',
   '$/' => \3,
   sub { "$_\n" },
   'dest.txt',
);
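
The effect of a reference value for $/ can be checked in plain Perl without the module. In this sketch read_records is a hypothetical helper, and an in-memory file handle stands in for 'source.txt'.

```perl
use strict;
use warnings;

# Hypothetical helper: split $data into records the way the readline
# operator would for the given value of $/.
sub read_records {
    my ($data, $separator) = @_;
    open my $in, '<', \$data or die "Cannot open in-memory handle: $!";
    local $/ = $separator;
    my @records = <$in>;
    close $in;
    return @records;
}

# A reference to an integer requests fixed-size records of at most
# that size instead of newline-terminated lines.
print "$_\n" for read_records('ABCDEFGHIJKL', \3);
```

If the data ends with a newline, that newline arrives as a final, shorter record of its own, because fixed-size reads do not treat newlines specially.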

BUGS

The source and destination files cannot be the same. If the source and destination files have the same name and path, filters dies with an appropriate error message. If symbolic or hard links give the same file two different names, the results are undefined.
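
A caller who cannot rule out links could check for them before calling filters. The sketch below compares device and inode numbers from Perl's built-in stat; same_file is a hypothetical helper, and the approach assumes a Unix-like file system where inode numbers are meaningful.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: true when both paths refer to the same file,
# judged by device and inode number. stat follows symbolic links, so
# symlinked names are caught as well as hard links. A path that does
# not exist yet cannot be the same file as anything.
sub same_file {
    my ($path_a, $path_b) = @_;
    my @a = stat($path_a) or return 0;
    my @b = stat($path_b) or return 0;
    return ($a[0] == $b[0] && $a[1] == $b[1]) ? 1 : 0;
}

my ($fh1, $first)  = tempfile(UNLINK => 1);
my ($fh2, $second) = tempfile(UNLINK => 1);

print same_file($first, $first)  ? "same\n" : "different\n";
print same_file($first, $second) ? "same\n" : "different\n";
```

A check like this would run before the call to filters, with the program refusing to continue when the two names turn out to be one file.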

This module is not thread-safe due to the use of the package variable $ignore_line.

E-mail bug reports to mumia.w.18.spam+nospam [at] earthlink.net .

CREDITS

Thanks go to Uri Guttman <uri [at] stemsystems.com> for several helpful suggestions, including enabling the slurp and paragraph modes and dealing with filtering a file onto itself.

Andy <anedza [at] infotek-consulting.com> also commented on the need to explain or simplify the use of the callback filter functions.

TODO

  • Allow file handles to be used for input and output.

  • Merge the functionalities of both line selection and line transformation into a single callback subroutine.

AUTHOR

Copyright 2007 Mumia Wotse
Mumia Wotse <mumia.w.18.spam+nospam [at] earthlink.net>

This program is under the General Public License (GPL).