



 my ($lines, %info) = process_file($file, process => sub {
     my ($fh, $lines, $args, $line) = @_;
     return uc($line);
 });


Many scripts need to process one or more text files. The boilerplate usually looks something like:

 open my $fh, '<', $file
    or croak "blah blah blah...\n";

 while (<$fh>) {
   # do something...
 }

 close $fh or
    croak "blah blah blah...\n";

The "do something..." part often involves other common operations such as removing newlines or skipping blank lines. It gets worse when you have to write the same template for each of the different files a script processes.

This class provides a simple harness for processing text files, taking the drudgery out of writing a simple text processor. It is most effective when used on relatively small files.

In its most basic form the class will return all of the lines in a text file. The class exports one method (process_file), which uses four subroutines (pre, next_line, process, post) that you can override or use in conjunction with your own processors.


process_file(file, options)

You start processing a file by calling process_file with the name of the file (or a handle to an open file) and a list of options. Note that the processors are passed a reference to a hash of these options while the file is being processed.

The method returns a list: a reference to an array that contains each line of the file, followed by the elements of the options hash that was originally passed to it (along with any other data your custom methods have inserted into it).

 my ($lines, %options) = process_file("foo.txt", chomp => 1);
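
The shape of that return list can be illustrated with a small stand-in. This is only a sketch: fake_process_file is a hypothetical function written for this example, not part of the module, and the `file` key it stores is likewise an assumption. It shows how an array reference followed by a flattened hash unpacks into `($lines, %options)`.

```perl
use strict;
use warnings;

# Hypothetical stand-in that mimics the documented return shape:
# an array reference followed by the flattened options hash.
sub fake_process_file {
    my ( $file, %options ) = @_;

    my @lines = ( "first\n", "second\n" );
    $options{file} = $file;    # assumption: extra data stored by a processor

    return ( \@lines, %options );
}

# The first element lands in $lines, the rest fills %options.
my ( $lines, %options ) = fake_process_file( 'foo.txt', chomp => 1 );

printf "%d lines read\n", scalar @{$lines};
print "$_ => $options{$_}\n" for sort keys %options;
```

If you only need the lines, `my ($lines) = ...` discards the trailing option pairs.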

file

Path to the file to be processed or a handle to an open file.


options

A list of options. You can send whatever options your custom processor supports. process_file itself supports the following options, which are applied as each line is read:


skip_blank_lines

Skip blank lines. A blank line is a line with no data or, if chomp mode is not enabled, a line that contains only a newline character.


skip_comments

Set skip_comments to a true value to skip lines that begin with '#'.


merge_lines

Merges lines together rather than creating an array of lines. Typically used with the chomp option. When merge_lines is set to a true value, IO::Scalar is used to efficiently create a single scalar from all of the lines in the file.


chomp

Set chomp to a true value to remove the trailing newline from each line.


trim

Set trim to one of front, back, or both to remove whitespace from the front, the back, or both ends of a line. Note that this operation is performed before your custom processor is called and may result in the line being skipped if the skip_blank_lines option is set to a true value.
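
To make the interaction of these options concrete, here is a minimal sketch in plain Perl. The helper apply_line_options is hypothetical and is not the module's implementation. The order in which chomp, trim and the skip checks run is an assumption, beyond the documented point that trimming can turn a line into a blank one that is then skipped.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: applies the documented
# per-line options to one line. Returns nothing when the line is skipped.
sub apply_line_options {
    my ( $line, %opt ) = @_;

    chomp $line if $opt{chomp};

    my $trim = $opt{trim} // '';
    $line =~ s/^\s+// if $trim eq 'front' || $trim eq 'both';
    $line =~ s/\s+$// if $trim eq 'back'  || $trim eq 'both';

    # A blank line is empty, or holds only a newline when chomp is off.
    return if $opt{skip_blank_lines} && $line =~ /^\n?$/;
    return if $opt{skip_comments}    && $line =~ /^#/;

    return $line;
}

my @input = ( "# a comment\n", "\n", "  padded  \n", "plain\n" );

my @kept = map {
    apply_line_options(
        $_,
        chomp            => 1,
        trim             => 'both',
        skip_blank_lines => 1,
        skip_comments    => 1,
    )
} @input;

print join( '|', @kept ), "\n";    # prints "padded|plain"
```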

process_file will execute a pre-processing subroutine (pre), read each line with next_line, pass that line to a processor (process) which returns the processed line, and finally call a post-processing routine (post). The default processors are described below.
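
The flow of control can be sketched with a simplified stand-in harness. mini_process_file below is hypothetical and only mirrors the calling sequence and argument lists described in this document. Its defaults are assumptions, as is the choice to drop undefined results from process, and it is not the module's code.

```perl
use strict;
use warnings;

# Simplified stand-in for the harness: pre, then next_line/process per
# line, then post. Defaults here are assumptions, not the module's own.
sub mini_process_file {
    my ( $fh, %options ) = @_;

    my $pre       = $options{pre}       // sub { return };
    my $next_line = $options{next_line} // sub { my ($in) = @_; return scalar <$in> };
    my $process   = $options{process}   // sub { return $_[3] };
    my $post      = $options{post}      // sub { return ( $_[1], %{ $_[2] } ) };

    my @lines;
    $pre->( $fh, \%options );

    # next_line returning undef halts further processing.
    while ( defined( my $line = $next_line->( $fh, \@lines, \%options ) ) ) {
        my $processed = $process->( $fh, \@lines, \%options, $line );
        push @lines, $processed if defined $processed;    # assumption
    }

    return $post->( $fh, \@lines, \%options );
}

# Read from an in-memory file handle so no file on disk is needed.
open my $fh, '<', \"alpha\nbeta\n"
  or die "cannot open in-memory file: $!";

my ($lines) = mini_process_file(
    $fh,
    process => sub {
        my ( $in, $buffer, $args, $line ) = @_;
        chomp $line;
        return uc $line;
    }
);

print join( ',', @{$lines} ), "\n";    # prints "ALPHA,BETA"
```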

pre(file, options)


file

Path to a file that can be opened for reading or a handle to an open file.


options

A reference to a hash that contains the options passed from process_file. The hash will be passed to the process method, so it can be used to store data as you are processing each line. The default process method will record counts of lines processed and other potentially useful statistics.

next_line(fh, lines, options)

The next_line method is passed the file handle, the buffer of accumulated lines, and a reference to a hash of options passed to process_file. It is expected to return the next line of the file. Your custom processor, however, can return anything it likes. Whatever it returns will be sent to the process subroutine for possible further processing.

Returning undef will halt further processing.

process(fh, lines, options, current_line)

The process method is passed the file handle, the buffer of accumulated lines, a reference to a hash of options passed to process_file, and the next line of the text file. The default processor simply returns the current_line value. If the chomp option was set to true when you called process_file, the line will be chomped. You can also set the skip_blank_lines or skip_comments options to skip blank lines or lines that begin with '#'.


fh

Handle to an open file or an object that supports an IO::Handle-like interface. If fh is undefined the


lines

A reference to an array that contains the lines read thus far.


options

A reference to a hash of options passed to process_file.


current_line

The next line of data from the file.

post(fh, lines, options)

The post method is passed the same first three arguments as process. The default post method closes the file and records the end time of processing. It returns a reference to the buffer of lines followed by the list of options. Note that a reference to the options hash is passed in but a list is returned. This is also the return value of process_file. Your custom post can return anything it wants.


Any of the default processors (pre, next_line, process, post) can be called before or after your custom processors. Pass these methods the same list you receive.

    post  => sub {
      # call the default post first, then replace the line buffer
      # with a single merged string
      my @retval = post(@_);
      $retval[0] = join '', @{ $_[1] };
      return @retval;
    },




  • Return all of the lines in a text file

     my ($lines) = process_file('foo.txt');
  • Read JSON file

      print Dumper(
        process_file( $file,
          chomp => 1,
          post  => sub {
            return decode_json( join '', @{ $_[1] } );
          }
        )
      );


      print Dumper(
        process_file( $file,
            chomp       => 1,
            merge_lines => 1
        )
      );
  • Read CSV file

     my $csv = Text::CSV_XS->new;
     my ($csv_lines) = process_file(
       $file,
       csv   => $csv,
       chomp => 1,
       has_headers => 1,
       pre   => sub {
         my ( $fh, $args ) = @_;
         if ( $args->{'has_headers'} ) {
           # getline returns an array reference of the header fields;
           # getline_hr needs the column names to be set first
           my $column_names = $args->{csv}->getline($fh);
           $args->{csv}->column_names($column_names);
         }
         return (pre($fh, $args));
       },
       next_line => sub {
         my ( $fh, $all_lines, $args ) = @_;
         my $ref = $args->{csv}->getline_hr($fh);
         return $ref;
       }
     );


This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.


Rob Lauer
