NAME

List::RewriteElements - Create a new list by rewriting elements of a first list

SYNOPSIS

use List::RewriteElements;

Constructor

Simplest case: Input from array, output to STDOUT.

$lre = List::RewriteElements->new( {
    list        => \@source,
    body_rule   => sub {
                        my $record = shift;
                        $record .= q{additional field};
                   },
} );

Input from file, output to STDOUT:

$lre = List::RewriteElements->new( {
    file        => "/path/to/source/file",
    body_rule   => sub {
                        my $record = shift;
                        $record .= q{,additional field};
                   },
} );

Provide a different rule for the first element in the list:

$lre = List::RewriteElements->new( {
    file        => "/path/to/source/file",
    header_rule => sub {
                        my $record = shift;
                        $record .= q{,ADDITIONAL HEADER};
                   },
    body_rule   => sub {
                        my $record = shift;
                        $record .= q{,additional field};
                   },
} );

Input from file, output to file:

$lre = List::RewriteElements->new( {
    file        => "/path/to/source/file",
    body_rule   => sub {
                        my $record = shift;
                        $record .= q{additional field};
                   },
    output_file => "/path/to/output/file",
} );

To name the output file, just provide a suffix to append to the name of the source file:

$lre = List::RewriteElements->new( {
    file            => "/path/to/source/file",
    body_rule       => sub {
                        my $record = shift;
                        $record .= q{additional field};
                       },
    output_suffix   => '.out',
} );

Provide criteria to suppress output of header or individual record:

$lre = List::RewriteElements->new( {
    file            => "/path/to/source/file",
    header_suppress => sub {
                        my $record = shift;
                        return if $record =~ /$somepattern/;
                    },
    body_suppress   => sub {
                        my $record = shift;
                        return if $record ne 'somestring';
                    },
    body_rule       => sub {
                        my $record = shift;
                        $record .= q{additional field};
                    },
} );

Generate Output

$lre->generate_output();

Report Output Information

$path_to_output_file    = $lre->get_output_path();

$output_file_basename   = $lre->get_output_basename();

$output_row_count       = $lre->get_total_rows();

$output_record_count    = $lre->get_total_records();

$records_changed        = $lre->get_records_changed();

$records_unchanged      = $lre->get_records_unchanged();

$records_deleted        = $lre->get_records_deleted();

$header_status          = $lre->get_header_status();

DESCRIPTION

It is common to receive a flat data file from someone else and to have to generate a new file in which each row or record of the incoming file must either (a) be transformed according to some rule before being printed to the new file; or (b) if it meets certain criteria, be omitted from the new file altogether.

List::RewriteElements enables you to write such rules and criteria, generate the file of transformed data records, and get back some basic statistics about the transformation.

List::RewriteElements is useful when the number of records in the incoming file may be large and you do not want to hold the entire list in memory. Similarly, the newly generated records are not held in memory but are immediately printed to STDOUT or to file.

On the other hand, if for some reason you already have an array of records in memory, you can use List::RewriteElements to apply rules and criteria to each element of the array and then print the transformed records (again, without holding the output in memory).

SUBROUTINES

new()

Purpose: List::RewriteElements constructor.

Arguments: Reference to a hash holding the following keys:

  • file or list

    The hash must hold either a file element or a list element -- but not both! The value for the file key must be an absolute path to an input file. The value for list must be a reference to an array in memory.

  • body_rule

    The hash must have a body_rule element whose value is a reference to a subroutine providing a formula for the transformation of an individual record in the incoming file to a record in the outgoing file. The first argument passed to this subroutine must be the record from the incoming file. The return value from this subroutine should be a string immediately ready for printing to the output file (though the string should not end in a newline, as printing will be handled by generate_output()).

  • body_suppress

    Optionally, you may provide a body_suppress element whose value is a reference to a subroutine providing a criterion according to which an individual record in the incoming file should be output to the outgoing file or not output, i.e., omitted from the output entirely. The first argument to this subroutine should be the record from the incoming file. The subroutine should, at least implicitly, return a true value when the record should be output. The subroutine should simply return, i.e., return an implicit undef, when the record should be omitted from the outgoing file.

  • header_rule

    Frequently the first row in a flat data file is a header row containing, say, the names of the columns in a data table, joined by a delimiter. Because the header row is different from all subsequent rows, you may optionally provide a header_rule element whose value is a reference to a subroutine providing a formula for the transformation of the header row in the incoming file to the header in the outgoing file. The first argument passed to this subroutine must be the header row from the incoming file. The return value from this subroutine should be a string immediately ready for printing to the output file (though the string should not end in a newline, as printing will be handled by generate_output()).

  • header_suppress

    Optionally, if you have provided a header_rule element, you may provide a header_suppress element whose value is a reference to a subroutine providing a criterion according to which the header row from the incoming file should be output to the outgoing file or not output, i.e., omitted from the output entirely. The first argument to this subroutine should be the header from the incoming file. The subroutine should, at least implicitly, return a true value when the header should be output. The subroutine should simply return, i.e., return an implicit undef, when the header should be omitted from the outgoing file.

  • output_file or output_suffix

    It is recommended that you supply either an output_file or an output_suffix element to the constructor; otherwise, the new list generated by application of the rules and criteria will simply print to STDOUT. The value of an output_file element should be a full path to the newly created file. If you would rather name the new file by appending a suffix to the name of the incoming file than specify a full path, provide an output_suffix element; the outgoing file will then be created in whatever directory is the current working directory when generate_output() is called. An output_suffix element will be ignored if an output_file element is provided.

  • Note 1

    If neither a header_rule nor a header_suppress element is provided to the constructor, List::RewriteElements will treat the first row of the incoming file the same as any other row, i.e., it will apply the body_rule transformation formula.

  • Note 2

    A body_suppress or header_suppress criterion, if present, will be logically applied before any body_rule or header_rule formula. We don't apply the formula to transform a record if the record should not be output at all.

  • Note 3

Return Value: List::RewriteElements object.
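
The return-value convention for body_suppress and header_suppress can be checked in isolation, without the module. What follows is a minimal sketch in plain Perl; the names $implicit and $explicit are made up for illustration. One Perl detail matters here: when the condition in a statement such as "return if ..." is false, the subroutine falls off the end and returns the value of that condition, which is false but defined. The examples in this document therefore appear to rely on the criterion being tested for definedness rather than truth. That is an inference from the examples, not a statement about the module's internals; ending the criterion with an explicit "return 1;" satisfies either reading.

```perl
use strict;
use warnings;

# Criterion written as in the examples in this document: a bare return
# (undef in scalar context) marks the record for omission.
my $implicit = sub {
    my $record = shift;
    return if $record eq '5';
};

# Same criterion, with an explicit true value for records to be kept.
my $explicit = sub {
    my $record = shift;
    return if $record eq '5';
    return 1;
};

for my $record (qw( 4 5 )) {
    my $i = $implicit->($record);
    my $e = $explicit->($record);
    printf "record %s: implicit is %s, explicit is %s\n",
        $record,
        defined $i ? 'defined' : 'undef',
        defined $e ? 'defined' : 'undef';
}
```

For record 4 both criteria return a defined value, but only $explicit returns a true one; for record 5 both return undef.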

generate_output()

Purpose: Generates the output specified by arguments to new(), i.e., creates an output file or prints to STDOUT with records transformed as per those arguments.

Arguments: None.

Return Value: Returns a true value upon success. On failure it will croak with an error message.

get_output_path()

Purpose: Get the full path to the newly created output file.

Arguments: None.

Return Value: String holding path to newly created output file.

Comment: Since use of the output_suffix attribute means that the full path to the output file will not be known until generate_output() has been called, get_output_path() will only give a meaningful result once generate_output() has been called. Otherwise, it will default to an empty string.

get_output_basename()

Purpose: Get only the basename of the newly created output file.

Arguments: None.

Return Value: String holding basename of newly created output file.

Comment: Since use of the output_suffix attribute means that the full path to the output file will not be known until generate_output() has been called, get_output_basename() will only give a meaningful result once generate_output() has been called. Otherwise, it will default to an empty string.
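
As a rough illustration of why the path depends on when generate_output() is called, the sketch below derives a suffix-based output path from the current working directory at call time. The helper suffixed_output_path() is hypothetical and is not part of List::RewriteElements; it assumes the output name is the basename of the source file with the suffix appended, which is how the description of output_suffix above reads.

```perl
use strict;
use warnings;
use Cwd qw( getcwd );
use File::Basename qw( basename );
use File::Spec;

# Hypothetical helper, NOT part of List::RewriteElements: build the path
# that an output_suffix would presumably yield in the current directory.
sub suffixed_output_path {
    my ($source, $suffix) = @_;
    return File::Spec->catfile( getcwd(), basename($source) . $suffix );
}

my $path = suffixed_output_path( '/path/to/source/file', '.out' );
print "full path: $path\n";
print "basename:  ", basename($path), "\n";
```

Calling the helper after a chdir() would give a different full path but the same basename, which is why the two accessors are only meaningful once output has been generated.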

get_total_rows()

Purpose: Get the total number of rows in the newly created output file. This will include any header row.

Arguments: None.

Return Value: Nonnegative integer.

get_total_records()

Purpose: Get the total number of data records in the newly created output file. If a header row is present in that file, get_total_records() will return a value 1 less than that returned by get_total_rows().

Arguments: None.

Return Value: Nonnegative integer.

get_records_changed()

Purpose: Get the number of data records in the newly created output file that are altered versions of records in the incoming file. This value does not include changes in the header row.

Arguments: None.

Return Value: Nonnegative integer.

get_records_unchanged()

Purpose: Get the number of data records in the newly created output file that are unaltered versions of records in the incoming file. This value does not take the header row into account.

Arguments: None.

Return Value: Nonnegative integer.

get_records_deleted()

Purpose: Get the number of data records in the original source (file or list) that were omitted from the newly created output file due to application of a body_suppress criterion. This value does not include any suppression of a header row following application of a header_suppress criterion.

Arguments: None.

Return Value: Nonnegative integer.
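
How the counts relate to one another can be worked through by hand. The following plain-Perl sketch does the bookkeeping implied by the definitions above for a five-row source with a header; it is not the module's code, and the data, rule and criterion are invented for illustration.

```perl
use strict;
use warnings;

my @source = ( 'id,name', '1,alpha', '2,beta', '3,gamma', '4,delta' );

# Hypothetical criterion and rule, written to the conventions of new().
my $body_suppress = sub {
    my $record = shift;
    return if $record =~ /^3,/;        # omit record 3
    return 1;
};
my $body_rule = sub {
    my $record = shift;
    $record .= q{,additional field} if $record =~ /^[12],/;  # alter 1 and 2 only
    return $record;
};

my %count = map { $_ => 0 } qw( rows records changed unchanged deleted );

my ($header, @body) = @source;
print "$header\n";                     # header passed through unaltered
$count{rows}++;

for my $record (@body) {
    # The criterion is applied before the rule, as in Note 2.
    if ( ! defined $body_suppress->($record) ) {
        $count{deleted}++;
        next;
    }
    my $new = $body_rule->($record);
    if ( $new eq $record ) { $count{unchanged}++ } else { $count{changed}++ }
    print "$new\n";
    $count{rows}++;
    $count{records}++;
}

printf "%-9s %d\n", $_, $count{$_} for qw( rows records changed unchanged deleted );
```

Here rows is 4 and records is 3 (one less, because of the header), changed plus unchanged equals records, and the single deleted record appears in neither.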

get_header_status()

Purpose: Indicate whether any header row in the original source (file or list)

  • was rewritten in the newly created output file: return value 1;

  • was transferred to the newly created output file without alteration: return value 0;

  • was suppressed from appearing in the output file by application of a header_suppress criterion: return value -1;

  • was absent, i.e., the source had no header row: return value undef.

Arguments: None.

Return Value: Numerical flag: 1, 0, -1 or undef as described above.
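
Because one of the four possible values is undef, code that inspects the flag should test for definedness before comparing numerically; otherwise undef compares equal to 0 (with a warning) and "no header" is mistaken for "header unaltered". A small sketch, in which describe_header_status() is a hypothetical helper and not a method of the module:

```perl
use strict;
use warnings;

# Hypothetical helper: translate the flag described above into text.
sub describe_header_status {
    my $status = shift;
    return 'no header row in source' unless defined $status;   # check undef first
    return 'header rewritten'        if $status ==  1;
    return 'header copied unaltered' if $status ==  0;
    return 'header suppressed'       if $status == -1;
    return "unexpected status '$status'";
}

print describe_header_status($_), "\n" for ( 1, 0, -1, undef );
```

In real use the argument would be the value returned by $lre->get_header_status().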

FAQ

Can I simultaneously rewrite records and interact with the external environment?

Yes. If a header_rule, body_rule, header_suppress or body_suppress either (a) needs additional information from the external environment above and beyond that contained in the individual data record or (b) needs to cause a change in the external environment, you can write a closure and call that closure inside the rule.

Example:

my @greeks = qw( alpha beta gamma );

my $get_a_greek = sub {
    return (shift @greeks);
};

my $lre  = List::RewriteElements->new ( {
    list        => [ map {"$_\n"} (1..5) ],
    body_rule   => sub {
        my $record = shift;
        my $rv;
        chomp $record;
        if ($record eq '4') {
            $rv = $get_a_greek->();
        } else {
            $rv = (10 * $record);
        }
        return $rv;
    },
    body_suppress   => sub {
        my $record = shift;
        chomp $record;
        return if $record eq '5';
    },
} );

$lre->generate_output();

This will produce:

10
20
30
alpha

Can I use List::RewriteElements with fixed-width data?

Yes. Suppose that you have this fixed-width data (adapted from Dave Cross' Data Munging with Perl):

my @dataset = (
    q{00374Bloggs & Co       19991105100103+00015000},
    q{00375Smith Brothers    19991106001234-00004999},
    q{00376Camel Inc         19991107289736+00002999},
    q{00377Generic Code      19991108056789-00003999},
);

Suppose further that you need to update certain records and that %revisions holds the data for updating:

my %revisions = (
    376 => [ 'Camel Inc', 20061107, 388293, '+', 4999 ],
    377 => [ 'Generic Code', 20061108, 99821, '-',  6999 ],
);

Write a body_rule subroutine which uses unpack, pack and sprintf as needed to update the records.

my $lre  = List::RewriteElements->new ( {
    list        => \@dataset,
    body_rule   => sub {
        my $record = shift;
        my $template = 'A5A18A8A6AA8';
        my @rec  = unpack($template, $record);
        $rec[0] =~ s/^0+//;
        my ($acctno, %values, $result);
        $acctno = $rec[0];
        $values{$acctno} = [ @rec[1..$#rec] ];
        if ($revisions{$acctno}) {
            $values{$acctno} = $revisions{$acctno};
        }
        $result = sprintf  "%05d%-18s%8d%06d%1s%08d",
            ($acctno, @{$values{$acctno}});
        return $result;
    },
} );
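
To see what the template and the format string do to a single record, it may help to take one record apart and put it back together outside the module. In this sketch the field names ($acctno, $name, $date, $ref, $sign, $amount) are guesses at what the columns mean and do not come from the original data description.

```perl
use strict;
use warnings;

my $record   = q{00374Bloggs & Co       19991105100103+00015000};
my $template = 'A5A18A8A6AA8';     # field widths 5, 18, 8, 6, 1, 8

# 'A' strips trailing spaces, so $name comes back without its padding.
my ($acctno, $name, $date, $ref, $sign, $amount) = unpack $template, $record;
print "acctno=$acctno name='$name' date=$date ref=$ref sign=$sign amount=$amount\n";

# sprintf restores the padding: zero-filled numbers, left-justified name.
my $repacked = sprintf "%05d%-18s%8d%06d%1s%08d",
    $acctno, $name, $date, $ref, $sign, $amount;

print $repacked eq $record ? "round trip ok\n" : "round trip differs\n";
```

An unrevised record survives the round trip byte for byte, which is why the body_rule above can pass records absent from %revisions through the same sprintf call.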

How does this differ from Tie::File?

Mark Jason Dominus' Tie::File module is one of my Fave 5 CPAN modules. It's excellent for modifying a file in place. But I frequently have to leave the source file unmodified and create a new file, which implies, at the very least, opening, printing to, and closing filehandles in addition to using Tie::File. List::RewriteElements hides all that. It also provides the statistical report methods.
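
For comparison, this is roughly the filehandle bookkeeping being referred to, written out by hand in plain Perl. It is a sketch of the general read-transform-print pattern, not the module's implementation; the temporary directory, file names and sample records are made up so that the example is self-contained.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw( tempdir );

my $dir    = tempdir( CLEANUP => 1 );
my $source = File::Spec->catfile( $dir, 'source.txt' );
my $output = File::Spec->catfile( $dir, 'source.txt.out' );

# Create a small source file to work on.
open my $SEED, '>', $source or die "Cannot write $source: $!";
print {$SEED} "$_\n" for qw( a,1 b,2 c,3 );
close $SEED or die "Cannot close $source: $!";

# Read one record at a time; the source file is left untouched and
# neither input nor output is held in memory as a whole.
open my $IN,  '<', $source or die "Cannot read $source: $!";
open my $OUT, '>', $output or die "Cannot write $output: $!";
while ( my $record = <$IN> ) {
    chomp $record;
    next if $record =~ /^b,/;                           # suppression criterion
    print {$OUT} $record, q{,additional field}, "\n";   # transformation rule
}
close $OUT or die "Cannot close $output: $!";
close $IN  or die "Cannot close $source: $!";

# Read the result back to show what was written.
open my $CHECK, '<', $output or die "Cannot read $output: $!";
my @written = <$CHECK>;
close $CHECK or die "Cannot close $output: $!";
print @written;
```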

Couldn't I do this with map and grep?

Quite possibly. But if your rules and criteria were complicated or long, the content of the map and grep {} blocks would be hard to read. You also wouldn't get the statistical report methods.
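
For a sense of the trade-off, here is the closure example from earlier in this FAQ redone with map and grep alone. It is a plain-Perl sketch and does not use the module. Note that the whole output list is built in memory before anything is printed, and that any counting of changed or deleted records would have to be added by hand.

```perl
use strict;
use warnings;

my @greeks = qw( alpha beta gamma );
my @source = map { "$_\n" } ( 1 .. 5 );

# Read the pipeline bottom-up: strip newlines, drop record 5,
# then transform what is left.
my @output =
    map  { $_ eq '4' ? shift @greeks : 10 * $_ }
    grep { $_ ne '5' }
    map  { my $r = $_; chomp $r; $r }
    @source;

print "$_\n" for @output;
```

This prints the same four lines as the earlier example: 10, 20, 30 and alpha.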

How Does It Work?

Why do you care? Why do you want to look inside the black box? If you really want to know, read the source!

PREREQUISITES

List::RewriteElements relies only on modules distributed with the Perl core as of 5.8.0. IO::Capture::Stdout is required for the test suite, but a copy is included in the distribution under the t/ directory.

BUGS

None known at this time. File bug reports at http://rt.cpan.org.

HISTORY

0.06 Sat Dec 16 11:31:38 EST 2006 - Created t/07_fixed_width.t and t/testlib/fixed.t to illustrate use of List::RewriteElements with fixed-width data.

0.05 Thu Dec 14 07:42:24 EST 2006 - Correction of POD formatting errors only; no change in functionality. CPAN upload.

0.04 Wed Dec 13 23:04:33 EST 2006 - More tests; fine-tuning of code and documentation. First CPAN upload.

0.03 Tue Dec 12 22:13:00 EST 2006 - Implementation of statistical methods; more tests.

0.02 Mon Dec 11 19:38:26 EST 2006 - Added tests to demonstrate use of closures to supply additional information to elements such as body_rule.

0.01 Sat Dec 9 22:29:51 2006 - original version; created by ExtUtils::ModuleMaker 0.47

ACKNOWLEDGEMENTS

Thanks to David Landgren for raising the question of use of List-RewriteElements with fixed-width data.

I then adapted an example from Dave Cross' Data Munging with Perl, Chapter 7.1, "Fixed-width Data," to provide a test demonstrating processing of fixed-width data.

AUTHOR

James E Keenan. CPAN ID: JKEENAN. jkeenan@cpan.org. http://search.cpan.org/~jkeenan/ or http://thenceforward.net/perl/modules/List-RewriteElements.

COPYRIGHT

Copyright 2006 James E Keenan (USA).

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

SEE ALSO

David Cross, Data Munging with Perl (Manning, 2001).