NAME
List::RewriteElements - Create a new list by rewriting elements of a first list
SYNOPSIS
use List::RewriteElements;
Constructor
Simplest case: Input from array, output to STDOUT.
$lre = List::RewriteElements->new( {
    list => \@source,
    body_rule => sub {
        my $record = shift;
        $record .= q{additional field};
    },
} );
Input from file, output to STDOUT:
$lre = List::RewriteElements->new( {
    file => "/path/to/source/file",
    body_rule => sub {
        my $record = shift;
        $record .= q{,additional field};
    },
} );
Provide a different rule for the first element in the list:
$lre = List::RewriteElements->new( {
    file => "/path/to/source/file",
    header_rule => sub {
        my $record = shift;
        $record .= q{,ADDITIONAL HEADER};
    },
    body_rule => sub {
        my $record = shift;
        $record .= q{,additional field};
    },
} );
Input from file, output to file:
$lre = List::RewriteElements->new( {
    file => "/path/to/source/file",
    body_rule => sub {
        my $record = shift;
        $record .= q{additional field};
    },
    output_file => "/path/to/output/file",
} );
To name the output file, just provide a suffix to be appended to the input filename:
$lre = List::RewriteElements->new( {
    file => "/path/to/source/file",
    body_rule => sub {
        my $record = shift;
        $record .= q{additional field};
    },
    output_suffix => '.out',
} );
Provide criteria to suppress output of the header or of an individual record:
$lre = List::RewriteElements->new( {
    file => "/path/to/source/file",
    header_suppress => sub {
        my $record = shift;
        return if $record =~ /$somepattern/;
    },
    body_suppress => sub {
        my $record = shift;
        return if $record ne 'somestring';
    },
    body_rule => sub {
        my $record = shift;
        $record .= q{additional field};
    },
} );
Generate Output
$lre->generate_output();
Report Output Information
$path_to_output_file = $lre->get_output_path();
$output_file_basename = $lre->get_output_basename();
$output_row_count = $lre->get_total_rows();
$output_record_count = $lre->get_total_records();
$records_changed = $lre->get_records_changed();
$records_unchanged = $lre->get_records_unchanged();
$records_deleted = $lre->get_records_deleted();
$header_status = $lre->get_header_status();
DESCRIPTION
It is common to receive a flat data file from someone else and have to generate a new file in which each row or record in the incoming file must either (a) be transformed according to some rule before being printed to the new file; or (b) if it meets certain criteria, be omitted from the new file altogether.
List::RewriteElements enables you to write such rules and criteria, generate the file of transformed data records, and get back some basic statistics about the transformation.
List::RewriteElements is useful when the number of records in the incoming file may be large and you do not want to hold the entire list in memory. Similarly, the newly generated records are not held in memory but are immediately printed to STDOUT or to file.
On the other hand, if for some reason you already have an array of records in memory, you can use List::RewriteElements to apply rules and criteria to each element of the array and then print the transformed records (again, without holding the output in memory).
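The record-at-a-time approach described above can be sketched in plain Perl. This is not the module's source, only a minimal illustration: rewrite_stream is a hypothetical helper, and in-memory filehandles stand in for real input and output files.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of List::RewriteElements): read one record
# at a time, transform it, and print it at once, so that neither the input
# list nor the output list is ever held in memory as a whole.
sub rewrite_stream {
    my ($in, $out, $rule) = @_;
    my $count = 0;
    while (my $record = <$in>) {
        chomp $record;
        print {$out} $rule->($record), "\n";
        $count++;
    }
    return $count;
}

# In-memory filehandles stand in for real input and output files.
my $source = "alpha,1\nbeta,2\n";
my $result = '';
open my $in,  '<', \$source or die "Cannot open input: $!";
open my $out, '>', \$result or die "Cannot open output: $!";
my $rows = rewrite_stream($in, $out, sub { $_[0] . q{,additional field} });
close $in;
close $out;
print $result;
```

Only one record is alive inside the loop at any moment, which is the property that matters when the incoming file is large.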
SUBROUTINES
new()
Purpose: List::RewriteElements constructor.
Arguments: Reference to a hash holding the following keys:
file or list
The hash must hold either a file element or a list element -- but not both! The value for the file key must be an absolute path to an input file. The value for list must be a reference to an array in memory.
body_rule
The hash must have a body_rule element whose value is a reference to a subroutine providing a formula for the transformation of an individual record in the incoming file to a record in the outgoing file. The first argument passed to this subroutine must be the record from the incoming file. The return value from this subroutine should be a string immediately ready for printing to the output file (though the string should not end in a newline, as printing will be handled by generate_output()).
body_suppress
Optionally, you may provide a body_suppress element whose value is a reference to a subroutine providing a criterion according to which an individual record in the incoming file should be output to the outgoing file or not output, i.e., omitted from the output entirely. The first argument to this subroutine should be the record from the incoming file. The subroutine should, at least implicitly, return a true value when the record should be output. The subroutine should simply return, i.e., return an implicit undef, when the record should be omitted from the outgoing file.
header_rule
Frequently the first row in a flat data file is a header row containing, say, the names of the columns in a data table, joined by a delimiter. Because the header row is different from all subsequent rows, you may optionally provide a header_rule element whose value is a reference to a subroutine providing a formula for the transformation of the header row in the incoming file to the header in the outgoing file. The first argument passed to this subroutine must be the header row from the incoming file. The return value from this subroutine should be a string immediately ready for printing to the output file (though the string should not end in a newline, as printing will be handled by generate_output()).
header_suppress
Optionally, if you have provided a header_rule element, you may provide a header_suppress element whose value is a reference to a subroutine providing a criterion according to which the header row from the incoming file should be output to the outgoing file or not output, i.e., omitted from the output entirely. The first argument to this subroutine should be the header from the incoming file. The subroutine should, at least implicitly, return a true value when the header should be output. The subroutine should simply return, i.e., return an implicit undef, when the header should be omitted from the outgoing file.
output_file or output_suffix
It is recommended that you supply either an output_file or an output_suffix element to the constructor; otherwise, the new list generated by application of the rules and criteria will simply print to STDOUT. The value of an output_file element should be a full path to the newly created file. If you wish to create a new file name without specifying a full path but simply by tacking on a suffix to the name of the incoming file, provide an output_suffix element and the outgoing file will be created in the directory which is the current working directory as of the point where generate_output() is called. An output_suffix element will be ignored if an output_file element is provided.
Note 1
If neither a header_rule nor a header_suppress element is provided to the constructor, List::RewriteElements will treat the first row of the incoming file the same as any other row, i.e., it will apply the body_rule transformation formula.
Note 2
A body_suppress or header_suppress criterion, if present, will be logically applied before any body_rule or header_rule formula. We don't apply the formula to transform a record if the record should not be output at all.
Note 3
Return Value: List::RewriteElements object.
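The contract for suppress criteria, and the ordering of criterion before rule, can be illustrated without the module. The sketch below is a stand-in under stated assumptions: apply_to_record is a hypothetical helper, and it tests the criterion's result with defined, which is one reading of "return an implicit undef"; the module's own test may differ. The criterion returns an explicit 1 for records to keep, so it behaves the same under a truth test or a defined test.

```perl
use strict;
use warnings;

# A criterion in the style described above: a bare return (undef) means
# "omit this record"; an explicit true value means "output it".
my $body_suppress = sub {
    my $record = shift;
    return if $record =~ /^#/;    # omit comment lines
    return 1;                     # keep everything else
};

my $body_rule = sub {
    my $record = shift;
    return $record . q{,additional field};
};

my $rule_calls = 0;

# Hypothetical helper, not the module's code: consult the criterion first
# and run the rule only for records that survive it.
sub apply_to_record {
    my ($record) = @_;
    return unless defined $body_suppress->($record);
    $rule_calls++;
    return $body_rule->($record);
}

my @output = map { apply_to_record($_) } ('# comment', 'a,b');
print "$_\n" for @output;
```

The counter $rule_calls ends at 1, showing that the rule never ran for the suppressed record.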
generate_output()
Purpose: Generates the output specified by arguments to new(), i.e., creates an output file or prints to STDOUT with records transformed as per those arguments.
Arguments: None.
Return Value: Returns a true value upon success. In case of failure it will croak with an error message.
get_output_path()
Purpose: Get the full path to the newly created output file.
Arguments: None.
Return Value: String holding path to newly created output file.
Comment: Since use of the output_suffix attribute means that the full path to the output file will not be known until generate_output() has been called, get_output_path() will only give a meaningful result once generate_output() has been called. Otherwise, it will default to an empty string.
get_output_basename()
Purpose: Get only the basename of the newly created output file.
Arguments: None.
Return Value: String holding basename of newly created output file.
Comment: Since use of the output_suffix attribute means that the full path to the output file will not be known until generate_output() has been called, get_output_basename() will only give a meaningful result once generate_output() has been called. Otherwise, it will default to an empty string.
get_total_rows()
Purpose: Get the total number of rows in the newly created output file. This will include any header row.
Arguments: None.
Return Value: Nonnegative integer.
get_total_records()
Purpose: Get the total number of data records in the newly created output file. If a header row is present in that file, get_total_records() will return a value 1 less than that returned by get_total_rows().
Arguments: None.
Return Value: Nonnegative integer.
get_records_changed()
Purpose: Get the number of data records in the newly created output file that are altered versions of records in the incoming file. This value does not include changes in the header row.
Arguments: None.
Return Value: Nonnegative integer.
get_records_unchanged()
Purpose: Get the number of data records in the newly created output file that are unaltered versions of records in the incoming file. This value does not include changes in the header row.
Arguments: None.
Return Value: Nonnegative integer.
get_records_deleted()
Purpose: Get the number of data records in the original source (file or list) that were omitted from the newly created output file due to application of a body_suppress criterion. This value does not include any suppression of a header row following application of a header_suppress criterion.
Arguments: None.
Return Value: Nonnegative integer.
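How the three record counts relate to one another can be shown with a small tally. This is a sketch under assumptions, not the module's implementation: tally is a hypothetical helper, and it treats a record as "changed" when the rule's output differs from the input under string comparison.

```perl
use strict;
use warnings;

# Hypothetical tally, not the module's implementation: classify each
# record as deleted, changed, or unchanged, using string comparison
# between input and output as the (assumed) test for "changed".
sub tally {
    my ($records, $suppress, $rule) = @_;
    my %count = (changed => 0, unchanged => 0, deleted => 0);
    for my $record (@$records) {
        if (! defined $suppress->($record)) {
            $count{deleted}++;
            next;
        }
        my $new = $rule->($record);
        $count{ $new eq $record ? 'unchanged' : 'changed' }++;
    }
    # Records that reach the output are either changed or unchanged.
    $count{total} = $count{changed} + $count{unchanged};
    return \%count;
}

my $stats = tally(
    [ qw( 1 2 3 4 5 ) ],
    sub { my $r = shift; return if $r == 5; return 1; },     # omit record 5
    sub { my $r = shift; return $r % 2 ? $r : 10 * $r; },    # rewrite even records
);
printf "%s: %d\n", $_, $stats->{$_} for qw( total changed unchanged deleted );
```

In this sketch the changed and unchanged counts always sum to the number of records output, and deleted records are counted separately.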
get_header_status()
Purpose: Indicate whether any header row in the original source (file or list)
was rewritten in the newly created output file: return value 1;
was transferred to the newly created output file without alteration: return value 0;
was suppressed from appearing in the output file by application of a header_suppress criterion: return value -1;
no header row in the source: return value undef.
Arguments: None.
Return Value: Numerical flag: 1, 0, -1 or undef as described above.
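Because one of the four possible values is undef, code that inspects the flag should test for definedness before comparing numerically. A minimal sketch, in which describe_header_status is a hypothetical helper and the descriptive strings are illustrative only:

```perl
use strict;
use warnings;

# Hypothetical helper: translate the documented flag values into text.
# undef is tested first, since comparing it numerically would warn.
sub describe_header_status {
    my ($status) = @_;
    return 'no header row in the source' unless defined $status;
    return 'header rewritten'            if $status ==  1;
    return 'header copied unaltered'     if $status ==  0;
    return 'header suppressed'           if $status == -1;
    return "unexpected status '$status'";
}

print describe_header_status($_), "\n" for (1, 0, -1, undef);
```

With the module in use, the argument would be the value returned by $lre->get_header_status().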
FAQ
Can I simultaneously rewrite records and interact with the external environment?
Yes. If a header_rule, body_rule, header_suppress or body_suppress either (a) needs additional information from the external environment above and beyond that contained in the individual data record or (b) needs to cause a change in the external environment, you can write a closure and call that closure inside the rule.
Example:
my @greeks = qw( alpha beta gamma );
my $get_a_greek = sub {
    return (shift @greeks);
};
my $lre = List::RewriteElements->new ( {
    list => [ map {"$_\n"} (1..5) ],
    body_rule => sub {
        my $record = shift;
        my $rv;
        chomp $record;
        if ($record eq '4') {
            $rv = &{$get_a_greek};
        } else {
            $rv = (10 * $record);
        }
        return $rv;
    },
    body_suppress => sub {
        my $record = shift;
        chomp $record;
        return if $record eq '5';
    },
} );
$lre->generate_output();
This will produce:
10
20
30
alpha
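The example above covers case (a), pulling extra information into a rule. Case (b), a rule that changes the external environment, can be sketched in the same way. The names $seen and $numbering_rule are illustrative, and the rule is exercised with a plain map here so that the sketch runs without the module.

```perl
use strict;
use warnings;

my $seen = 0;    # external state, updated by the rule as a side effect

# A rule written as a closure over $seen: it numbers each record it
# handles. A sub like this could be supplied as a body_rule.
my $numbering_rule = sub {
    my $record = shift;
    chomp $record;
    $seen++;
    return "$seen:$record";
};

my @rewritten = map { $numbering_rule->($_) } ("x\n", "y\n", "z\n");
print "$_\n" for @rewritten;
print "records seen: $seen\n";
```

After the run, $seen holds the number of records the rule handled, information that lives outside any individual record.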
Can I use List::RewriteElements with fixed-width data?
Yes. Suppose that you have this fixed-width data (adapted from Dave Cross' Data Munging with Perl):
my @dataset = (
    q{00374Bloggs & Co       19991105100103+00015000},
    q{00375Smith Brothers    19991106001234-00004999},
    q{00376Camel Inc         19991107289736+00002999},
    q{00377Generic Code      19991108056789-00003999},
);
Suppose further that you need to update certain records and that %revisions holds the data for updating:
my %revisions = (
    376 => [ 'Camel Inc', 20061107, 388293, '+', 4999 ],
    377 => [ 'Generic Code', 20061108, 99821, '-', 6999 ],
);
Write a body_rule subroutine which uses unpack, pack and sprintf as needed to update the records.
my $lre = List::RewriteElements->new ( {
    list => \@dataset,
    body_rule => sub {
        my $record = shift;
        my $template = 'A5A18A8A6AA8';
        my @rec = unpack($template, $record);
        $rec[0] =~ s/^0+//;
        my ($acctno, %values, $result);
        $acctno = $rec[0];
        $values{$acctno} = [ @rec[1..$#rec] ];
        if ($revisions{$acctno}) {
            $values{$acctno} = $revisions{$acctno};
        }
        $result = sprintf "%05d%-18s%8d%06d%1s%08d",
            ($acctno, @{$values{$acctno}});
        return $result;
    },
} );
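The unpack and sprintf steps can be checked on a single record without the module. The sketch below assumes that the name field is space-padded to 18 columns, as the template 'A5A18A8A6AA8' implies (5+18+8+6+1+8 = 46 columns); the variable names are illustrative.

```perl
use strict;
use warnings;

my $template = 'A5A18A8A6AA8';               # field widths: 5, 18, 8, 6, 1, 8
my $format   = '%05d%-18s%8d%06d%1s%08d';

# Build a 46-column record; the name field is padded with spaces to 18 columns.
my $record = sprintf '%-5s%-18s%s', '00376', 'Camel Inc', '19991107289736+00002999';

my @fields = unpack $template, $record;      # 'A' strips trailing spaces
(my $acctno = $fields[0]) =~ s/^0+//;        # '00376' becomes '376'

# Re-packing the unpacked fields should reproduce the original record.
my $unchanged = sprintf $format, $acctno, @fields[1 .. $#fields];

# Substituting revised values yields the updated record.
my $revised = sprintf $format, $acctno, 'Camel Inc', 20061107, 388293, '+', 4999;

print length($record), "\n";
print $unchanged eq $record ? "round trip ok\n" : "round trip differs\n";
print "$revised\n";
```

The round trip reproducing the original record is a quick sanity check that the unpack template and the sprintf format describe the same layout.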
How does this differ from Tie::File?
Mark Jason Dominus' Tie::File module is one of my Fave 5 CPAN modules. It's excellent for modifying a file in place. But I frequently have to leave the source file unmodified and create a new file, which implies, at the very least, opening, printing to, and closing filehandles in addition to using Tie::File. List::RewriteElements hides all that. It also provides the statistical report methods.
Couldn't I do this with map and grep?
Quite possibly. But if your rules and criteria were complicated or long, the content of the map and grep {} blocks would be hard to read. You also wouldn't get the statistical report methods.
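For comparison, here is roughly what the closure example from the first FAQ entry looks like when written with map and grep. This is a sketch, not a drop-in replacement: grep plays the role of the suppress criterion and map the role of the rule, and the whole list is held in memory.

```perl
use strict;
use warnings;

my @greeks = qw( alpha beta gamma );
my @source = (1 .. 5);

# grep acts as the suppress criterion (drop 5); map acts as the rule
# (4 becomes the next Greek letter, everything else is multiplied by 10).
my @output =
    map  { $_ == 4 ? shift(@greeks) : 10 * $_ }
    grep { $_ != 5 }
    @source;

print "$_\n" for @output;
```

This stays readable only because the rule and criterion are one-liners; with longer logic the blocks grow quickly, and no counts of changed or deleted records are available afterwards.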
How Does It Work?
Why do you care? Why do you want to look inside the black box? If you really want to know, read the source!
PREREQUISITES
List::RewriteElements relies only on modules distributed with the Perl core as of 5.8.0. IO::Capture::Stdout is required for the test suite, but a copy is included in the distribution under the t/ directory.
BUGS
None known at this time. File bug reports at http://rt.cpan.org.
HISTORY
0.09 Mon Jan 22 22:35:56 EST 2007 - Update version number and release date only. Purpose: generate new round of tests by cpan testers, in the hope that it eliminates a FAIL report on v0.08 where failure was due solely to error on tester's box.
0.08 Mon Jan 1 08:54:01 EST 2007 - xdg to the rescue! Applied and extended patches supplied by David Golden for Win32. In constructor, value of $/ is supplied to the recsep option.
0.07 Sun Dec 31 11:13:04 EST 2006 - Switched to using File::Spec::catfile() to generate one path (rather than Cwd::realpath()). This was done in an attempt to respond to corion's FAIL reports (but I don't have a good Windows box, so I can't be certain of the results).
0.06 Sat Dec 16 11:31:38 EST 2006 - Created t/07_fixed_width.t and t/testlib/fixed.t to illustrate use of List::RewriteElements with fixed-width data.
0.05 Thu Dec 14 07:42:24 EST 2006 - Correction of POD formatting errors only; no change in functionality. CPAN upload.
0.04 Wed Dec 13 23:04:33 EST 2006 - More tests; fine-tuning of code and documentation. First CPAN upload.
0.03 Tue Dec 12 22:13:00 EST 2006 - Implementation of statistical methods; more tests.
0.02 Mon Dec 11 19:38:26 EST 2006 - Added tests to demonstrate use of closures to supply additional information to elements such as body_rule.
0.01 Sat Dec 9 22:29:51 2006 - original version; created by ExtUtils::ModuleMaker 0.47
ACKNOWLEDGEMENTS
Thanks to David Landgren for raising the question of use of List-RewriteElements with fixed-width data.
I then adapted an example from Dave Cross' Data Munging with Perl, Chapter 7.1, "Fixed-width Data," to provide a test demonstrating processing of fixed-width data.
AUTHOR
James E Keenan. CPAN ID: JKEENAN. jkeenan@cpan.org. http://search.cpan.org/~jkeenan/ or http://thenceforward.net/perl/modules/List-RewriteElements.
COPYRIGHT
Copyright 2006 James E Keenan (USA).
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.
SEE ALSO
David Cross, Data Munging with Perl (Manning, 2001).