NAME
Text::Parser::Manual::ComparingWithNativePerl - A comparison of text parsing with native Perl and Text::Parser
VERSION
version 0.927
LIMITATIONS OF THE PERL ONE-LINER
When people compare Perl against AWK, the usual answer is this:
$ > perl -lane 'print;' file.txt
But this isn't useful for anything more than one-liners, and it cannot be embedded in a larger, more complex program. And even when a one-liner does the job, it is hard to follow good programming practices like use strict.
The Perl one-liner is surely not a useful solution for serious programs. But if you're not convinced, we'll go through some examples here.
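For reference, the switches in that one-liner are shorthand for an ordinary loop. The sketch below spells out in plain Perl roughly what -n, -l and -a do for each line, as described in perlrun. The sub name autosplit_line is only an illustrative name, not part of Perl.

```perl
use strict;
use warnings;

# Roughly what the -n, -l and -a switches do for each line of input:
#   -n  wraps the code in  while (<>) { ... }
#   -l  chomps the line (and makes print add the newline back)
#   -a  splits the line on whitespace into @F
# autosplit_line is a hypothetical helper name used only for this sketch.
sub autosplit_line {
    my ($line) = @_;
    chomp $line;
    return split ' ', $line;    # awk-style split: leading whitespace is skipped
}

my @F = autosplit_line("NAME:  Darin Cruz\n");
print scalar(@F), " fields: @F\n";
```

Written out this way, the loop can live inside a larger program under use strict, which is the part the one-liner form makes awkward.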
A SIMPLE EXAMPLE
To understand how Text::Parser compares to the native Perl way of doing things, let's take a simple example and see how we would write code. Let's say we have a simple text file (info.txt) with lines of information like this:
NAME: Brian
EMAIL: brian@webhost.net
ADDRESS: 401 Burnswick Ave, Cool City, UT 12345
NAME: Darin Cruz
ADDRESS: 209 Random St, Forest City, CA 92710
EMAIL: darin123@yahoo.co.uk
NAME: Elizabeth Andrews
ADDRESS: 0 Muutama Lane, Inaccessible Forest area, AK 88170
NAME: Audrey C. Miller
ADDRESS: 9 New St, Smart City, PA 12933
EMAIL: aud@audrey.io
You have to write code that parses this into a data structure holding each name with its corresponding email and postal address, like this:
{ name => "Brian", email => "brian@webhost.net", address => "401 Burnswick Ave, Cool City, UT 12345"},
.
.
.
Perl one-liner
Could we do this using a Perl one-liner?
perl -lane 'BEGIN { @data = (); }
if ($F[0] eq "NAME:") {
    shift @F;
    push @data, { name => join(" ", @F) };
} elsif ($F[0] eq "EMAIL:") {
    $data[-1]{email} = $F[1];
} elsif ($F[0] eq "ADDRESS:") {
    shift @F;
    $data[-1]{address} = join " ", @F;
}' info.txt
So much for a one-liner! But you can't do anything else with this, can you?
Native Perl script
Here's an implementation as a native Perl script:
open my $in, '<', 'info.txt' or die "Cannot open info.txt: $!";
my @data = ();
while (<$in>) {
    chomp;
    my @field = split /\s+/;
    if ($field[0] eq 'NAME:') {
        shift @field;
        push @data, { name => join(' ', @field) };
    } elsif ($field[0] eq 'EMAIL:') {
        $data[-1]->{email} = $field[1];
    } elsif ($field[0] eq 'ADDRESS:') {
        shift @field;
        $data[-1]->{address} = join ' ', @field;
    }
}
close $in;
With Text::Parser
Here's how you'd write the same thing with Text::Parser.
use Text::Parser;
my $parser = Text::Parser->new();
$parser->add_rule( if => '$1 eq "NAME:"', do => 'return { name => ${2+} }' );
$parser->add_rule( if => '$1 eq "EMAIL:"',
    do => 'my $rec = $this->pop_record; $rec->{email} = $2; return $rec' );
$parser->add_rule( if => '$1 eq "ADDRESS:"',
    do => 'my $rec = $this->pop_record; $rec->{address} = ${2+}; return $rec' );
$parser->read('info.txt');
Quick observations
The programmer still has to specify how to extract the data, but:
she can focus on the content rather than the mechanics of file handling
another programmer can instantly understand what is going on
the results can be used in a more complex program - not just a one-liner
parsing files has never been this intuitive, especially with shortcuts like ${2+}
Besides, did you notice the bug in the while loop of the native Perl script above? Hint: What happens if we split a string with leading and trailing spaces?
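The hint can be checked directly. Here is a minimal sketch, using only core Perl, of how split /\s+/ and the special split ' ' differ on a line with leading and trailing spaces. The sample string is made up for illustration.

```perl
use strict;
use warnings;

my $line = "  NAME: Brian  ";    # made-up sample with leading and trailing spaces

# With a regex separator, the leading whitespace yields an empty first field,
# so a test like  $field[0] eq 'NAME:'  never matches for this line.
my @by_regex = split /\s+/, $line;

# The special single-space string form skips leading whitespace, as awk does.
my @by_space = split ' ', $line;

printf "regex: %d fields, first is '%s'\n", scalar @by_regex, $by_regex[0];
printf "space: %d fields, first is '%s'\n", scalar @by_space, $by_space[0];
```

In both cases the trailing spaces cause no trouble, because split drops trailing empty fields by default. Only the leading whitespace changes the result.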
ANOTHER SIMPLE EXAMPLE
Take another simple example. Here we have new stuff in info.txt:
State: California
County: Santa Clara, 1304, San Jose, 2/18/1850
County: Alameda, 821, Oakland, 3/25/1853
County: San Mateo, 774, Redwood City, 4/19/1856
.
.
.
State: Arkansas
.
.
.
Let's say you have to parse this and form a data structure like this:
[
    {
        state => 'California',
        'Santa Clara' => {area => 1304, county_seat => 'San Jose', date_inc => '2/18/1850'},
        'Alameda' => {area => 821, county_seat => 'Oakland', date_inc => '3/25/1853'},
        'San Mateo' => {area => 774, county_seat => 'Redwood City', date_inc => '4/19/1856'},
    },
    {
        state => 'Arkansas',
        ...
    }
]
Perl one-liner
It is clear that the one-liner is no longer really a one-liner, and it is hard to use strict in it. But go ahead and give it a try if you want.
Native Perl code
use String::Util 'trim';
open my $in, '<', 'info.txt' or die "Cannot open info.txt: $!";
my @data = ();
while (<$in>) {
    chomp;
    $_ = trim($_);
    my @field = split /[:,]\s+/;
    if ($field[0] eq 'State') {
        push @data, { state => $field[1] };
    } elsif ($field[0] eq 'County') {
        my $data = pop @data;
        $data->{$field[1]} = {area => $field[2], county_seat => $field[3], date_inc => $field[4]};
        push @data, $data;
    }
}
close $in;
With Text::Parser
use Text::Parser;
my $parser = Text::Parser->new(auto_split => 1, FS => qr/[:,]\s+/);
$parser->add_rule(if => '$1 eq "State"', do => 'return {state => $2}');
$parser->add_rule(if => '$1 eq "County"',
    do => 'my $data = $this->pop_record;
           $data->{$2} = { area => $3, county_seat => $4, date_inc => $5, };
           return $data;'
);
$parser->read('info.txt');
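To see why the field separator [:,]\s+ is enough here, it helps to look at what it does to one line of the input. The sketch below uses plain split rather than Text::Parser itself, on the assumption that the fields $1 to $5 seen by the rules correspond to the pieces of such a split.

```perl
use strict;
use warnings;

my $line = 'County: Santa Clara, 1304, San Jose, 2/18/1850';

# Split on a colon or comma followed by whitespace. Spaces inside a name such
# as "Santa Clara" are kept because they do not follow a ':' or ','.
my @field = split /[:,]\s+/, $line;

# Assumption: these pieces line up with $1..$5 in the Text::Parser rules.
my %county = (
    name        => $field[1],
    area        => $field[2],
    county_seat => $field[3],
    date_inc    => $field[4],
);
print "$county{name}: seat $county{county_seat}, area $county{area}\n";
```

Note that the date survives intact only because the slashes in it are not part of the separator. A file with commas inside a field would need a different approach.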
SOMETHING MORE FUN
Let's take something more fun. A selection of students from Riverdale High and Hogwarts took part in a quiz. This is a record of their scores.
School = Riverdale High
Grade = 1
Student number, Name
0, Phoebe
1, Rachel
Student number, Score
0, 3
1, 7
Grade = 2
Student number, Name
0, Angela
1, Tristan
2, Aurora
Student number, Score
0, 6
1, 3
2, 9
School = Hogwarts
Grade = 1
Student number, Name
0, Ginny
1, Luna
Student number, Score
0, 8
1, 7
Grade = 2
Student number, Name
0, Harry
1, Hermione
Student number, Score
0, 5
1, 10
Grade = 3
Student number, Name
0, Fred
1, George
Student number, Score
0, 0
1, 0
You want to parse this into a data structure like this:
# Entries data-structure hierarchy is:
# school/grade/student number/Name
# school/grade/student number/Score
{
    "Riverdale High" => {
        "1" => {
            0 => {Name => "Phoebe", Score => 3},
            1 => {Name => "Rachel", Score => 7}
        },
        "2" => {
            0 => {Name => "Angela", Score => 6},
            1 => {Name => "Tristan", Score => 3},
            2 => {Name => "Aurora", Score => 9},
        },
    },
},
{
    "Hogwarts" => {
        "1" => {
            0 => {Name => "Ginny", Score => 8},
            1 => {Name => "Luna", Score => 7},
        },
        "2" => {
            0 => {Name => "Harry", Score => 5},
            1 => {Name => "Hermione", Score => 10},
        },
        "3" => {
            0 => {Name => "Fred", Score => 0},
            1 => {Name => "George", Score => 0 },
        },
    },
}
This problem comes from a source where the solution was implemented in Python using a PEG parser.
Native Perl
Do I really have to do this? Why don't you try it yourself?
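If you want something to compare your attempt against, here is one possible sketch in core Perl. It is not the author's solution. The sub name parse_quiz is hypothetical, it takes the lines as a list so it can run without a file, it does no error handling, and it builds a single hash keyed by school rather than one record per school.

```perl
use strict;
use warnings;

# Parse quiz lines into $result{school}{grade}{student number}{Name|Score}.
# parse_quiz is a hypothetical helper name used only for this sketch.
sub parse_quiz {
    my @lines = @_;
    my (%result, $school, $grade, $column);
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;                  # trim both ends
        next if $line eq '';
        my @field = split /\s+=\s+|,\s+/, $line;  # split on " = " or ", "
        if    ($field[0] eq 'School')         { $school = $field[1] }
        elsif ($field[0] eq 'Grade')          { $grade  = $field[1] }
        elsif ($field[0] eq 'Student number') { $column = $field[1] }   # "Name" or "Score"
        else  { $result{$school}{$grade}{ $field[0] }{$column} = $field[1] }
    }
    return \%result;
}

my $quiz = parse_quiz(
    'School = Hogwarts',
    'Grade = 2',
    'Student number, Name',
    '0, Harry',
    '1, Hermione',
    'Student number, Score',
    '0, 5',
    '1, 10',
);
print "$quiz->{Hogwarts}{2}{1}{Name} scored $quiz->{Hogwarts}{2}{1}{Score}\n";
```

The three state variables ($school, $grade, $column) carry context from header lines to data lines, which is the bookkeeping any hand-written solution has to do.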
With Text::Parser
use Text::Parser;
my $parser = Text::Parser->new(FS => qr/\s+\=\s+|,\s+/);
$parser->add_rule(if => '$1 eq "School"',
    do => '~school = $2; return {$2 => {}};');
$parser->add_rule(if => '$1 eq "Grade"',
    do => 'my $p = $this->pop_record;
           $p->{~school}{$2} = {};
           ~grade = $2;
           return $p;');
$parser->add_rule(if => '$1 eq "Student number"',
    do => '~info = $2;', dont_record => 1);
$parser->add_rule(
    do => 'my $p = $this->pop_record;
           $p->{~school}{~grade}{$1}{~info} = $2;
           return $p;'
);
$parser->read('info.txt');
That's it!
By now, you should have concluded that the Text::Parser way is much better. If not, you must know a better solution and perhaps you should make a Perl module (or feel free to contact me and contribute if you like this project).
BUGS
Please report any bugs or feature requests on the bugtracker website http://github.com/balajirama/Text-Parser/issues
When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.
AUTHOR
Balaji Ramasubramanian <balajiram@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2018-2019 by Balaji Ramasubramanian.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.