NAME

TextFileParser - an extensible Perl class to parse any text file by specifying grammar in derived classes.

VERSION

version 0.202

SYNOPSIS

    use TextFileParser;

    my $parser = TextFileParser->new;
    $parser->read(shift @ARGV);
    print $parser->get_records, "\n";

The above code reads a text file and prints the content to STDOUT.

Here is another parser, derived from TextFileParser as its base class. It shows how little code a custom parser needs.

    package CSVParser;
    use parent 'TextFileParser';

    sub save_record {
        my ($self, $line) = @_;
        chomp $line;
        my (@fields) = split /,/, $line;
        $self->SUPER::save_record(\@fields);
    }

That's it! Every line is saved as an array reference containing its comma-separated fields. In main:: you can now write the following.

    use CSVParser;
    
    my $a_parser = CSVParser->new;
    $a_parser->read(shift @ARGV);

The read method calls save_record internally for each line. Because CSVParser overrides save_record, the overridden version is the one that gets called.
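
One caveat about the split /,/ used in CSVParser's save_record: by default Perl's split discards trailing empty fields, so a line that ends in commas yields fewer elements than it has columns. Passing a negative limit keeps them. The short sketch below shows the difference on a made-up sample line; it uses only core Perl and nothing from TextFileParser.

```perl
use strict;
use warnings;

# Made-up sample line with two empty trailing columns
my $line = "alice,42,,";

my @default = split /,/, $line;        # trailing empty fields are dropped
my @kept    = split /,/, $line, -1;    # negative limit preserves them

printf "default: %d fields, with -1 limit: %d fields\n",
    scalar @default, scalar @kept;
# prints "default: 2 fields, with -1 limit: 4 fields"
```

Neither form handles quoted fields that contain commas. For that, a dedicated CSV module such as Text::CSV is the safer choice inside save_record.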

DESCRIPTION

This class can be used to parse any text file format. TextFileParser handles the common operations: opening the file, closing it, and counting lines. Future versions are expected to include progress-bar support. Derived classes inherit all of these features and do not have to re-implement them.

A derived class of TextFileParser needs to override only one method: save_record. In this way, any text file format can be parsed without rewriting the code that this class already provides.
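
The division of labor described above is the template-method pattern: the base class owns the read loop and calls one overridable hook per line. The sketch below illustrates the pattern with a hypothetical MiniParser class written only for this example. It is not TextFileParser's source code, and its internals (the records and lines_parsed hash keys, the UpperCaseParser subclass) are assumptions.

```perl
use strict;
use warnings;

package MiniParser;    # hypothetical stand-in, NOT the real TextFileParser

sub new {
    my ($class) = @_;
    return bless { records => [], lines_parsed => 0 }, $class;
}

# The base class owns the file handling: open, loop, count lines, close.
sub read {
    my ( $self, $fname ) = @_;
    open my $fh, '<', $fname or die "Cannot read '$fname': $!\n";
    while ( my $line = <$fh> ) {
        $self->{lines_parsed}++;
        $self->save_record($line);    # the single hook subclasses override
    }
    close $fh;
    return;
}

sub save_record {
    my ( $self, $record ) = @_;
    push @{ $self->{records} }, $record;
    return;
}

sub get_records { my ($self) = @_; return @{ $self->{records} } }

package UpperCaseParser;    # hypothetical subclass: overrides only the hook
our @ISA = ('MiniParser');

sub save_record {
    my ( $self, $line ) = @_;
    chomp $line;
    return $self->SUPER::save_record( uc $line );
}

package main;

use File::Temp qw(tempfile);

# Write a small temporary input file so the sketch is self-contained
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "first\nsecond\n";
close $tmp_fh;

my $parser = UpperCaseParser->new;
$parser->read($tmp_name);
my @records = $parser->get_records;
print "$_\n" for @records;    # prints FIRST, then SECOND
```

The subclass never touches file handles or counters. It only decides what a record looks like, which is the same contract the documentation describes for save_record.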

METHODS

new

Takes no arguments. Returns a blessed reference to the new object.

    my $pars = TextFileParser->new;

This $pars variable will be used in examples below.

read

Takes zero or one string argument containing the name of the file to read. Throws an exception if the file does not exist or cannot be read for any reason.

    $pars->read($filename);

    # The above is equivalent to the following
    $pars->filename($filename);
    $pars->read();

Returns once all records have been read, or earlier if an exception is thrown because of a parsing error. This method handles opening and closing the file itself, and closes the file even if an exception is thrown.

Recommendation: Don't override this subroutine. Override save_record instead.
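
Because read throws on a missing or unreadable file, callers that want to continue after a failure need to trap the exception. A minimal sketch using eval follows. The file name is a made-up placeholder, and the exact text of the exception is not documented here, so the sketch prints whatever message it receives.

```perl
use strict;
use warnings;
use TextFileParser;

my $file = 'no-such-file.txt';    # hypothetical path, for illustration only
my $pars = TextFileParser->new;

# eval returns undef if read() dies; the trailing 1 marks success
my $ok = eval { $pars->read($file); 1 };
if ( !$ok ) {
    my $err = $@ || 'unknown error';
    warn "Could not parse '$file': $err\n";
}
```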

filename

Takes zero or one string argument containing the name of a file. When given an argument, sets the file to be read by the next call to read, as in the example under read above. Returns the name of the file that was last opened, or undef if no file has been opened.

    print "Last read ", $pars->filename, "\n";

lines_parsed

Takes no arguments. Returns the number of lines last parsed.

    print $pars->lines_parsed, " lines were parsed\n";

This is also useful for generating error messages, for example to report the line at which parsing failed.
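
As a sketch of that use, the derived class below rejects malformed lines and includes the line number in the message. StrictCSVParser, the three-field rule, and the default file name data.csv are all hypothetical. The sketch also assumes that lines_parsed already counts the current line and that filename is set by the time save_record is called, which this documentation does not state explicitly.

```perl
use strict;
use warnings;

package StrictCSVParser;    # hypothetical class name for this sketch
use parent 'TextFileParser';

sub save_record {
    my ( $self, $line ) = @_;
    chomp $line;
    my @fields = split /,/, $line, -1;

    # Assumption: lines_parsed already includes the current line here
    die sprintf( "%s: line %d: expected 3 fields, got %d\n",
        $self->filename, $self->lines_parsed, scalar @fields )
        if @fields != 3;

    return $self->SUPER::save_record( \@fields );
}

package main;

my $file = @ARGV ? shift @ARGV : 'data.csv';    # hypothetical default name
my $pars = StrictCSVParser->new;
my $ok   = eval { $pars->read($file); 1 };
warn $@ unless $ok;
```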

save_record

Takes exactly one argument, which can be anything meaningful to the parser: a scalar, an array reference, a hash reference, and so on. The read method calls this method automatically for each line. In the TextFileParser class it simply saves each line as a string record.

This method can be overridden in derived classes. An overriding method definition might call SUPER::save_record passing it a modified record. Here's an example of a parser that reads multi-line files: if a line starts with a '+' character then it is to be treated as a continuation of the previous line.

    package MultilineParser;
    use parent 'TextFileParser';

    sub save_record {
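        my ($self, $line) = @_;
        # Not a continuation line: save it unchanged
        return $self->SUPER::save_record($line) if $line !~ /^[+]\s*/;
        # Continuation line: strip the leading '+' and any whitespace after it
        $line =~ s/^[+]\s*//;
        # Remove the previous record and re-save it joined with this line
        my $last_rec = $self->pop_record;
        chomp $last_rec;
        $self->SUPER::save_record( $last_rec . ' ' . $line );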
    }
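
To see which records this joining logic produces, it can be traced over an in-memory list of lines. In the sketch below a plain array stands in for the parser's saved records (an assumption made so the example runs without the module), and the input lines are made up for illustration. The extra check on @records guards against a '+' on the very first line, a case worth handling in a real parser.

```perl
use strict;
use warnings;

# Made-up input: each '+' line continues the line before it
my @lines = ( "set a = 1\n", "+ and b = 2\n", "print a\n", "+b\n" );

my @records;    # stands in for the parser's saved records
for my $line (@lines) {
    if ( $line =~ s/^[+]\s*// && @records ) {
        # Continuation: join onto the previous record
        my $last_rec = pop @records;
        chomp $last_rec;
        push @records, $last_rec . ' ' . $line;
    }
    else {
        # Ordinary line: save as a new record
        push @records, $line;
    }
}

print @records;
# prints:
# set a = 1 and b = 2
# print a b
```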

get_records

Takes no arguments. Returns an array containing all the records that were read by the parser.

    my $i = 0;
    foreach my $record ( $pars->get_records ) {
        $i++;
        print "Record: $i: ", $record, "\n";
    }
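
When a derived class saves references rather than strings, as CSVParser in the Synopsis does, each record has to be dereferenced before printing. The sketch below uses made-up records in a plain array, shaped like what such a parser would return from get_records, so that it runs without the module.

```perl
use strict;
use warnings;

# Made-up records: one array reference per line, as CSVParser would store
my @records = ( [ 'alice', 42 ], [ 'bob', 17 ] );

my $i = 0;
my @out;
foreach my $record (@records) {
    $i++;
    # Dereference the array ref; printing $record itself shows ARRAY(0x...)
    push @out, "Record $i: " . join( ' | ', @{$record} );
}
print "$_\n" for @out;
# prints:
# Record 1: alice | 42
# Record 2: bob | 17
```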

last_record

Takes no arguments and returns the last saved record. Leaves the saved records untouched.

    my $last_rec = $pars->last_record;

pop_record

Takes no arguments. Removes the last saved record and returns it.

    my $last_rec = $pars->pop_record;
    my $uc_last = uc $last_rec;
    $pars->save_record($uc_last);

AUTHOR

Balaji Ramasubramanian <balajiram@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2018 by Balaji Ramasubramanian.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

BUGS

Please report any bugs or feature requests on the bugtracker website http://rt.cpan.org/Public/Dist/Display.html?Name=TextFileParser or by email to bug-textfileparser at rt.cpan.org.

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.