NAME

Text::Parser::Multiline - Adds multi-line support to the Text::Parser object.

VERSION

version 0.803

SYNOPSIS

    use Text::Parser;

    my $parser = Text::Parser->new(multiline_type => 'join_last');
    $parser->read('filename.txt');
    print $parser->get_records();
    print scalar($parser->get_records()), " records were read although ",
        $parser->lines_parsed(), " lines were parsed.\n";

RATIONALE

Some text formats allow users to split a single line into multiple lines, with a continuation character at the beginning or at the end, usually to improve human readability.

To handle these types of text formats with the native Text::Parser class, the derived class would need to have a save_record method that would:

  • Detect if the line is continued, and if it is, save it in a temporary location

  • Keep appending (or joining) any continued lines to this temporary location

  • Once the line continuation stops, create a record and save it with the save_record method

It should also look for error conditions:

  • If the end of file is reached, and a joined line is still waiting incomplete, throw an exception "unexpected EOF"

  • If the first line of a text input is marked as a continuation of a previous line, throw an exception, since no previous line exists

This is further complicated by the fact that some multi-line text formats indicate that a line continues on the next line (for example, with a back-slash character at the end of the line), whereas others indicate that the current line is a continuation of the previous line. For example, in bash, Tcl, etc., the continuation character is \ (back-slash) which, if added to the end of a line of code, implies "there is more on the next line". In contrast, SPICE has a continuation character (+) at the start of the next line, indicating that the text on that line should be joined with the previous line.
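The bookkeeping described above can be sketched in plain Perl, without Text::Parser, to show what a hand-written parser would have to manage. The helper join_lines is hypothetical and not part of this module. It assumes chomped input lines, a trailing back-slash for the 'join_next' style, and a leading + for the 'join_last' style.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Parser): joins continued lines by hand.
# Assumes chomped lines; $type is 'join_next' (trailing back-slash, as in bash)
# or 'join_last' (leading '+', as in SPICE).
sub join_lines {
    my ($type, @lines) = @_;
    my @records;
    my $pending;    # temporary location for a line that is still being assembled
    for my $line (@lines) {
        if ($type eq 'join_next') {
            my $continued = ($line =~ s/\\$//);    # strip a trailing back-slash
            $pending = defined $pending ? $pending . $line : $line;
            next if $continued;                    # more is expected on the next line
            push @records, $pending;
            undef $pending;
        }
        elsif ($line =~ s/^\+\s*//) {              # 'join_last': continuation line
            die "continuation on first line\n" if not defined $pending;
            $pending .= ' ' . $line;
        }
        else {                                     # 'join_last': a new record starts
            push @records, $pending if defined $pending;
            $pending = $line;
        }
    }
    if (defined $pending) {
        die "unexpected EOF\n" if $type eq 'join_next';
        push @records, $pending;                   # flush the last 'join_last' record
    }
    return @records;
}

print "$_\n" for join_lines('join_next', 'echo one \\', 'two', 'ls');
print "$_\n" for join_lines('join_last', 'R1 a b', '+ 10k', 'C1 c d');
```

Both error conditions appear as die calls: a dangling continuation at end of input for 'join_next', and a continuation marker on the very first line for 'join_last'.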

This extension allows users to use the familiar save_record interface to save records, as if all the multi-line text inputs were joined.

OVERVIEW

To create a multi-line text parser you need to know:

  • Whether your parser is a 'join_next' type or a 'join_last' type

  • How to recognize that a line has a continuation pattern

  • How to strip the continuation character and join with the last line

REQUIRED METHODS

So here are the things you need to do if you have to write a multi-line text parser:

  • As usual inherit from Text::Parser, never this class (use parent 'Text::Parser')

  • Override the new constructor to add the multiline_type option by default. The option is described under Text::Parser->new(%options) below.

  • Override the is_line_continued method to detect if there is a continuation character on the line.

  • Override the join_last_line method to join the previous line and the current line after stripping any continuation characters.

  • Implement your save_record as if you always get joined lines!

That's it! What's more, the Text::Parser class already provides default implementations of these methods. If you want to strip continuation characters, for example, override them in your own parser class.
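The steps above might look like this for a SPICE-like 'join_last' format. This is only a sketch: the class name MySpiceParser is hypothetical, the undef guard is a precaution rather than a documented requirement, and passing an array reference to the base class's save_record is an assumption about how you might choose to store records.

```perl
package MySpiceParser;    # hypothetical class name
use strict;
use warnings;
use parent 'Text::Parser';

# Construct a 'join_last' parser by default, with auto_chomp enabled.
sub new {
    my $pkg = shift;
    return $pkg->SUPER::new(auto_chomp => 1, multiline_type => 'join_last');
}

# True when the current line continues the previous one (leading '+').
sub is_line_continued {
    my ($self, $line) = @_;
    return 0 if not defined $line;    # defensive guard (assumption)
    return $line =~ /^\+/ ? 1 : 0;
}

# Strip the leading '+' and append the rest to the previous line.
sub join_last_line {
    my ($self, $last, $line) = @_;
    $line =~ s/^\+\s*//;
    return $last . ' ' . $line;
}

# Written as if every line arrives already joined.
sub save_record {
    my ($self, $line) = @_;
    # Assumption: store each record as a list of whitespace-separated fields.
    $self->SUPER::save_record([ split ' ', $line ]);
}

1;
```

Such a class would be used like the parser in the SYNOPSIS: construct it with new, call read on a file, and fetch the results with get_records.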

Text::Parser->new(%options)

Decide if you want to set any options like auto_chomp by default. To get a multi-line parser, you must set multiline_type to one of two values: 'join_next' or 'join_last'.

$parser->is_line_continued($line)

Takes a string argument as input. Returns a boolean that indicates if the current line is continued from the previous line, or is continued on the next line (depending on the type of multi-line text format). You don't need to bother about how the boolean result is acted upon; that is handled according to the type of multi-line parser you make. If it is a 'join_next' parser, a true value means that more data is expected on the next line, which is to be joined with this line. If instead the parser is 'join_last', a true value means that the current line is a continuation of the previous line and should be appended to the content of the previous line.
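As an illustration, a 'join_next' format with a trailing back-slash could detect continuation as sketched below. The sub is written as a plain function so that it can be tried on its own; in a real parser it would be a method of the class derived from Text::Parser. The undef guard is a precaution, not something this documentation requires.

```perl
use strict;
use warnings;

# Sketch of an is_line_continued override for a 'join_next' format.
sub is_line_continued {
    my ($self, $line) = @_;
    return 0 if not defined $line;
    # A back-slash as the last character (before any newline) means
    # "there is more on the next line".
    return $line =~ /\\$/ ? 1 : 0;
}

# $self is unused in this sketch, so undef stands in for the parser object.
for my $line ('ls -l \\', 'ls -l') {
    printf "%-8s => %d\n", $line, is_line_continued(undef, $line);
}
```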

$parser->join_last_line($last_line, $current_line)

Takes two string arguments. The first is the previously read line, which is expected to be continued on this line; the second is the current line. You can be certain that neither string will be undef. Your method should strip any continuation characters, join the current line with the previous line, and return the resulting string. You don't need to bother about where the last line comes from or where the result is saved; both are handled internally.
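For a 'join_next' format the continuation character sits at the end of the previous line, so that is where it has to be stripped. A minimal sketch, again as a plain function standing in for the method, and assuming a trailing back-slash as the continuation character:

```perl
use strict;
use warnings;

# Sketch of a join_last_line override for a 'join_next' format.
sub join_last_line {
    my ($self, $last, $line) = @_;
    # Drop the trailing back-slash from the previous line, together with
    # any newline or whitespace that follows it.
    $last =~ s/\\\s*$//;
    return $last . $line;
}

# $self is unused in this sketch, so undef stands in for the parser object.
print join_last_line(undef, "echo one \\\n", "two\n");
```

Whether to insert a separator between the two parts is a per-format decision; this sketch keeps whatever whitespace preceded the back-slash.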

BUGS

Please report any bugs or feature requests on the bugtracker website http://github.com/balajirama/Text-Parser/issues

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.

AUTHOR

Balaji Ramasubramanian <balajiram@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2018-2019 by Balaji Ramasubramanian.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.