NAME

Text::Parser::Multiline - Adds multi-line support to the Text::Parser object.

VERSION

version 0.800

SYNOPSIS

RATIONALE

Some text formats allow users to split a single logical line across multiple lines, with a continuation character at the beginning or at the end of a line, usually to improve human readability.

To handle these types of text formats with the native Text::Parser class, the derived class would need to have a save_record method that would:

  • Detect if the line is continued, and if it is, save it in a temporary location

  • Keep appending (or joining) any continued lines to this temporary location

  • Once the line continuation stops, create a record and save it with the save_record method

It should also look for error conditions:

  • If the end of file is reached while a joined line is still incomplete, throw an exception "unexpected EOF"

  • If the first line of the input is marked as a continuation of a previous line, throw an exception, since there is no previous line to join it with
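The bookkeeping described above can be sketched as a small stand-alone routine. This is only an illustration of the manual approach, not this module's implementation: the function name join_next_lines is hypothetical, and the sketch assumes a format where a trailing back-slash marks a line as continued on the next line. In that style only the "unexpected EOF" check applies; the first-line check matters for formats that mark a line as continuing the previous one.

```perl
use strict;
use warnings;

# Hypothetical helper: joins every line that ends in a back-slash with
# the line that follows it, and returns the list of joined records.
sub join_next_lines {
    my @lines = @_;
    my @records;
    my $pending;    # temporary location for an unfinished joined line

    for my $line (@lines) {
        chomp $line;
        # Strip the trailing marker; true only if one was present
        my $continued = ( $line =~ s/\s*\\\s*$// );
        $pending = defined $pending ? "$pending $line" : $line;
        next if $continued;         # keep collecting
        push @records, $pending;    # continuation stopped: save the record
        undef $pending;
    }

    # The input ended while a joined line was still incomplete
    die "unexpected EOF\n" if defined $pending;
    return @records;
}

my @records = join_next_lines( "a = 1 + \\\n", "2\n", "b = 3\n" );
print "$_\n" for @records;
```

With the three sample lines above, the routine yields two records: "a = 1 + 2" and "b = 3".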

This is further complicated by the fact that multi-line text formats differ in where they put the continuation marker. Some formats indicate that the current line continues on the next line, while others indicate that the current line is a continuation of the previous line. For example, in bash, Tcl, etc., the continuation character is \ (back-slash), which, when added to the end of a line of code, means "there is more on the next line". In contrast, SPICE puts its continuation character (+) at the start of the next line, indicating that the text on that line should be joined with the previous line.
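To make the SPICE-style convention concrete, here is a minimal stand-alone sketch. The function name join_last_lines and the error message are hypothetical, and the code is not taken from this module. Because the marker sits on the continuing line, the routine attaches each marked line to the record before it, and a marked first line is an error.

```perl
use strict;
use warnings;

# Hypothetical helper: joins SPICE-style lines, where a leading '+'
# marks a line as a continuation of the previous one.
sub join_last_lines {
    my @lines = @_;
    my @records;

    for my $line (@lines) {
        chomp $line;
        if ( $line =~ s/^\+\s*// ) {    # strip the leading marker
            # A continuation needs a previous line to attach to
            die "first line cannot be a continuation\n" unless @records;
            $records[-1] .= " $line";
        }
        else {
            push @records, $line;
        }
    }
    return @records;
}

my @records = join_last_lines( "R1 1 0\n", "+ 10k\n", "C1 1 0 1u\n" );
print "$_\n" for @records;
```

For the sample input this yields "R1 1 0 10k" and "C1 1 0 1u". Note that in this style a record is known to be complete only once the next unmarked line (or the end of input) is seen, which is why the sketch keeps appending to the last stored record.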

This extension allows users to use the familiar save_record interface to save records, as if all the multi-line text inputs were joined.

OVERVIEW

To create a multi-line text parser you need to know:

  • Whether your parser is a 'join_next' type or a 'join_last' type. This depends on which line has the continuation character: the line that is continued on the next one ('join_next'), or the line that continues the previous one ('join_last').

  • How to recognize whether a line has a continuation pattern

  • How to strip the continuation character and join the line with the last line

So here are the things you need to do if you have to write a multi-line text parser:

  • As usual, inherit from Text::Parser, never from this class (use parent 'Text::Parser')

  • Override the new constructor to add the multiline_type option by default. Read about the option in the Text::Parser documentation.

  • Override the is_line_continued method to detect if there is a continuation character on the line.

  • Override the join_last_line method to join the previous line and the current line after stripping any continuation characters.

  • Implement your save_record as if you always get joined lines.

REQUIRED METHODS

The following methods are required to compose this role into an object or a class. There are some default implementations for both these methods, but for most practical purposes you'd want to override those in your own parser class.

$self->is_line_continued($line)

Takes a string argument as input. Returns a boolean that indicates if the current line is continued from the previous line, or is continued on the next line (depending on the type of multi-line text format).
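As a rough illustration, the predicate might look like this for each of the two styles. The package names are hypothetical. In a real parser each package would also inherit from Text::Parser; that is left out here so the sketch runs on its own.

```perl
use strict;
use warnings;

package My::BackslashStyle {    # hypothetical 'join_next' parser
    sub is_line_continued {
        my ( $self, $line ) = @_;
        # A trailing back-slash means the line continues on the next line
        return $line =~ /\\\s*$/ ? 1 : 0;
    }
}

package My::SpiceStyle {        # hypothetical 'join_last' parser
    sub is_line_continued {
        my ( $self, $line ) = @_;
        # A leading '+' means the line continues the previous line
        return $line =~ /^\+/ ? 1 : 0;
    }
}

print My::BackslashStyle->is_line_continued("ls -l \\\n"), "\n";
print My::SpiceStyle->is_line_continued("+ 10k\n"),        "\n";
```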

$self->join_last_line($last_line, $current_line)

Takes two string arguments. The first is the previously read line, which is expected to be continued on this line. The second is the current line. The function should return a string in which any continuation characters have been stripped and the current line has been joined to the previous line.
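A possible shape for this method in each style is sketched below. The package names are hypothetical, and the sketch assumes that the previous line is passed in with its continuation marker (if any) still attached. As before, the inheritance from Text::Parser is omitted so that the example is self-contained.

```perl
use strict;
use warnings;

package My::BackslashStyle {    # hypothetical 'join_next' parser
    sub join_last_line {
        my ( $self, $last, $line ) = @_;
        # The marker is at the end of the previous line
        $last =~ s/\s*\\\s*$//;
        return "$last $line";
    }
}

package My::SpiceStyle {        # hypothetical 'join_last' parser
    sub join_last_line {
        my ( $self, $last, $line ) = @_;
        chomp $last;
        # The marker is at the start of the current line
        $line =~ s/^\+\s*//;
        return "$last $line";
    }
}

print My::BackslashStyle->join_last_line( "ls -l \\\n", "/tmp\n" );
print My::SpiceStyle->join_last_line( "R1 1 0\n", "+ 10k\n" );
```

Both methods join with a single space; whether a separator is wanted at all depends on the text format being parsed.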

BUGS

Please report any bugs or feature requests on the bugtracker website http://rt.cpan.org/Public/Dist/Display.html?Name=Text-Parser or by email to bug-text-parser at rt.cpan.org.

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.

AUTHOR

Balaji Ramasubramanian <balajiram@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2018 by Balaji Ramasubramanian.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.