NAME
Text::Parser::Multiline - Adds multi-line support to the Text::Parser object.
VERSION
version 0.801
SYNOPSIS
use Text::Parser;
my $parser = Text::Parser->new(multiline_type => 'join_last');
$parser->read('filename.txt');
print $parser->get_records();
print scalar($parser->get_records()), " records were read although ",
$parser->lines_parsed(), " lines were parsed.\n";
RATIONALE
Some text formats allow users to split a single line into multiple lines, with a continuation character at the beginning or at the end, usually to improve human readability.
To handle these types of text formats with the native Text::Parser class, the derived class would need to have a save_record method that would:
Detect if the line is continued, and if it is, save it in a temporary location
Keep appending (or joining) any continued lines to this temporary location
Once the line continuation stops, create a record and save it with the save_record method
It should also look for error conditions:
If the end of file is reached, and a joined line is still waiting incomplete, throw an exception "unexpected EOF"
If the first line of a text input is marked as a continuation of a previous line, throw an exception, since there is no previous line to continue
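The steps above can be sketched in plain Perl for a format whose lines end in a back-slash when they continue on the next line. This is only an illustration of the bookkeeping a hand-written parser would need, not code from Text::Parser; the helper name join_continued_lines and the variable $pending are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: joins every line that ends in a back-slash with
# the line that follows it, and returns one string per logical line.
sub join_continued_lines {
    my @lines = @_;
    my @records;
    my $pending;    # the "temporary location" for an unfinished line

    for my $line (@lines) {
        chomp $line;
        $line = $pending . $line if defined $pending;
        if ( $line =~ s/\\\s*$// ) {
            $pending = $line;        # still continued: keep waiting
        }
        else {
            push @records, $line;    # complete: save the record
            undef $pending;
        }
    }

    # A joined line that never completed is an error.
    die "unexpected EOF\n" if defined $pending;
    return @records;
}

my @records = join_continued_lines( "a = 1 + \\\n", "2\n", "b = 3\n" );
print "$_\n" for @records;
```

Here three input lines produce two records, because the first two lines are joined into one.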
This is further complicated by the fact that some multi-line text formats indicate that a line continues on the next line (for example, with a back-slash character at the end of the line), whereas others indicate that the current line is a continuation of the previous line. For example, in bash, Tcl, etc., the continuation character is \ (back-slash), which, when added to the end of a line of code, implies "there is more on the next line". In contrast, SPICE puts a continuation character (+) at the start of the next line, indicating that the text on that line should be joined with the previous line.
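For the SPICE style, the decision is made on the later line rather than the earlier one. A rough sketch in plain Perl, assuming a leading + marks a continuation; the helper name join_plus_lines and the wording of the exception are illustrative assumptions, not part of Text::Parser:

```perl
use strict;
use warnings;

# Hypothetical helper: appends every line that starts with '+' to the
# line before it, and returns one string per logical line.
sub join_plus_lines {
    my @lines = @_;
    my @records;

    for my $line (@lines) {
        chomp $line;
        if ( $line =~ s/^\+\s*// ) {
            # Nothing has been read yet, so there is nothing to continue.
            die "first line cannot be a continuation\n" unless @records;
            $records[-1] .= ' ' . $line;
        }
        else {
            push @records, $line;
        }
    }
    return @records;
}

my @records = join_plus_lines( "R1 1 0\n", "+ 10k\n", "C1 1 0 1p\n" );
print "$_\n" for @records;
```

Note that in this style a record cannot be considered finished until the following line has been read, because any line may still be extended by the next one.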
This extension allows users to use the familiar save_record interface to save records, as if all the multi-line text inputs were joined.
OVERVIEW
To create a multi-line text parser you need to:
Determine if your parser is a 'join_next' type or a 'join_last' type
Recognize if a line has a continuation pattern
Know how to strip the continuation character and join with the last line
REQUIRED METHODS
So here are the things you need to do if you have to write a multi-line text parser:
As usual, inherit from Text::Parser, never from this class (use parent 'Text::Parser')
Override the new constructor to add the multiline_type option by default. The option is described under Text::Parser->new(%options).
Override the is_line_continued method to detect if there is a continuation character on the line.
Override the join_last_line method to join the previous line and the current line after stripping any continuation characters.
Implement your save_record as if you always get joined lines!
That's it! What's more, the Text::Parser class already has default implementations for these methods. But if you want to strip continuation characters, you will want to override them in your own parser class.
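As a rough sketch, the two overridden methods for a SPICE-like 'join_last' format might look like the following. The class name My::SpiceLines is hypothetical. The use parent line is left commented out so that the sketch runs even where Text::Parser is not installed, and the constructor override is omitted because it needs the real base class; in an actual parser both would be present.

```perl
use strict;
use warnings;

package My::SpiceLines;    # hypothetical class name

# In a real parser, inherit here so that the subs below override the
# defaults. Commented out so this sketch runs without the module.
# use parent 'Text::Parser';

# True when the current line continues the previous one.
sub is_line_continued {
    my ( $self, $line ) = @_;
    return $line =~ /^\s*\+/ ? 1 : 0;
}

# Strip the leading '+' and join with the previous line.
sub join_last_line {
    my ( $self, $last, $line ) = @_;
    chomp $last;
    $line =~ s/^\s*\+\s*//;
    return $last . ' ' . $line;
}

package main;

print My::SpiceLines->join_last_line( "R1 1 0\n", "+ 10k\n" );
```

The methods are called on the class name here only to exercise them in isolation; in a working parser they are invoked for you while the input is being read.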
Text::Parser->new(%options)
Decide if you want to set any options like auto_chomp by default. To get a multi-line parser, you must select one of the multiline_type values: 'join_next' or 'join_last'.
$parser->is_line_continued($line)
Takes a string argument as input. Returns a boolean that indicates if the current line is continued from the previous line, or is continued on the next line (depending on the type of multi-line text format). You don't need to bother about how the boolean result is acted upon; that is handled internally, depending on the type of multi-line parser you make. If it is a 'join_next' parser, a true value from this method means that more data is expected on the next line, which is to be joined with this line. If instead the parser is 'join_last', a true value means that the current line is a continuation of the previous line, and should be appended to the content of the previous line.
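For a 'join_next' format that uses a trailing back-slash, the predicate could be as small as the sketch below. It is written as a standalone sub taking the same arguments as the method, with undef passed for the object, purely so that it can be run on its own; the regular expression is an assumption about the format, not something Text::Parser prescribes.

```perl
use strict;
use warnings;

# Sketch of an is_line_continued body for a 'join_next' format:
# a trailing back-slash means the next line belongs to this one.
sub is_line_continued {
    my ( $self, $line ) = @_;
    return $line =~ /\\\s*$/ ? 1 : 0;
}

print is_line_continued( undef, "echo one \\\n" ), "\n";    # continued
print is_line_continued( undef, "two\n" ),         "\n";    # not continued
```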
$parser->join_last_line($last_line, $current_line)
Takes two string arguments. The first is the line previously read, which is expected to be continued on this line; the second is the current line. You can be certain that neither string will be undef. Your method should return a string in which any continuation characters have been stripped and the current line has been joined with the previous line. You don't need to bother about where and how the result is saved, or about where the last line is stored or comes from: the management of the last line is handled internally.
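For the same back-slash format, a joining routine only has to remove the back-slash (and the line ending after it) from the earlier line before concatenating. Again this is a standalone sketch with undef standing in for the object, and the exact stripping rule is an assumption about the format:

```perl
use strict;
use warnings;

# Sketch of a join_last_line body for a 'join_next' format.
sub join_last_line {
    my ( $self, $last, $line ) = @_;
    $last =~ s/\\\s*$//;    # drop the back-slash and the newline after it
    return $last . $line;
}

print join_last_line( undef, "echo one \\\n", "two\n" );
```

Whether to insert a space at the join, or to trim leading whitespace from the current line, depends on the text format being parsed.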
BUGS
Please report any bugs or feature requests on the bugtracker website http://github.com/me/Text-Parser/issues
When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.
AUTHOR
Balaji Ramasubramanian <balajiram@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2018 by Balaji Ramasubramanian.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.