NAME

Parse::Indented - Given a Pythonesque set of indented lines, parses them into a convenient hierarchical structure

VERSION

Version 0.02

SYNOPSIS

I have a bad habit of writing pseudocode when thinking of data structures. Since I learned Python, it's only gotten worse. So every time I start a new research project, I end up scratching out some pseudocode specifications for the various data or semantics or what have you, and then I bog down in writing yet another incomplete, buggy parser. This module represents my first try at setting down that incomplete, buggy parser where I can find it, so maybe next time I'll start from something other than scratch, and end up with a less incomplete and less buggy parser.

Because I'm lazy, the output of this parser is an XML::xmlapi structure; that API is embedded in my brainstem at this point.

This parser does absolutely nothing with the actual lines themselves, but you can give it a function to call on each line to parse it and splice it into the final result. Parse::RecDescent::Simple is a good choice (not that I'm partial or anything).

use Parse::Indented;

my $parser = Parse::Indented->new(sub {$_[0]});   # Just pass the line through as content for a simple parse.
my $obj = $parser->parse (q{ });

SUBROUTINES/METHODS

new($line_parser)

Sets up a parser in advance with a line parser.

parse($text, $line_parser, $root)

Call this with some indented text to parse. If you set up a parser in advance, you don't need to pass a line parser; if you don't want to mess with that, though, you can call this as a class method and give it the line parser on the spot.

If a "root" node is given, the parsed objects will be added to that root, and in list context the return value is a list of the objects added. If no root node is supplied, one will be created (with a "root" tag) and the return value is that object.

About the input: Parse::Indented ignores blank lines and any comments from # to the end of the line. It trims leading and trailing space before asking the line parser to parse the line for it. The line parser is passed a string and is expected to return an XML::xmlapi (or $baseclass) structure. If it returns a plain string instead (the same string you gave it, or a different one), Parse::Indented will turn it into a <line> tag with the text of the line in a "text" attribute, e.g. <line text="this is the line you passed"/>.
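The input handling described above can be sketched in plain Perl. This is only an illustration of the stated behavior, not the module's own code: normalize_line and fallback_tag are hypothetical names, the fallback here returns a string rather than an XML::xmlapi object, and treating every # as the start of a comment (even inside quotes) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::Indented: strip a trailing
# "#" comment, trim whitespace, and return undef if nothing is left.
sub normalize_line {
    my ($line) = @_;
    $line =~ s/#.*$//;          # drop everything from "#" to end of line
    $line =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    return length($line) ? $line : undef;
}

# Hypothetical fallback: wrap a plain string as <line text="..."/>.
# Escaping is deliberately minimal.
sub fallback_tag {
    my ($text) = @_;
    my %esc = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$esc{$1}/g;
    return qq{<line text="$text"/>};
}

for my $raw ("  widget: big   # a comment", "", "   # only a comment") {
    my $clean = normalize_line($raw);
    print defined $clean ? fallback_tag($clean) . "\n" : "(skipped)\n";
}
```

Only the first of the three sample lines survives normalization; the empty line and the comment-only line are skipped.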

Children of each line are appended prettily to the parsed tag, meaning that if the data is dumped in XML format ($object->string()), each tag will be on a separate line instead of all being run together. If the parsed tag already has some structure, this behavior means you will need to be a little careful with your semantics to avoid confusion. Wise choices are left to the user; Parse::Indented will gladly shoot you in the foot if you tell it to.

If the line ends in a curly bracket {, everything until the next appropriately un-indented close curly bracket } will be sucked into a body element attached to the line that started the curly bracket.

The line parser can return a list of the parsed line structure plus a flag "wants_sublines". If that flag is set, then all indented lines under the current line until the next un-indented line will be considered bracketed, and placed into the body of the line that started that mode.

For example, using curly brackets:

code something {
   indented code
   here
}

This will create a line structure for "code something", with a body element under it containing "indented code\nhere\n". Note, however, that the initial indentations for the indented code will be removed, as it's assumed you want the code to be unindented. Indentation within the block is preserved, so if you have one line indented six spaces and one under it indented nine, you'll end up with the first line not indented and the second line indented by three.
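A rough sketch of that block handling, in plain Perl, may make the unindenting rule concrete. It is not the module's implementation: unindent_block and take_brace_block are hypothetical names, and the sketch assumes that "appropriately un-indented" means a } indented no deeper than the line that opened the block.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the smallest common indentation from a block,
# so relative indentation inside the block survives.
sub unindent_block {
    my @lines = @_;
    my $min;
    for my $l (@lines) {
        next unless $l =~ /\S/;                  # blank lines don't count
        my ($lead) = $l =~ /^( *)/;
        $min = length($lead) if !defined $min || length($lead) < $min;
    }
    $min //= 0;
    return map { my $c = $_; $c =~ s/^ {0,$min}//; $c } @lines;
}

# Hypothetical helper: given an array ref of lines and the index of a line
# ending in "{", return the header text, the unindented body, and the index
# of the closing "}" line. Returns an empty list if there is no block.
sub take_brace_block {
    my ($lines, $i) = @_;
    my ($indent, $header) = $lines->[$i] =~ /^( *)(.*?)\s*\{\s*$/
        or return;
    my @body;
    for my $j ($i + 1 .. $#$lines) {
        if ($lines->[$j] =~ /^( *)\}\s*$/ && length($1) <= length($indent)) {
            return ($header, join('', map { "$_\n" } unindent_block(@body)), $j);
        }
        push @body, $lines->[$j];
    }
    return;   # no closing bracket found
}

my @text = ("code something {", "   indented code", "   here", "}");
my ($header, $body, $end) = take_brace_block(\@text, 0);
print "header: $header\n";
print "body:\n$body";
```

With the example above, the header is "code something" and the body is "indented code\nhere\n". Calling unindent_block on one line indented six spaces and one indented nine leaves the first flush left and the second indented by three.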

If the line parser reports that the structure for "code something" wants sublines, this would be equivalent:

code something
   indented code
   here
   
more stuff later

See? That would return line structures for "code something" and "more stuff later", with the same body for "code something" as in the previous example. This is here for two reasons: first, my goal in this is to be able to type as few keystrokes as possible to express myself. Second, this allows code to look like the rest of the structure; it's an esthetic decision.
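To illustrate the "wants_sublines" mode, here is a minimal sketch in plain Perl. Everything in it is an assumption made for illustration: line_parser is a hypothetical line parser that passes the line through and sets the flag for lines starting with "code", and take_sublines is a hypothetical stand-in for whatever Parse::Indented does internally to gather the indented lines.

```perl
use strict;
use warnings;

# Hypothetical line parser: returns the line unchanged plus a
# "wants_sublines" flag, true for lines that start with the word "code".
sub line_parser {
    my ($line) = @_;
    return ($line, $line =~ /^code\b/ ? 1 : 0);
}

# Hypothetical helper: starting after line $i, collect every line indented
# deeper than line $i (blank lines included) until indentation drops back.
# Returns an array ref of the collected lines and the index of the next line.
sub take_sublines {
    my ($lines, $i) = @_;
    my ($indent) = $lines->[$i] =~ /^( *)/;
    my @body;
    my $j = $i + 1;
    while ($j <= $#$lines) {
        my $l = $lines->[$j];
        if ($l =~ /\S/) {
            my ($lead) = $l =~ /^( *)/;
            last if length($lead) <= length($indent);
        }
        push @body, $l;
        $j++;
    }
    pop @body while @body && $body[-1] !~ /\S/;   # drop trailing blank lines
    return (\@body, $j);
}

my @lines = ("code something", "   indented code", "   here", "   ",
             "more stuff later");
my $i = 0;
while ($i < @lines) {
    my ($parsed, $wants_sublines) = line_parser($lines[$i]);
    if ($wants_sublines) {
        my ($body, $next) = take_sublines(\@lines, $i);
        print "$parsed: body of ", scalar(@$body), " line(s)\n";
        $i = $next;
    }
    else {
        print "$parsed\n";
        $i++;
    }
}
```

The sketch leaves the collected lines indented as they were; removing the common indentation, as described for curly-bracket blocks, would be a separate step.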

AUTHOR

Michael Roberts, <michael at vivtek.com>

BUGS

Please report any bugs or feature requests to bug-parse-indented at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Parse-Indented. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc Parse::Indented

LICENSE AND COPYRIGHT

Copyright 2010 Michael Roberts.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.