NAME

Parse::Marpa - (pre-Alpha) Jay Earley's general parsing algorithm, with LR(0) precomputation

VERSION

Pre-alpha Version

This is strictly a developer's version. Nothing useful will be found here, and the documentation is also inchoate. Those not developing this module will want to wait for at least a released, beta version.

I've no personal experience with them, but Parse::Yapp and Parse::RecDescent are alternatives to this module which are well reviewed and much more mature and stable. Until Parse::Marpa goes at least beta, users with a deadline or targeting anything close to a production environment should look there.

SYNOPSIS

UNDER CONSTRUCTION ...

The Easy Way

It's possible to specify the grammar and the text to be parsed all in one step

use Parse::Marpa;
my $value = Parse::Marpa::marpa(\$grammar, \$text_to_parse);

and you can even include options if you make a hash ref the third argument.

my $value = Parse::Marpa::marpa(
    \$grammar,
    \$text_to_parse
    {
        warnings => 1,
        ...
    }
);

You can get all the values of an ambiguous parse by invoking Parse::Marpa::marpa in list context.

my @values = Parse::Marpa::marpa(\$ambiguous_grammar, \$text_to_parse);

Exercising more control

You can also proceed step by step. For example, ...

use Parse::Marpa;

my @tests = split(/\n/, <<'EO_TESTS');
time  / 25 ; # / ; die "this dies!";
localtime  / 25 ; # / ; die "this dies!";
EO_TESTS

First creating the grammar object ...

my $g = new Parse::Marpa(
    warnings => 1,
    code_lines => -1,
);

Then setting the grammar ...

my $mini_perl_grammar; { local($RS) = undef; $mini_perl_grammar = <DATA> };
$g->set( source => \$mini_perl_grammar);

And then as many times as you like ...

TEST: while (my $test = pop @tests) {

Create a parse object ...

my $parse = new Parse::Marpa::Parse($g);

Pass text to the recognizer ...

$parse->text(\$test);

Set an initial parse ...

$parse->initial();
my @parses;
push(@parses, $parse->value);

Get others, if there are any ...

while ($parse->next) {
    push(@parses, $parse->value);
}

And then go on with your other processing ...

    say "I've got ", scalar @parses, " parses:";
    for (my $i = 0; $i < @parses; $i++) {
        say "Parse $i: ", ${$parses[$i]};
    }
}

__DATA__
...

DESCRIPTION

UNDER CONSTRUCTION AND VERY INCOMPLETE

Overview

Parse::Marpa parses text given an arbitrary context-free grammar.

  • Grammar may be anything which can be specified in BNF.

  • This includes left-recursive grammars, right-recursive grammars, grammars with empty productions, grammars with cycles and ambigious grammars.

  • In fact, ambiguous grammars are a Marpa specialty. They are useful even when you never want more than one parse. Human languages have an ambiguous grammars, from which the listener pulls the parse that makes most sense. An ambiguous grammar is often the easiest and most sensible way to express a language. Marpa allows the user to prioritize rules so that the preferred parse comes up first.

  • Of course, you can also have Marpa return all the parses of an ambiguous grammar.

  • Marpa allows predictive lexing and uses LR(0) precomputation. It incorporates the latest academic research for speeding up Earley's algorithm, along with further innovations.

Parse::Marpa::marpa(): The Easy Way function

Parse::Marpa::marpa(\$grammar, \$text_to_parse, option hash);

The marpa function takes three arguments: a ref to a string containing a Marpa source description of the grammar; a ref to a string with the text to be parsed; and (optionally) a ref to a hash with options. If there are no parses, marpa() returns failure. If there are parses, and marpa() was called in list context, it returns a list of references to the values of all the parses. If there are parses, and marpa was called in scalar context, it returns a reference to the value of the first parse.

Functions and Methods for More Control

new Parse::Marpa(option => value, option => value, ...)

The new method takes a series of arguments which are treated as a hash with options as keys and the option values as the hash values. new either throws an exception with croak or returns a new grammar object. Valid options are:

source

This takes as its value a reference to a string containing a description of the grammar in the Marpa grammar description language.

start

[ Need to figure out whether start symbol from source file should be allowed to be overwritten. Issues of canonical and internal name form, etc. ]

default_null_value
default_action
default_lex_prefix
version
semantics

These predefineds may be specified as options. If a Marpa predefined is specified both in a source file and as an option, the option overrides the value in the source file. For descriptions of these options, see the documentation for the Marpa grammar description language.

ambiguous_lex
volatile

These are parse and evaluation time options. Setting them in the grammar makes that setting the default for all parses generated from that grammar.

warnings

This is a boolean which enables warnings about inaccessible and unproductive rules in the grammar. Warnings are written to the trace file handle. By default, warnings are on.

Inaccessible rules can those which can never be reached from the start symbol. Unproductive rules are those which no possible input can ever generate. Marpa simply ignores these and generates a parser from the remaining rules.

Often the presence of inaccessible and unproductive rules indicate problems in the grammar. But a user sometimes want to leave them while the grammar is under construction or use them as notes.

code_lines

If there is a problem with user supplied code, Marpa prints the error message and a description of where the code is being used. Marpa will display the code itself as well. The value of this option tells Marpa how many lines to print before truncating the code. If it's zero, no code is displayed. If it's negative, all the code is displayed, no matter how long it is. The default is 30 lines.

preamble

Another predefined which may also be specified as an option. Note that while multiple preambles may be specified in a source file, and they are concatenated, if the preamble option is used it overrides, replacing all the preamble code assembled so far. For a description of preambles, see the documentation for the Marpa grammar description language.

new Parse::Marpa::Parse(option => value, [option => value, ...])

Parse::Marpa::Parse::new takes as its arguments a series of option name, option value pairs which are handled as a hash. It returns a new parse object or throws an exception using croak.

One of the options must be the grammar option, and it must have as its value a grammar object with rules in it. Any options Parse::Marpa::Parse::new does not directly handle are treated as options for this grammar. See their description in the description the grammar constructor, Parse::Marpa::new, above. The follows are the options directly handled by the parser constructor.

grammar

Takes as its value a grammar object.

default_action
ambiguous_lex
volatile
preamble
stream

A boolean. If its value is true, the parser runs in "streaming" mode. Usually, Marpa assumes the input has ended when the first parse is requested, does some final bookkeeping in the Earley sets, refuses to accept any more input, and handles parse requests on the default assumption that the parses desired are of the entire input.

In stream mode, which is still somewhat under construction and poorly tested, new tokens may be added at any point and final bookkeeping is never done. Marpa's default idea is still to parse the entire input up to that point, but without the final bookkeeping such parses will very likely fail. The user has to determine the right places to look for complete parses, based on her knowledge of the structure of the grammar and the input with help from routines supplied for this purpose with Marpa.

Parse::Marpa::Parse::text(parse, text)

Extends the parse in the parse object with the input text, a reference to a string. Returns -1 if the parse succeeded. If the parse was exhausted in the course of processing the input string (what in most contexts is called a parse "failure"), the location where the parse failed is returned. Note that if the parse was exhausted by the first character of input, that location is zero. Other failures are reported via exceptions thrown using croak.

Less used methods

Some of these methods explicitly put the grammar through processing phases which Marpa typically does implicitly, as necessary. For example, when a new parse object is created from a grammar which has not been throught the precomputation phase, that grammar is automatically precomputed, and then deep copied with a compile and a decompile.

A user who wants to trace Marpa's behavior during a processing phase may find it advantageous to run that phase explicitly. Similarly, a user who wants to run diagnostics on the results of a processing phase may wish to have that processing phase run as the result of an explicit method call.

Parse::Marpa::compile(grammar) or $grammar->compile()

The compile method takes as its single argument a grammar object, and "compiles" it, that is, writes it out using Data::Dumper. It returns a reference to the compiled grammar, or throws an exception using croak.

compile() is usually called indirectly by Marpa when a new parse object is created.

Parse::Marpa::decompile(compiled_grammar, [trace_file_handle])

The decompile static method takes a reference to a compiled grammar as its first argument. A second, optional, argument is a file handle both to be used to override the compiled grammar's trace file handle, and for any trace messages compile from decompile() itself. decompile() returns the decompiled grammar object unless it throws an exception using croak.

If decompile() doesn't get the trace file handle explicitly as an argument, the new grammar's trace file handle will revert to the default STDERR. The trace file handle argument is needed because in the course of compilation, the grammar's original trace file handle may have been lost. For instance, if the compiled grammar is written to a file by one process, then read into a second one for decompilation, and the original trace file handle was not STDOUT or STDERR, it's very unlikely to be available in the second process.

When Marpa compiles and decompiles and grammar implicitly, in order to deep copy it, it save the trace file handle and restores it using the trace file handle argument to decompile().

Parse::Marpa::precompute(grammar) or $grammar->precompute()

Takes as its only argument a grammar object and performs the precomputation phase on it. It returns the grammar object or throws an exception using croak.

precompute() is called internally by Marpa when a new parse object is created, and usually the user does not need to call it directly. Cases where the user might find it helpful or necessary to run precompute() directly include running traces on the precomputation; performing diagnostics on the grammar in its precomputed state; or when the user wants to compiling the grammar in his own custom code, instead of using the marpa utility's compile command for that purpose.

Marpa Processing Phases

Whatever the virtues of the Marpa parser, it is certainly not single-pass. Phases visible to the user to one extent or another include.

Error processing

A few of the Marpa functions and methods have error returns, as explained in the descriptions of those functions. Most often Marpa reports a problem by throwing an exception using croak(). You can catch these using eval, if you don't want them to be fatal.

Notes not yet properly incorporated in this document

Point out that lexing honors the pos setting and must not alter it, even on successful match. Warn user of counter-intuitive behavior.

Warn user that in the lex patterns all named captures beginning with any "MARPA_" in any capitalization variant ("marpa_", "MaRpA_", etc.) is reserved.

All Perl code supplied by the user via the Marpa source file by default is run with "use integer" in effect. If the user wants floating point arithmetic she must specify "no integer".

AUTHOR

Jeffrey Kegler

DEPENDENCIES

Requires Perl 5.10. Users who want or need to stick with Perl 5.8 or earlier probably are also best off with more mature alternatives to Marpa.

IMPLEMENTATION NOTES

It's often and correctly said by those experienced in Perl that passing string refs instead of strings is a pointless or even counter-productive optimization and I agree. I believe that Marpa is an exception and have designed the interface accordingly. The strings involved here can be very long and copying and recopying them a major waste of time.

My use of object orientation in Marpa is superficial. Only grammars and parses are objects, and they are not designed to be inherited.

BUGS

Please report any bugs or feature requests to bug-parse-marpa at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Parse-Marpa. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc Parse::Marpa

You can also look for information at:

ACKNOWLEDGMENTS

Marpa is the parser described in John Aycock and R. Nigel Horspool's "Practical Earley Parsing", The Computer Journal, Vol. 45, No. 6, 2002, pp. 620-630. I've made significant changes to it, which are documented separately (Parse::Marpa::ALGORITHM). Aycock and Horspool, for their part, built on the algorithm discovered by Jay Earley, and described in his "An efficient context-free parsing algorithm", Communications of the Association for Computing Machinery, 13:2:94-102, 1970.

In writing the Pure Perl version of Marpa, I benefited from studying the work of Francois Desarmenien (Parse::Yapp), Damian Conway (Parse::RecDescent) and Graham Barr (Scalar::Util).

COPYRIGHT & LICENSE

Copyright 2007 Jeffrey Kegler, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.