NAME
Parse::Marpa - (pre-Alpha) Jay Earley's general parsing algorithm, with LR(0) precomputation
VERSION
Pre-alpha Version
This is strictly a developer's version. Nothing useful will be found here, and the documentation is also inchoate. Those not developing this module will want to wait for at least a released, beta version.
I've no personal experience with them, but Parse::Yapp
and Parse::RecDescent
are alternatives to this module which are well reviewed and much more mature and stable. Until Parse::Marpa
goes at least beta, users with a deadline or targeting anything close to a production environment should look there.
SYNOPSIS
UNDER CONSTRUCTION ...
The Easy Way
It's possible to specify the grammar and the text to be parsed all in one step
use Parse::Marpa;
my $value = Parse::Marpa::marpa(\$grammar, \$text_to_parse);
and you can even include options if you make a hash ref the third argument.
my $value = Parse::Marpa::marpa(
\$grammar,
\$text_to_parse
{
warnings => 1,
...
}
);
You can get all the values of an ambiguous parse by invoking Parse::Marpa::marpa
in list context.
my @values = Parse::Marpa::marpa(\$ambiguous_grammar, \$text_to_parse);
Exercising more control
You can also proceed step by step. For example, ...
use Parse::Marpa;
my @tests = split(/\n/, <<'EO_TESTS');
time / 25 ; # / ; die "this dies!";
localtime / 25 ; # / ; die "this dies!";
EO_TESTS
First creating the grammar object ...
my $g = new Parse::Marpa(
warnings => 1,
code_lines => -1,
);
Then setting the grammar ...
my $mini_perl_grammar; { local($RS) = undef; $mini_perl_grammar = <DATA> };
$g->set( source => \$mini_perl_grammar);
And then as many times as you like ...
TEST: while (my $test = pop @tests) {
Create a parse object ...
my $parse = new Parse::Marpa::Parse($g);
Pass text to the recognizer ...
$parse->text(\$test);
Set an initial parse ...
$parse->initial();
my @parses;
push(@parses, $parse->value);
Get others, if there are any ...
while ($parse->next) {
push(@parses, $parse->value);
}
And then go on with your other processing ...
say "I've got ", scalar @parses, " parses:";
for (my $i = 0; $i < @parses; $i++) {
say "Parse $i: ", ${$parses[$i]};
}
}
__DATA__
...
DESCRIPTION
UNDER CONSTRUCTION AND VERY INCOMPLETE
Overview
Parse::Marpa
parses text given an arbitrary context-free grammar.
Grammar may be anything which can be specified in BNF.
This includes left-recursive grammars, right-recursive grammars, grammars with empty productions, grammars with cycles and ambigious grammars.
In fact, ambiguous grammars are a Marpa specialty. They are useful even when you never want more than one parse. Human languages have an ambiguous grammars, from which the listener pulls the parse that makes most sense. An ambiguous grammar is often the easiest and most sensible way to express a language. Marpa allows the user to prioritize rules so that the preferred parse comes up first.
Of course, you can also have
Marpa
return all the parses of an ambiguous grammar.Marpa allows predictive lexing and uses LR(0) precomputation. It incorporates the latest academic research for speeding up Earley's algorithm, along with further innovations.
Parse::Marpa::marpa()
: The Easy Way function
- Parse::Marpa::marpa(\$grammar, \$text_to_parse, option hash);
-
The
marpa
function takes three arguments: a ref to a string containing a Marpa source description of the grammar; a ref to a string with the text to be parsed; and (optionally) a ref to a hash with options. If there are no parses,marpa()
returns failure. If there are parses, andmarpa()
was called in list context, it returns a list of references to the values of all the parses. If there are parses, andmarpa
was called in scalar context, it returns a reference to the value of the first parse.
Functions and Methods for More Control
- new Parse::Marpa(option => value, option => value, ...)
-
The
new
method takes a series of arguments which are treated as a hash with options as keys and the option values as the hash values.new
either throws an exception withcroak
or returns a new grammar object. Valid options are:- source
-
This takes as its value a reference to a string containing a description of the grammar in the Marpa grammar description language.
- start
-
[ Need to figure out whether start symbol from source file should be allowed to be overwritten. Issues of canonical and internal name form, etc. ]
- default_null_value
- default_action
- default_lex_prefix
- version
- semantics
-
These predefineds may be specified as options. If a Marpa predefined is specified both in a source file and as an option, the option overrides the value in the source file. For descriptions of these options, see the documentation for the Marpa grammar description language.
- ambiguous_lex
- volatile
-
These are parse and evaluation time options. Setting them in the grammar makes that setting the default for all parses generated from that grammar.
- warnings
-
This is a boolean which enables warnings about inaccessible and unproductive rules in the grammar. Warnings are written to the trace file handle. By default, warnings are on.
Inaccessible rules can those which can never be reached from the start symbol. Unproductive rules are those which no possible input can ever generate. Marpa simply ignores these and generates a parser from the remaining rules.
Often the presence of inaccessible and unproductive rules indicate problems in the grammar. But a user sometimes want to leave them while the grammar is under construction or use them as notes.
- code_lines
-
If there is a problem with user supplied code, Marpa prints the error message and a description of where the code is being used. Marpa will display the code itself as well. The value of this option tells Marpa how many lines to print before truncating the code. If it's zero, no code is displayed. If it's negative, all the code is displayed, no matter how long it is. The default is 30 lines.
- preamble
-
Another predefined which may also be specified as an option. Note that while multiple preambles may be specified in a source file, and they are concatenated, if the preamble option is used it overrides, replacing all the preamble code assembled so far. For a description of preambles, see the documentation for the Marpa grammar description language.
- new Parse::Marpa::Parse(option => value, [option => value, ...])
-
Parse::Marpa::Parse::new
takes as its arguments a series of option name, option value pairs which are handled as a hash. It returns a new parse object or throws an exception usingcroak
.One of the options must be the
grammar
option, and it must have as its value a grammar object with rules in it. Any optionsParse::Marpa::Parse::new
does not directly handle are treated as options for this grammar. See their description in the description the grammar constructor,Parse::Marpa::new
, above. The follows are the options directly handled by the parser constructor.- grammar
-
Takes as its value a grammar object.
- default_action
- ambiguous_lex
- volatile
- preamble
- stream
-
A boolean. If its value is true, the parser runs in "streaming" mode. Usually, Marpa assumes the input has ended when the first parse is requested, does some final bookkeeping in the Earley sets, refuses to accept any more input, and handles parse requests on the default assumption that the parses desired are of the entire input.
In stream mode, which is still somewhat under construction and poorly tested, new tokens may be added at any point and final bookkeeping is never done. Marpa's default idea is still to parse the entire input up to that point, but without the final bookkeeping such parses will very likely fail. The user has to determine the right places to look for complete parses, based on her knowledge of the structure of the grammar and the input with help from routines supplied for this purpose with Marpa.
- Parse::Marpa::Parse::text(parse, text)
-
Extends the parse in the parse object with the input text, a reference to a string. Returns -1 if the parse succeeded. If the parse was exhausted in the course of processing the input string (what in most contexts is called a parse "failure"), the location where the parse failed is returned. Note that if the parse was exhausted by the first character of input, that location is zero. Other failures are reported via exceptions thrown using
croak
.
Less used methods
Some of these methods explicitly put the grammar through processing phases which Marpa typically does implicitly, as necessary. For example, when a new parse object is created from a grammar which has not been throught the precomputation phase, that grammar is automatically precomputed, and then deep copied with a compile and a decompile.
A user who wants to trace Marpa's behavior during a processing phase may find it advantageous to run that phase explicitly. Similarly, a user who wants to run diagnostics on the results of a processing phase may wish to have that processing phase run as the result of an explicit method call.
- Parse::Marpa::compile(grammar) or $grammar->compile()
-
The compile method takes as its single argument a grammar object, and "compiles" it, that is, writes it out using Data::Dumper. It returns a reference to the compiled grammar, or throws an exception using
croak
.compile()
is usually called indirectly by Marpa when a new parse object is created. - Parse::Marpa::decompile(compiled_grammar, [trace_file_handle])
-
The decompile static method takes a reference to a compiled grammar as its first argument. A second, optional, argument is a file handle both to be used to override the compiled grammar's trace file handle, and for any trace messages compile from
decompile()
itself.decompile()
returns the decompiled grammar object unless it throws an exception usingcroak
.If
decompile()
doesn't get the trace file handle explicitly as an argument, the new grammar's trace file handle will revert to the default STDERR. The trace file handle argument is needed because in the course of compilation, the grammar's original trace file handle may have been lost. For instance, if the compiled grammar is written to a file by one process, then read into a second one for decompilation, and the original trace file handle was not STDOUT or STDERR, it's very unlikely to be available in the second process.When Marpa compiles and decompiles and grammar implicitly, in order to deep copy it, it save the trace file handle and restores it using the trace file handle argument to
decompile()
. - Parse::Marpa::precompute(grammar) or $grammar->precompute()
-
Takes as its only argument a grammar object and performs the precomputation phase on it. It returns the grammar object or throws an exception using
croak
.precompute()
is called internally by Marpa when a new parse object is created, and usually the user does not need to call it directly. Cases where the user might find it helpful or necessary to runprecompute()
directly include running traces on the precomputation; performing diagnostics on the grammar in its precomputed state; or when the user wants to compiling the grammar in his own custom code, instead of using themarpa
utility'scompile
command for that purpose.
Marpa Processing Phases
Whatever the virtues of the Marpa parser, it is certainly not single-pass. Phases visible to the user to one extent or another include.
Error processing
A few of the Marpa functions and methods have error returns, as explained in the descriptions of those functions. Most often Marpa reports a problem by throwing an exception using croak()
. You can catch these using eval
, if you don't want them to be fatal.
Notes not yet properly incorporated in this document
Point out that lexing honors the pos setting and must not alter it, even on successful match. Warn user of counter-intuitive behavior.
Warn user that in the lex patterns all named captures beginning with any "MARPA_" in any capitalization variant ("marpa_", "MaRpA_", etc.) is reserved.
All Perl code supplied by the user via the Marpa source file by default is run with "use integer" in effect. If the user wants floating point arithmetic she must specify "no integer".
AUTHOR
Jeffrey Kegler
DEPENDENCIES
Requires Perl 5.10. Users who want or need to stick with Perl 5.8 or earlier probably are also best off with more mature alternatives to Marpa
.
IMPLEMENTATION NOTES
It's often and correctly said by those experienced in Perl that passing string refs instead of strings is a pointless or even counter-productive optimization and I agree. I believe that Marpa
is an exception and have designed the interface accordingly. The strings involved here can be very long and copying and recopying them a major waste of time.
My use of object orientation in Marpa is superficial. Only grammars and parses are objects, and they are not designed to be inherited.
BUGS
Please report any bugs or feature requests to bug-parse-marpa at rt.cpan.org
, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Parse-Marpa. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc Parse::Marpa
You can also look for information at:
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
RT: CPAN's request tracker
Search CPAN
ACKNOWLEDGMENTS
Marpa is the parser described in John Aycock and R. Nigel Horspool's "Practical Earley Parsing", The Computer Journal, Vol. 45, No. 6, 2002, pp. 620-630. I've made significant changes to it, which are documented separately (Parse::Marpa::ALGORITHM). Aycock and Horspool, for their part, built on the algorithm discovered by Jay Earley, and described in his "An efficient context-free parsing algorithm", Communications of the Association for Computing Machinery, 13:2:94-102, 1970.
In writing the Pure Perl version of Marpa, I benefited from studying the work of Francois Desarmenien (Parse::Yapp
), Damian Conway (Parse::RecDescent
) and Graham Barr (Scalar::Util
).
COPYRIGHT & LICENSE
Copyright 2007 Jeffrey Kegler, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.