NAME

Parse::Marpa - (pre-Alpha) Jay Earley's general parsing algorithm, with LR(0) precomputation

BEWARE: THIS DOCUMENT IS UNDER CONSTRUCTION AND VERY INCOMPLETE

VERSION

This is Pre-alpha software.

It's strictly a developer's version. Nothing useful will be found here, and the documentation is also inchoate. Those not developing this module will want to wait for at least a released, beta version.

SYNOPSIS

The Easy Way

It's possible to specify the grammar and the text to be parsed all in one step

use Parse::Marpa;
my $value = Parse::Marpa::marpa(\$grammar, \$text_to_parse);

and you can even include options if you make a hash ref the third argument.

my $value = Parse::Marpa::marpa(
    \$grammar,
    \$text_to_parse,
    {
        warnings => 1,
    }
);

You can get all the values of an ambiguous parse by invoking Parse::Marpa::marpa in list context.

my @values = Parse::Marpa::marpa(\$ambiguous_grammar, \$text_to_parse);

Step by Step

First, set things up ...

use 5.010;
use English qw( -no_match_vars );
use Parse::Marpa;

my @tests = split(/\n/, <<'EO_TESTS');
time  / 25 ; # / ; die "this dies!";
localtime  / 25 ; # / ; die "this dies!";
EO_TESTS

then create a grammar object, ...

my $g = new Parse::Marpa(
    warnings => 1,
    code_lines => -1,
);

and set the grammar.

my $mock_perl_grammar; { local($RS) = undef; $mock_perl_grammar = <DATA> };
$g->set( source => \$mock_perl_grammar);

Next, as many times as you like, ...

TEST: while (my $test = pop @tests) {

create a parse object, ...

my $parse = new Parse::Marpa::Parse($g);

pass text to the recognizer, ...

$parse->text(\$test);

evaluate an initial parse, ...

$parse->initial();
my @parses;
push(@parses, $parse->value);

and get others, if there are any.

while ($parse->next) {
    push(@parses, $parse->value);
}

Now you can announce your results ...

    say "I've got ", scalar @parses, " parses:";
    for (my $i = 0; $i < @parses; $i++) {
        say "Parse $i: ", ${$parses[$i]};
    }
}

__DATA__
...

DESCRIPTION

Parse::Marpa parses text given an arbitrary context-free grammar.

  • The grammar may be anything which can be specified in BNF.

  • This includes left-recursive grammars, right-recursive grammars, grammars with empty productions, grammars with cycles, and ambiguous grammars.

  • Ambiguous grammars are a Marpa specialty. They are useful even if you only want one parse. Human languages have ambiguous grammars, from which the listener pulls the parse that makes most sense. An ambiguous grammar is often the easiest and most sensible way to express a language. Marpa allows the user to prioritize rules so that the preferred parse comes up first.

  • Marpa can also return all the parses of an ambiguous grammar.

  • Marpa incorporates the latest academic research on Earley's algorithm, combining it with LR(0) precomputation.

  • Marpa adds its own innovations, such as the combination of Earley's algorithm with predictive lexing and ambiguous lexing.

BEWARE! PRE-ALPHA SOFTWARE

Since this is pre-alpha software, users with immediate needs must look elsewhere. I've no personal experience with them, but Parse::Yapp and Parse::RecDescent are alternatives to this module which are well reviewed and much more mature and stable.

What to expect once Marpa goes alpha

The alpha version will be intended to let people look Marpa over and even try it out. Uses beyond that are risky.

There will be bugs and misfeatures when I go alpha, but no known show-stoppers and no bugs without workarounds. The documentation follows the industry convention of telling the user how Marpa should work. If there's a difference between that and how Marpa actually works currently, it's in the Bugs section, which you'll want to at least skim before using Marpa.

While Marpa is in alpha, you may not want to automatically upgrade as new versions come out. Versions will often be incompatible. MDL emphasizes this by requiring the version option, and insisting on an exact match with Marpa's version number. That's a hassle, but so is alpha software and that's the point. The version number regime will become less harsh before Marpa leaves beta.

Obviously, while Marpa is in alpha, you won't want to use it for anything mission-critical or with a serious deadline.

READING THESE DOCUMENTS

The Parse::Marpa::CONCEPTS document should be read before using Marpa, in fact probably before your first careful reading of this document. The "concepts" in it are all practical -- the theoretical discussions went into Parse::Marpa::ALGORITHM. Even experts in Earley parsing will want to skim Parse::Marpa::CONCEPTS because, as one example, the use of ambiguous lexing has unusual implications for the term "token".

The Parse::Marpa::LANGUAGE document in theory documents only one interface to Marpa, but that interface is currently the only high-level one, and its document is the most tutorial in approach of all Marpa's documents.

THE EASY WAY

Parse::Marpa::marpa(grammar, text_to_parse, option hash);

The marpa() method takes three arguments: a reference to a string containing a Marpa source description of the grammar in one of the high-level interfaces; a reference to a string with the text to be parsed; and (optionally) a reference to a hash with options.

In scalar context, marpa() returns the value of the first parse if there was one, and undefined if there were no parses. In list context, marpa() returns a list of references to the values of the parses. This is the empty list if there were no parses.
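The context-dependent return described above can be pictured with a small stand-alone sketch. It does not call Parse::Marpa at all: fake_marpa() and its canned "parse values" are hypothetical stand-ins, used only to illustrate the scalar/list convention with Perl's wantarray.

```perl
use strict;
use warnings;

# Hypothetical stand-in for marpa(): the "parse values" are canned,
# no grammar or parsing is involved.
sub fake_marpa {
    my @values = @_;
    if (wantarray) {
        # List context: references to every value (empty list if none).
        my @refs;
        push @refs, \$_ for @values;
        return @refs;
    }
    # Scalar context: the first value, or undef when there are no parses.
    return @values ? $values[0] : undef;
}

my $first = fake_marpa( 'parse A', 'parse B' );    # scalar context
my @all   = fake_marpa( 'parse A', 'parse B' );    # list context
my @none  = fake_marpa();                          # no parses at all

print "first value: $first\n";
print "number of parses: ", scalar @all, "\n";
print "second value: ${ $all[1] }\n";
```

Note that the list-context results must be dereferenced before use, while the scalar-context result is used directly.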

The description referenced by the grammar argument must use one of the high-level Marpa grammar interfaces. Currently the default (and only) high-level grammar interface is the Marpa Demonstration Language.

METHODS FOR FINER CONTROL

new Parse::Marpa(option => value, option => value, ...)

The new method takes a list of arguments which are treated as a hash with options as keys and the option values as the hash values. new() either throws an exception or returns a new grammar object. For the valid options see the options section.

new Parse::Marpa::Parse(option => value, [option => value, ...])

Parse::Marpa::Parse::new() takes as its arguments a series of option, value pairs which are handled as a hash. It returns a new parse object or throws an exception.

One of the options must be the grammar option, and it must have as its value a grammar object with rules in it. Options are documented below.

Parse::Marpa::Parse::text(parse, text_to_parse)

Extends the parse in the parse object using the input text_to_parse, a reference to a string. Returns -1 if the parse is still active after the text_to_parse has been processed. Otherwise the offset of the character where the parse was exhausted is returned. Failures, other than exhausted parses, are thrown as exceptions.

The text is parsed treating each character as an earleme. Terminals are recognized using the lexers that were specified in the source file or with the raw interface.

The character offset where the parse was exhausted is reported in characters from the start of text_to_parse. The first character is at offset zero. Note this means that a zero return from text() indicates that the parse was exhausted at the first character.

A parse is "exhausted" at a point in the input where a successful parse becomes impossible. In most contexts and for most applications, an exhausted parse is a failed parse.
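Because a return of zero is a legitimate "exhausted at the first character" result, the value returned by text() should be compared against -1 rather than tested for truth. The following sketch shows the idea; describe_text_result() is a hypothetical helper, and the values fed to it are made up rather than produced by a real parse.

```perl
use strict;
use warnings;

# Hypothetical helper: interprets a value shaped like a text() return,
# following the convention documented above.
sub describe_text_result {
    my ($result) = @_;
    return 'parse still active' if $result == -1;
    return "parse exhausted at character offset $result";
}

# Offset 0 is false in Perl, so a plain truth test would misread it.
for my $result ( -1, 0, 17 ) {
    print describe_text_result($result), "\n";
}
```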

Parse::Marpa::show_location(message, text, offset)

message must be a string, text a reference to a string, and offset, a character offset within that string. show_location() returns a multi-line string with a header line containing message, the line from text containing offset, and a "pointer" line. The pointer line uses the ASCII "caret" symbol to point to the exact offset.
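To make the shape of that return value concrete, here is a rough stand-alone re-creation of the behavior just described. It is a sketch only: sketch_show_location() is a hypothetical name, the exact layout of Marpa's real output may differ, and tabs in the input line are not accounted for.

```perl
use strict;
use warnings;

# Hypothetical re-creation of the documented behavior;
# not Marpa's actual implementation.
sub sketch_show_location {
    my ( $message, $text_ref, $offset ) = @_;

    # Start of the line: just past the last newline before $offset.
    my $line_start = rindex( substr( ${$text_ref}, 0, $offset ), "\n" ) + 1;

    # End of the line: the next newline at or after $offset, or end of text.
    my $line_end = index( ${$text_ref}, "\n", $offset );
    $line_end = length( ${$text_ref} ) if $line_end < 0;

    my $line    = substr ${$text_ref}, $line_start, $line_end - $line_start;
    my $pointer = ( q{ } x ( $offset - $line_start ) ) . '^';

    # Header line, offending line, pointer line.
    return "$message\n$line\n$pointer\n";
}

my $text   = "first line\nsecond lime\nthird line\n";
my $offset = index( $text, 'lime' ) + 2;    # the offending "m"
print sketch_show_location( 'Unexpected character', \$text, $offset );
```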

Parse::Marpa::Parse::initial(parse, parse_end)

Performs the recognition phase of a parse. On successful recognition of a parse, initial() returns a value of 1. The user may then get the value of the parse with Parse::Marpa::Parse::value(), and may iterate through any other parses with Parse::Marpa::Parse::next().

initial() returns undefined if it fails to recognize a parse. Other failures are thrown as exceptions.

The parse_end argument is optional. If provided, it must be an earleme number where the parse ends. In the standard case, a successful parse in offline mode, the default is to parse to the end of the input.

In case of an exhausted parse, the default is the point at which the parse was exhausted. Most of the time that won't be very helpful, frankly. An exhausted parse means a failed parse unless the user is up to some advanced wizardry. Failed parses are usually addressed by fixing the grammar or the input, but if the user wants to try error recovery, the Parse::Marpa::Parse::find_complete_rule() method may help.

At this point, online mode is also bleeding-edge wizardry. In online mode there is no obvious "end of input". It is not well tested, and Marpa doesn't yet provide a lot of tools for working with it. It's up to the user to determine where to look for parses, perhaps using her specific knowledge of the grammar and the problem space. Again, the Parse::Marpa::Parse::find_complete_rule() method may help.

Parse::Marpa::Parse::next(parse)

Takes a parse object, which must have already been evaluated once with Parse::Marpa::Parse::initial(), and performs the next evaluation. Returns 1 if there was another evaluation, undefined if there are no more values for this initialization of this parse object. Other failures are exceptions.

Parses are returned from rightmost to leftmost, but their order may be manipulated by assigning priorities to the rules and terminals.

Parse::Marpa::Parse::value(parse)

Takes a parse object, which has been set up with Parse::Marpa::Parse::initial() and may have been iterated with Parse::Marpa::Parse::next(), and returns a reference to its current value. Failures are thrown as exceptions.

Defaults, nulling rules, and non-existent optional items all have as their value a Perl undefined. These are considered to be "calculated values". value() will return these as a reference to an undefined.

In some unusual cases, which will probably be the result of advanced wizardry gone wrong, Marpa will not find a "calculated value" and the return value will be undefined instead of a reference to undefined. This is considered a Marpa "no value". If initial(), next() and value() are being used with their defaults in offline mode, a "no value" return should not happen and indicates a bug in Marpa.
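The difference between a reference to undefined and a plain undefined is easy to blur, so here is a small sketch of how the two can be told apart. classify_value() is a hypothetical helper, and the arguments passed to it merely imitate the shapes that value() is documented to return.

```perl
use strict;
use warnings;

# Hypothetical helper: classifies something shaped like a value() return.
sub classify_value {
    my ($value_ref) = @_;

    # No reference at all: the Marpa "no value".
    return 'no value' unless defined $value_ref;

    # A reference to undef: a calculated value which happens to be undefined.
    return 'calculated value: undef' unless defined ${$value_ref};

    return "calculated value: ${$value_ref}";
}

print classify_value( \'42' ), "\n";     # an ordinary value
print classify_value( \undef ), "\n";    # e.g. a nulling rule
print classify_value(undef), "\n";       # should not happen with defaults
```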

LESS USED METHODS

The methods in this section explicitly run processing phases which Marpa typically performs indirectly. For example, when Parse::Marpa::Parse::new() is asked to create a new parse object from a grammar which has not been through the precomputation phase, that grammar is automatically precomputed, and then deep copied.

The most important uses of these methods are in connection with diagnostics. A user may want to trace Marpa's behavior during, or examine a Marpa object immediately after, a particular processing phase. In such cases, it can be helpful or even necessary to run that phase explicitly.

Parse::Marpa::compile(grammar) or $grammar->compile()

The compile method takes as its single argument a grammar object, and "compiles" it, that is, writes it out using Data::Dumper. It returns a reference to the compiled grammar, or throws an exception.

Parse::Marpa::decompile(compiled_grammar, [trace_file_handle])

The decompile static method takes a reference to a compiled grammar as its first argument. A second, optional, argument is a file handle. It is used both to override the compiled grammar's trace file handle, and for any trace messages produced by decompile() itself. decompile() returns the decompiled grammar object unless it throws an exception.

If the trace file handle argument is omitted, it defaults to STDERR and the new grammar's trace file handle reverts to the default for a new grammar, which is also STDERR. The trace file handle argument is needed because in the course of compilation, the grammar's original trace file handle may have been lost. For example, a compiled grammar can be written to a file and emailed. Marpa cannot expect to find the original trace file handle available and open when the compiled grammar is decompiled by another process on another machine.

Marpa compiles and decompiles a grammar as part of its deep copy processing phase. Internally, the deep copy saves the trace file handle of the original grammar to a temporary, then restores it using the trace file handle argument of decompile().

Parse::Marpa::precompute(grammar) or $grammar->precompute()

Takes as its only argument a grammar object and performs the precomputation phase on it. It returns the grammar object or throws an exception.

OPTIONS

These are the options recognized by the Parse::Marpa::new(), Parse::Marpa::Parse::new(), and Parse::Marpa::set() methods. When the same option is specified in two different method calls, the most recent overrides any previous setting, unless specifically stated otherwise in the description of the option.

Most options set Marpa predefineds, which are also set by the high-level grammar interfaces. Those options which don't deal with Marpa's predefined variables are special to the new() and set() methods. These method-only options are documented in this section.

Options which set predefineds are documented in the section on predefineds, below.

grammar

Takes as its value a grammar object. Only valid as an option to Parse::Marpa::Parse::new(). There's no default.

source

This takes as its value a reference to a string containing a description of the grammar in the Marpa Demonstration Language. It must be specified before any rules are added, and may only be specified once in the life of a grammar object.

PREDEFINEDS

This section documents Marpa's predefined variables. These may be set as options to the Parse::Marpa::new(), Parse::Marpa::Parse::new(), and Parse::Marpa::set() methods. Marpa's high-level grammar interfaces may also set them.

The discussion below deals with their semantics and assumes they are being set as method options. Setting of Marpa predefineds through high-level grammar interfaces is described in the documentation for those interfaces.

When a predefined is set as an option in two different method calls, a more recent value replaces any earlier one, except as described below. If the same option is set in the same method call, both via high-level grammar source and an option direct to the method, the setting of the option supplied directly to the method prevails.

default_action

Takes as its value a string, which is expected to be Perl 5 code. By default, rules which don't have an action explicitly specified return a Perl 5 undefined. This default can be changed by setting the default_action predefined.

ambiguous_lex

Treats its value as a boolean. If true, ambiguous lexing is used. This means that even if a terminal is found with a closure or a regex, the search for other terminals at that location continues. If multiple terminals match, all the tokens found are considered in the parse and all may end up being used if the parse is ambiguous. Ambiguous lexing is the default.

If false, Marpa behaves the same way as standard parser generators. Lexing at a location ends with the first terminal matched, and it is up to the user to ensure that the first terminal is the correct one, usually by making lexing deterministic.
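The contrast between the two settings can be sketched without Marpa. In the stand-alone example below, the terminal table, the lex_at() function and the "name:text" token format are all hypothetical; the sketch only illustrates "collect every match at this location" versus "stop at the first match", using successive individual regex matches.

```perl
use strict;
use warnings;

# Hypothetical terminals, tried in this order. Each pattern is anchored
# with \G so that it can only match at the current position.
my @terminals = (
    [ keyword    => qr/\G(time\b)/ ],
    [ identifier => qr/\G([a-z]+)/ ],
    [ number     => qr/\G(\d+)/ ],
);

sub lex_at {
    my ( $text_ref, $pos, $ambiguous ) = @_;
    my @tokens;
    for my $terminal (@terminals) {
        my ( $name, $regex ) = @{$terminal};
        pos( ${$text_ref} ) = $pos;    # every terminal starts at the same place
        if ( ${$text_ref} =~ /$regex/gc ) {
            push @tokens, "$name:$1";
            last if not $ambiguous;    # unambiguous lexing: first match wins
        }
    }
    return @tokens;
}

my $text = 'time / 25';
print 'ambiguous:   ', join( ', ', lex_at( \$text, 0, 1 ) ), "\n";
print 'first match: ', join( ', ', lex_at( \$text, 0, 0 ) ), "\n";
```

With ambiguous lexing on, "time" is offered to the parser both as a keyword and as an identifier; with it off, the order of the terminal table decides.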

online

A boolean. If its value is true, the parser runs in online mode. The default is offline mode. In offline mode, Marpa assumes the input has ended when the first parse is requested. It does some final bookkeeping, refuses to accept any more input, and sets its defaults to parse the entire input from beginning to end.

In online mode, which is under construction and poorly tested, new tokens may still be added, and final bookkeeping is never done. Marpa's default idea is still to parse the entire input up to the current earleme, but it's much less clear that's what the user wants. If it's not, it's up to her to determine the right places to look for complete parses, based on her knowledge of the structure of the grammar and the input with help from routines supplied for this purpose with Marpa. The Parse::Marpa::Parse::find_complete_rule() method may help.

default_null_value

When a symbol matches the empty string in a parse, by default its value is undefined. This predefined allows you to reset that. Its value must be a string containing Perl 5 code.

default_lex_prefix

The lexers allow every terminal to specify a prefix, a pattern to be matched and discarded before the pattern for the terminal itself. This is typically used to handle leading whitespace.

Where no prefix is specified, the default is for the prefix to always be the empty string. Often, the same prefix will be wanted for most terminals, and it's convenient to change the default lex prefix. This predefined allows that. Its value must be a compiled Perl 5 regex.
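As an illustration of what "matched and discarded" means, here is a stand-alone sketch. match_with_prefix() is a hypothetical helper rather than part of Marpa, and it assumes the prefix regex contains no capture groups of its own.

```perl
use strict;
use warnings;

# Hypothetical helper: match the prefix at $pos, discard it, then match
# the terminal. Assumes $prefix has no capture groups.
sub match_with_prefix {
    my ( $text_ref, $pos, $prefix, $terminal ) = @_;
    pos( ${$text_ref} ) = $pos;
    return unless ${$text_ref} =~ /\G$prefix($terminal)/gc;

    # The token text, and the position just past it.
    return ( $1, pos ${$text_ref} );
}

# A compiled regex, of the kind default_lex_prefix expects.
my $whitespace_prefix = qr/\s*/;

my $text = '   localtime / 25';
my ( $token, $next_pos ) =
    match_with_prefix( \$text, 0, $whitespace_prefix, qr/[a-z]+/ );
print "token '$token', next position $next_pos\n";
```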

version

The version option is optional. If present, it must match the current Marpa version exactly. This is because while Marpa is in alpha, features may change dramatically from version to version and no effort will be invested in making versions compatible. This strict version regime will be relaxed by the time Marpa leaves beta.

semantics

The semantics option is optional. If present, the value must be a string specifying an available semantics. The only available semantics at this writing is perl5.

volatile

By default, Marpa optimizes parse evaluations by memoization -- saving the value calculated for each node in a parse tree and recalculating only as forced to. In many ordinary circumstances, this is only a modest optimization, but in grammars with semantic actions which are time-consuming, the boost in efficiency could be major.

A parse object is "volatile" if it has a semantic action which, given the same child values, might produce a different result. (Perhaps it randomizes, or produces a value based on examination of data outside the parse.) A parse object is also "volatile" if any of its semantic actions have side effects. Side effects occur, for example, when more than one semantic action modifies the same data object.

Previously calculated node values cannot be reused in a volatile parse object. If a parse object is marked "volatile", Marpa always completely recalculates the value of every node it revisits. Grammar objects also have a "volatile" setting. Grammar objects default to non-volatile. Parse objects default to the volatility setting of the grammar they were created from.
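The effect of the volatile setting on re-evaluation can be sketched as follows. Everything here is hypothetical and far simpler than Marpa's parse trees: a "node" is just a hash with a list of child values, and action() stands in for a semantic action whose invocations are counted.

```perl
use strict;
use warnings;

my $calls = 0;

# Hypothetical semantic action: sums its child values and counts invocations.
sub action {
    $calls++;
    my $sum = 0;
    $sum += $_ for @_;
    return $sum;
}

# Hypothetical evaluator: reuses a cached node value unless volatile.
sub evaluate {
    my ( $node, $volatile ) = @_;
    return $node->{cached} if !$volatile && exists $node->{cached};
    my $value = action( @{ $node->{children} } );
    $node->{cached} = $value unless $volatile;
    return $value;
}

my $node = { children => [ 1, 2, 3 ] };
evaluate( $node, 0 ) for 1 .. 3;    # action runs once, then the cache is used
my $memoized_calls = $calls;

$calls = 0;
my $volatile_node = { children => [ 1, 2, 3 ] };
evaluate( $volatile_node, 1 ) for 1 .. 3;    # action runs on every visit
my $volatile_calls = $calls;

print "memoized: $memoized_calls call(s), volatile: $volatile_calls call(s)\n";
```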

It is always safe to mark a grammar volatile, though it may have an efficiency cost. If, as is usual, none of your semantic actions modify outside values and all of them rely only on constants and child values in calculating their return value, then it is safe to accept Marpa's default behavior.

The user should be aware that Marpa's default behavior includes setting a grammar to volatile in many cases when the grammar has sequence productions. Marpa often optimizes sequence evaluation by passing pointers to arrays among nodes instead of copying the arrays. This means that the nodes are modifying the same data object, which makes the grammar volatile, and Marpa "does the right thing" about this without user intervention.

Resetting a volatile object back to non-volatile is almost certainly a mistake, and Marpa doesn't allow it.

It's possible that adding further fine-tuning, such as the ability to label particular rules volatile, might be helpful. But if a grammar writer really is after time efficiency, it may be easiest and most effective to label the entire grammar as volatile, and then make extensive use of side effects and memoization to accomplish the optimization.

warnings

This is a boolean which enables warnings about inaccessible and unproductive rules in the grammar. Warnings are written to the trace file handle. By default, warnings are on.

Inaccessible rules are those which can never be produced by the start symbol. Unproductive rules are those which no possible input could ever match. Marpa is capable of simply ignoring these, if the remaining rules are sufficient to specify a useable grammar.

Inaccessible and unproductive rules sometimes indicate errors in the grammar design. But a user may have plans for them, may wish to keep them as notes, or may simply wish to look at them at another time.

code_lines

If there is a problem with user supplied code, Marpa prints the error message and a description of where the code is being used. Marpa will display the code itself as well. The value of this option tells Marpa how many lines to print before truncating the code. If it's zero, no code is displayed. If it's negative, all the code is displayed, no matter how long it is. The default is 30 lines.
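The three cases (zero, negative, positive) can be summarized in a few lines. This is a sketch of the documented meaning only: code_to_display() is a hypothetical helper, and the "[ ... truncated ]" marker is invented for the example and is not Marpa's actual output.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented meaning of code_lines.
sub code_to_display {
    my ( $code, $code_lines ) = @_;
    return q{} if $code_lines == 0;    # zero: show no code
    my @lines = split /\n/, $code;
    return $code
        if $code_lines < 0             # negative: show everything
        || @lines <= $code_lines;      # or the code is short enough
    return join "\n", @lines[ 0 .. $code_lines - 1 ], '[ ... truncated ]';
}

my $action = "line 1\nline 2\nline 3\nline 4\nline 5";
print code_to_display( $action, 2 ), "\n";
```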

preamble

The preamble is a string which should contain Perl 5 code. The preamble is run in the special namespace for each parse object, the same one in which user-supplied code (semantic actions and lexing closures) is run, but it is run before any of the user-supplied code.

If multiple preambles are specified as method options, the most recent replaces any previous ones. This is consistent with the behavior of other method options, but different from the way preambles behave in the MDL.

IMPLEMENTATION NOTES

Namespaces

For semantic actions and lexing closures, there is a special namespace for each parse object, which is entirely the user's. Otherwise users should use only documented methods from the Parse::Marpa and Parse::Marpa::Parse namespaces, and the $Parse::Marpa::This::v reference. None of these should be modified.

In future versions of Marpa, the $Parse::Marpa::This::v reference will go away and be replaced with a small set of macros.

String references

It's often said by those experienced in Perl that passing string refs instead of strings is a pointless and usually counter-productive optimization. I agree, but Marpa is an exception. Marpa will be expected to process and output entire files, some of which might be very long.

Object Orientation

Use of object orientation in Marpa is superficial. Only grammars and parses are objects, and they are not designed to be inherited.

Returns and exceptions

Most Marpa methods return only on success and throw an exception if there's a failure. Exceptions are thrown using croak(). If you don't want an exception to be fatal, catch it using eval. Failures are returned only where they are "non-exceptional".
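For readers who haven't caught a croak() before, here is the general pattern, sketched with a stand-in function rather than a real Marpa call. risky_call() and its error message are hypothetical; only Carp's croak() and Perl's block eval are real.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical stand-in for a method which fails "exceptionally".
sub risky_call {
    my ($ok) = @_;
    croak('grammar has no rules') if not $ok;
    return 1;
}

# The trailing 1 makes the eval return true only if nothing was thrown.
my $result;
my $succeeded = eval { $result = risky_call(0); 1 };
my $error = $succeeded ? undef : $@;

print "caught: $error" if defined $error;
```

Saving $@ into a lexical immediately after the eval avoids having it clobbered by later code.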

The most basic example of a "non-exceptional" failure is an exhausted parse. Exhausted parses are usually parse failures, though the user may also be doing some advanced wizardry. Parse failures are common and may even be expected -- the user may be testing inputs. So Parse::Marpa::Parse::text() returns location information for an exhausted parse, instead of throwing an exception.

An even better example of a non-exceptional failure occurs with Parse::Marpa::Parse::next(). This method is used to iterate through the multiple parses of an ambiguous parse. Eventual failure, that is, inability to find another parse, is typically expected and planned for. This non-exceptional failure is returned.

Where methods return failures, their detailed descriptions give specifics.

AUTHOR

Jeffrey Kegler

DEPENDENCIES

Requires Perl 5.10. Users who want or need the maturity and/or stability of Perl 5.8 or earlier probably are also best off with more mature and stable alternatives to Marpa.

LIMITATIONS

Speed

Speed seems remarkably good for an Earley's implementation. In fact, the current bottlenecks seem not to be in the Marpa parse engine, but in the lexing, and in the design of the Marpa Demonstration Language.

Ambiguous Lexing and Speed

Ambiguous lexing has a cost, and grammars which can turn ambiguous lexing off can expect to parse twice as fast. Right now when Marpa tries to lex multiple regexes at a location, it does so using successive, individual regex matches.

There may be a more efficient way to use Perl 5 regexes to return all of the matches from a set of alternatives. A complication is that precompilation is not possible. Marpa does predictive lexing and the possibilities are not known until shortly before the match is attempted. But I believe that lazy evaluation and memoizing could have big payoffs in the cases of most interest.

The Marpa Demonstration Language and Speed

The Marpa Demonstration Language was designed to show off Marpa's power, and not necessarily to run quickly. This meant that even if a feature's utility came at a high cost in time efficiency, I would still keep the feature if I thought it demonstrated an important capability of Marpa. A high-level grammar interface with less interest in "showing off" could easily run much more quickly.

As a reminder, if the MDL's speed parsing a particular grammar becomes an issue, the grammar can be precompiled. Subsequent runs from the compiled grammar won't incur any overhead from either the MDL or precomputation.

About Speed and Parsers

In considering speed, it's useful for users to be aware of Marpa's position in the hierarchy of grammars. Marpa parses many grammars which bison, yacc, Parse::Yapp, and Parse::RecDescent cannot. For these, it's clearly faster. When it comes to time efficiency, never is not hard to beat.

Marpa allows grammars to be expressed in their most natural form. It's ideal where programmer time is important relative to running time. Right now, special-purpose needs are often addressed with regexes. This works wonderfully if the grammar involved is regular, but across the Internet many man-years are being spent trying to shoe-horn non-regular grammars into Perl 5 regexes.

Marpa is also a good alternative whenever another parser requires backtracking. Earley's parsers never need to backtrack. They find every possible parse the first time through.

Backtracking is a gamble, and often one you find you've made against the odds. Backtracking solutions are non-intuitive. Backtracking solutions are hard to read, even by the people who write them, and even when they've been carefully documented.

Backtracking solutions handle change poorly. All the hard work that went into creating and documenting a good backtracking regex will most likely be totally useless if the target language changes in any serious way.

If in the search for efficiency you are writing or rewriting your grammar to be LALR or regular, that is a good reason not to use Marpa. If a grammar is converted to be LALR or regular, Marpa takes advantage of this and will run much faster. But it will run faster yet on a parser designed for such grammars: bison, yacc and Parse::Yapp for LALR; regexes for regular grammars.

Finally, there are the many situations where we need to do some parsing as a one-shot and don't want to have to care what subcategory our grammar falls in. We want to write some quick BNF and get on with it. For this, there's Marpa.

BUGS AND MISFEATURES

Options Code Poorly Organized and Probably Buggy

My strategy for dealing with options to the method calls evolved over time and the code shows it. Most options are only valid at certain points in the parsing, but this is haphazardly enforced and poorly documented. There are probably some just plain ol' bugs. The options code needs to be cleaned up.

MDL Hardwiring

The assumption that the MDL is the only high-level grammar interface in use is hardwired into get_symbol. This will be addressed before going beta.

Timing of Semantics Finalization

All semantics should be finalized during creation of the parse object. This is mostly true now, but the value of null objects is calculated while rules are being added. Marpa needs to be changed so that 100% of the finalization of semantics happens during creation of the parse object.

Priority Conflicts

If non-default priorities are given to rules, it's possible two rules with different priorities could wind up in the same node of the Marpa SDFA. I won't explain the details of SDFA's, but Marpa won't proceed in that circumstance.

I've actually never seen this happen, and one reason the problem is not fixed is that I need to contrive a case where the problem occurs before I make a fix. Otherwise, I can't test it. But if you're the unlucky first person to encounter this, here are the workarounds.

Workaround 1: Marpa will report the rules which caused the conflict. If they can be changed to have the same priority, the problem is solved.

Workaround 2: Instead of using priorities, use multiple parses. That is, instead of using priorities to make the desired parse first in order, allow the "natural" order and iterate through the parses until you get the one you want.

Workaround 3: Make a small change in the grammar. Be aware that the code which creates the SDFA is smart enough so that you'll probably need to make some sort of real change to the target language. Simply writing different rules with the same effect probably won't make the problem go away.

I believe there's a fix to this problem, but it will require not just figuring out a way to make it occur, but some mathematics. The fix is to change the SDFA to be a little more non-deterministic, so that there are different SDFA nodes for the different priorities, with empty transitions between them. (Aren't you sorry you asked?)

Testing that a fix of this kind doesn't break grammars isn't sufficient, and I'll need to show the current and the fixed SDFA's are "equivalent" in the appropriate mathematical sense. That may even require a formal proof.

For now, there's the comfort that the problem seems to be quite rare.

Non-intuitive Parse Order in Unusual Cases

This problem occurs when a production has more than two nullable symbols on the right hand side, so that it is ambiguous, and the semantics are such that order of the parses matters. This doesn't happen in any practical grammars I've tried, perhaps because it's an unnatural way to set up the semantics. But it does happen in textbook grammars.

There is a very straightforward workaround, described below. But the problem needs to be fixed, certainly before Marpa goes beta.

Details: The problem occurs because these productions are rewritten internally by CHAF. A rightmost parse comes first as I have documented, but it is a rightmost parse for the grammar as rewritten by CHAF. This is a bug for pedantic reasons, because CHAF rewriting is supposed to be invisible. It's a bug for practical reasons because the CHAF-driven order is not intuitive, and I can't picture it ever being the desired first choice. Priorities are not a workaround, because priorities cannot be set for rules within a CHAF rewrite.

Workaround: Rewrite the rule for which this is a problem. The problem only occurs where a rule is subject to CHAF rewriting, and CHAF rewrites are only done to rules with more than two nullables on the right hand side. It is always possible to break up a rule into other rules such that at most two nullables occur on the right hand side.

Perl Style comments Not Recognized in Some Places

Perl style comments are not recognized just before literal regexes and q- and qq-quoted literal strings. The problem is that these are treated as whitespace and whitespace is implemented with lex prefixes. I designed things so that Marpa internal lexing routines do their own prefix recognition and right now they don't recognize comments. I now realize that putting the prefixing logic inside the internal lexing routines requires the same logic to be repeated many times and is a misfeature. It needs to be fixed.

Workaround: Move the Perl 5 style comment somewhere else.

Priorities Cannot Be Set in MDL for Terminals

Priorities cannot be set in MDL for terminals. Fix this before going beta.

Workaround: Use the priorities for rules. Extra rules can be added to simulate terminal priorities.

What! You Found Even More Bugs!

Please report any bugs or feature requests to bug-parse-marpa at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Parse-Marpa. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc Parse::Marpa

ACKNOWLEDGMENTS

Marpa is the parser described in John Aycock and R. Nigel Horspool's "Practical Earley Parsing", The Computer Journal, Vol. 45, No. 6, 2002, pp. 620-630. I've made significant changes to it, which are documented separately (Parse::Marpa::ALGORITHM). Aycock and Horspool, for their part, built on the algorithm discovered by Jay Earley, and described in his "An efficient context-free parsing algorithm", Communications of the Association for Computing Machinery, 13:2:94-102, 1970.

I'm grateful to Randal Schwartz for his encouragement over the years that I've been working on Marpa. My one conversation about Marpa with Larry Wall was brief and long ago, but his openness to the idea was a major encouragement, and his insights into how humans do programming, how they do languages, and how those two endeavors interconnect, have been a major influence. More recently, Allison Randal and Patrick Michaud have been generous with their valuable time. They might have preferred that I volunteered as a Parrot cage-cleaner, but if so, they were too polite to say so.

In writing the Pure Perl version of Marpa, I benefited from studying the work of Francois Desarmenien (Parse::Yapp), Damian Conway (Parse::RecDescent) and Graham Barr (Scalar::Util).

COPYRIGHT & LICENSE

Copyright 2007 Jeffrey Kegler, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.