NAME
Marpa::XS - XS extension of Marpa
SYNOPSIS
use Marpa::XS;
my $grammar = Marpa::XS::Grammar->new(
{ start => 'Expression',
actions => 'My_Actions',
default_action => 'first_arg',
rules => [
{ lhs => 'Expression', rhs => [qw/Term/] },
{ lhs => 'Term', rhs => [qw/Factor/] },
{ lhs => 'Factor', rhs => [qw/Number/] },
{ lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
{ lhs => 'Factor',
rhs => [qw/Factor Multiply Factor/],
action => 'do_multiply'
},
],
}
);
$grammar->precompute();
my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );
$recce->read( 'Number', 42 );
$recce->read( 'Multiply', );
$recce->read( 'Number', 1 );
$recce->read( 'Add', );
$recce->read( 'Number', 7 );
sub My_Actions::do_add {
my ( undef, $t1, undef, $t2 ) = @_;
return $t1 + $t2;
}
sub My_Actions::do_multiply {
my ( undef, $t1, undef, $t2 ) = @_;
return $t1 * $t2;
}
sub My_Actions::first_arg { shift; return shift; }
my $value_ref = $recce->value;
my $value = $value_ref ? ${$value_ref} : 'No Parse';
DESCRIPTION
This is ALPHA software
This is alpha software. There may be bugs. The interface may change. Please be careful. Do not rely on it for anything mission-critical.
Overview
Marpa::XS parses any language whose grammar can be written in BNF. That includes recursive grammars, ambiguous grammars, infinitely ambiguous grammars and grammars with useless or empty productions.
This document contains a top-level overview of the API for the Marpa parse engine. The two examples in this document show the typical flows of Marpa method calls. This document will use these examples to describe the basic features of Marpa in semi-tutorial fashion. Marpa's advanced features, and full reference details of all features, can be found in the other Marpa API documents.
The Three Phases
A parser needs to:
Accept a grammar.
Read input.
Return values from the parses, according to a semantics.
In Marpa these three tasks are, for the most part, distinct phases. Grammars are Marpa::XS::Grammar objects. The reading of input and the evaluation of the parse according to the semantics is performed by Marpa::XS::Recognizer objects.
EXAMPLE 1: A SIMPLE CALCULATOR
The synopsis shows the code for a very simple calculator. It handles only addition and multiplication of integers. This section explains, line by line, how it works.
Marpa::XS::Grammar::new
my $grammar = Marpa::XS::Grammar->new(
{ start => 'Expression',
actions => 'My_Actions',
default_action => 'first_arg',
rules => [
{ lhs => 'Expression', rhs => [qw/Term/] },
{ lhs => 'Term', rhs => [qw/Factor/] },
{ lhs => 'Factor', rhs => [qw/Number/] },
{ lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
{ lhs => 'Factor',
rhs => [qw/Factor Multiply Factor/],
action => 'do_multiply'
},
],
}
);
Marpa grammars are Marpa::XS::Grammar objects. They are created with the Marpa::XS::Grammar::new constructor. The arguments to Marpa::XS::Grammar::new are references to hashes of named arguments. In the key/value pairs of these hashes, the hash key is the name of the argument, and the hash value is the value of the named argument.
The start Named Argument
start => 'Expression',
The start
named argument is required. Its value is a string containing the name of the grammar's start symbol.
Named Arguments for the Semantics
actions => 'My_Actions',
default_action => 'first_arg',
The actions
and default_action
named arguments specify semantics. Their argument values are strings, which acquire their semantics during evaluation.
Evaluation will be described later. Peeking ahead, the default_action
named argument will be interpreted as an action name. This action name will resolve to an action -- a Perl closure that implements semantics. The action specified by default_action
is used as the action for rules with no action of their own. actions
provides the name of a Perl package where Marpa will look for its actions.
The rules Named Argument
rules => [
{ lhs => 'Expression', rhs => [qw/Term/] },
{ lhs => 'Term', rhs => [qw/Factor/] },
{ lhs => 'Factor', rhs => [qw/Number/] },
{ lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
{ lhs => 'Factor',
rhs => [qw/Factor Multiply Factor/],
action => 'do_multiply'
},
],
The value of the rules
named argument is a reference to an array of rule descriptors. In this example, all the rule descriptors are in the "long" form -- they are references to hashes of rule properties. In each key/value pair of a rule descriptor hash, the key is the name of a rule property, and the hash value is the value of that rule property.
The lhs Property
The value of the lhs
rule property must be a string containing the name of the rule's left hand side symbol. Every Marpa rule must have a left hand side symbol.
The rhs Property
The value of the rhs
property is a reference to an array of strings containing names of the rule's right hand symbols, in order. This array may be zero length, in which case this is an empty rule -- a rule with no symbols on the right hand side. There are no empty rules in this example.
The action Property
The value of the action
rule property is a string. Peeking ahead, each action
property string will be interpreted as an action name. This action name will be resolved to a Perl closure that implements the rule's semantics.
Marpa::XS::Grammar::precompute
$grammar->precompute();
Before a Marpa grammar object can be used by a Marpa recognizer, it must be precomputed. Precomputation compiles data structures that the recognizer will need.
Marpa::XS::Recognizer::new
my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );
Marpa::XS::Recognizer::new
creates a new recognizer. Its arguments are references to hashes of named arguments. In this example the only named argument is the required argument: "grammar
". The value of the grammar
named argument must be a precomputed Marpa grammar.
Marpa::XS::Recognizer::read
$recce->read( 'Number', 42 );
$recce->read( 'Multiply', );
$recce->read( 'Number', 1 );
$recce->read( 'Add', );
$recce->read( 'Number', 7 );
The Marpa::XS::Recognizer::read
method takes two arguments, a token name and a token value. The token name must be the name of a valid terminal symbol in the grammar. By default all symbols are valid as terminal symbols, unless a grammar contains empty rules.
The grammars in the examples in this document do not contain empty rules, and therefore are free to use any symbol in the grammar as a token name. For more on terminals, including how to explicitly mark terminal symbols, see "Terminals" in Marpa::XS::Grammar.
The token value must be a Perl scalar, but otherwise its form and semantics are entirely up to the application. If the token value is omitted, it is a Perl undef
. In the calculator example, the values of the "Add
" and "Multiply
" tokens are never used, so they are allowed to default to undef
.
Marpa::XS::Recognizer::value
my $value_ref = $recce->value;
my $value = $value_ref ? ${$value_ref} : 'No Parse';
The Marpa::XS::Recognizer::value
method returns a reference to the parse result's value, if there was a parse result. If there was no parse result, Marpa::XS::Recognizer::value
returns undef
.
Resolving the Semantics
The first thing Marpa::XS::Recognizer::value
needs to do is to resolve the semantics. Resolving the semantics means mapping the action names into actions. Actions are Perl closures which directly implement semantics. In this example, the actions
named argument is specified. actions
is a Perl package name. Marpa will look for actions in that package.
actions => 'My_Actions',
{ lhs => 'Factor', rhs => [qw/Factor Multiply Factor/], action => 'do_multiply' },
For example, the action
property for the above rule is "do_multiply
" and the actions
named argument to the grammar was "My_Actions
". So Marpa looks for a closure whose fully qualified name is My_Actions::do_multiply
, which it finds:
sub My_Actions::do_multiply {
my ( undef, $t1, undef, $t2 ) = @_;
return $t1 * $t2;
}
Rules do not always have action
properties. That is the case with these rules in this example:
{ lhs => 'Expression', rhs => [qw/Term/] },
{ lhs => 'Term', rhs => [qw/Factor/] },
{ lhs => 'Factor', rhs => [qw/Number/] },
Where there is no action
rule property, Marpa tries to use the lhs
property as an action name. When Marpa cannot resolve the lhs
property as an action name, it will fall back to using the default action for the grammar.
For example, in the first rule in the above display, Marpa will look for a Perl closure with the fully qualified name "My_Actions::Expression
". When Marpa does not find a closure by that name, Marpa will fall back to trying to use the default action.
The other two rules in the above display work similarly. Marpa will look for Perl closures named "My_Actions::Term
" and "My_Actions::Factor
". It will not find them and will fall back to trying to use the default action, as described next.
default_action => 'first_arg',
The default_action
named argument is resolved in the same way as are the action
properties of the rules. In this example, default_action is specified as "first_arg
" and resolves to My_Actions::first_arg
.
Actions
sub My_Actions::first_arg { shift; return shift; }
sub My_Actions::do_add {
my ( undef, $t1, undef, $t2 ) = @_;
return $t1 + $t2;
}
Value actions are Perl closures used as callbacks. Value actions are called when nodes in a parse tree are evaluated. A value action receives one or more arguments. The first argument to a value action is always a per-parse-result object, which the callbacks can use as a scratchpad. In these examples, the per-parse-result object is not used.
For a non-empty rule, the second and any subsequent arguments to the callback are the values, in lexical order, of the symbols on the right hand side of the rule. If the action is for an empty rule, the per-parse-result object will be its only argument.
Every value action is expected to return a value. With one exception, this value is passed up to a parent node as an argument. The exception is the value for the start rule. The return value for the start rule becomes the value of the parse result.
Rules with no action specified for them take their semantics from the default_action
named argument. If there is no default action for a grammar, rules without no action specified for them return a Perl undef
.
EXAMPLE 2: AN AMBIGUOUS PARSE
This is the same calculator as before, rewritten to be ambiguous. Rather than give multiplication precedence over addition, the rewritten calculator allows any order of operations. In this example, the actions (My_Actions::do_add
, etc.) and the @tokens
array remain the same as before.
Eliminating precedence makes the grammar shorter, but it also means there can be multiple parse results, and that the different parse results can have different values. In this application we decide, for each input, to return the value for every one of the parse results.
use Marpa::XS;
my $ambiguous_grammar = Marpa::XS::Grammar->new(
{ start => 'E',
actions => 'My_Actions',
rules => [
[ 'E', [qw/E Add E/], 'do_add' ],
[ 'E', [qw/E Multiply E/], 'do_multiply' ],
[ 'E', [qw/Number/], ],
],
default_action => 'first_arg',
}
);
$ambiguous_grammar->precompute();
my $ambiguous_recce =
Marpa::XS::Recognizer->new( { grammar => $ambiguous_grammar } );
$ambiguous_recce->read( 'Number', 42 );
$ambiguous_recce->read( 'Multiply', );
$ambiguous_recce->read( 'Number', 1 );
$ambiguous_recce->read( 'Add', );
$ambiguous_recce->read( 'Number', 7 );
my @values = ();
while ( defined( my $ambiguous_value_ref = $ambiguous_recce->value() ) ) {
push @values, ${$ambiguous_value_ref};
}
Short Form Rule Descriptors
rules => [
[ 'E', [qw/E Add E/], 'do_add' ],
[ 'E', [qw/E Multiply E/], 'do_multiply' ],
[ 'E', [qw/Number/], ],
],
The rule descriptors in the ambiguous example demonstrate the "short" or array form of rule descriptors. Array form rule descriptors are references to arrays. Here the elements are, in order, the lhs
property, the rhs
property, and the action
property.
Marpa::XS::Recognizer::value
my @values = ();
while ( defined( my $ambiguous_value_ref = $ambiguous_recce->value() ) ) {
push @values, ${$ambiguous_value_ref};
}
When called more than once, the Marpa::XS::Recognizer::value
method iterates through the parse results. For each call, it returns a reference to the value of a parse result. At the end of the iteration, after all parse results have been returned, Marpa::XS::Recognizer::value
returns undef
. If there were no parse results, Marpa::XS::Recognizer::value
returns undef
the first time that it is called.
ERRORS AND EXCEPTIONS
Methods in the Marpa API do not return errors. When there are errors, Marpa API methods throw an exception.
INHERITANCE
Classes in the Marpa API are not designed to be inherited.
OTHER DOCUMENTS
Basic Documents
This document gives a semi-tutorial overview of the entire Marpa API. For full details on Marpa's grammar objects and their methods, see the Marpa::XS::Grammar document. For full details on Marpa's recognizer objects and their methods, see the Marpa::XS::Recognizer document.
Marpa::XS::Parse_Terms is intended as a quick refresher in parsing terminology. Marpa's standard semantics are fully described in the Marpa::XS::Semantics document. Techniques for tracing and for debugging your Marpa grammars are described in the Marpa::XS::Tracing document and the Marpa::XS::Debug document.
Academic Documents
For those with a theoretical bent, my sources, and other useful references, are described in Marpa::XS::Advanced::Bibliography. Marpa::XS::Advanced::Algorithm describes the Marpa algorithm itself.
AUTHOR
Jeffrey Kegler
Why is it Called "Marpa"?
Marpa is the name of the greatest of the Tibetan "translators". In his time (the 11th century AD) Indian Buddhism was at its height. Marpa's generation of scholars devoted itself to producing Tibetan versions of Buddhism's Sanskrit scriptures. Marpa became the greatest of them, and today is known as Marpa Lotsawa: "Marpa the Translator".
Blatant Plug
Marpa is a character in my novel, The God Proof. The God Proof centers around Kurt Gödel's proof of God's existence. Yes, that Kurt Gödel, and yes, he really did work out a God Proof (it's in his Collected Works, Vol. 3, pp. 403-404). The God Proof is available as a free download (http://www.lulu.com/content/933192). It can be purchased in print form at Amazon.com: http://www.amazon.com/God-Proof-Jeffrey-Kegler/dp/1434807355.
ACKNOWLEDGMENTS
Marpa is directly derived from two other parsers. The first was discovered by John Aycock and R. Nigel Horspool and is described in their Aycock and Horspool 2002. The second was described by Joop Leo and is described in Leo 1991. Aycock, Horspool, and Leo, in turn, based their algorithms on the algorithm discovered by Jay Earley. I combined the Aycock-Horspool algorithm with the Leo algorithm, and added significant changes of my own. More details on the algorithm itself are in another document.
I'm grateful to Randal Schwartz for his support over the years that I've been working on Marpa. My chats with Larry Wall have been few and brief, but his openness to new ideas has been a major encouragement and his insight into the relationship between "natural language" and computer language has been a major influence. More recently, Allison Randal and Patrick Michaud have been generous with their very valuable time. They might have preferred that I volunteered as a Parrot cage-cleaner, but if so, they were too polite to say.
Many at perlmonks.org answered questions for me. I used answers from chromatic, Corion, dragonchild, jdporter, samtregar and Juerd, among others, in writing this module. I'm just as grateful to those whose answers I didn't use. My inquiries were made while I was thinking out the code and it wasn't always 100% clear what I was after. If the butt is moved after the round, it shouldn't count against the archer.
In writing the Pure Perl version of Marpa, I benefited from studying the work of Francois Desarmenien (Parse::Yapp
), Damian Conway (Parse::RecDescent
) and Graham Barr (Scalar::Util
). Adam Kennedy patiently instructed me in module writing, both on the finer points and on issues about which I really should have know better.
SUPPORT
Marpa::XS comes without warranty. Support is provided on a volunteer basis through the standard mechanisms for CPAN modules. The Support document has details.
COPYRIGHT AND LICENSE
Copyright 2011 Jeffrey Kegler
This file is part of Marpa::XS. Marpa::XS is free software: you can
redistribute it and/or modify it under the terms of the GNU Lesser
General Public License as published by the Free Software Foundation,
either version 3 of the License, or (at your option) any later version.
Marpa::XS is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser
General Public License along with Marpa::XS. If not, see
http://www.gnu.org/licenses/.