NAME
Marpa::PP::Semantics - How Marpa Evaluates Parses
SYNOPSIS
my $grammar = Marpa::Grammar->new(
{ start => 'Expression',
actions => 'My_Actions',
default_action => 'first_arg',
rules => [
{ lhs => 'Expression', rhs => [qw/Term/] },
{ lhs => 'Term', rhs => [qw/Factor/] },
{ lhs => 'Factor', rhs => [qw/Number/] },
{ lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
{ lhs => 'Factor',
rhs => [qw/Factor Multiply Factor/],
action => 'do_multiply'
},
],
}
);
sub My_Actions::do_add {
my ( undef, $t1, undef, $t2 ) = @_;
return $t1 + $t2;
}
sub My_Actions::do_multiply {
my ( undef, $t1, undef, $t2 ) = @_;
return $t1 * $t2;
}
sub My_Actions::first_arg { shift; return shift; }
my $value_ref = $recce->value;
my $value = $value_ref ? ${$value_ref} : 'No Parse';
OVERVIEW
Marpa's semantics is of two kinds:
The value semantics. This is the semantics which produces the value of the parse tree.
The ranking semantics, which is used to rank ambiguous parse results, but which can be exploited for its side effects.
Value Semantics
Marpa's value semantics will be familiar to those who have used traditional methods to evaluate parses. A parse is seen as a parse tree. Nodes on the tree are evaluated recursively, bottom-up. Once the values of all its child nodes are known, a parent node is ready to be evaluated. The value of a parse is the value of the top node of the parse tree.
Marpa's semantics operate on the application's original rules and symbols. The internals of the parser are hidden from the semantics. Nodes in Marpa's virtual parse trees are of three kinds:
Token Nodes, which have a constant semantics, initialized when the token is read.
Rule Nodes, which have a dynamic semantics, based on action names and actions, as described below.
Null Nodes, which have a constant semantics, initialized when the recognizer is created.
Action Names and Actions
When a Marpa grammar is created, much of its value semantics and ranking semantics is specified indirectly, as action names. Marpa does not allow semantics to be specified directly as Perl closures until the evaluation phase. This approach to semantics follows Perl 6. I discovered the wisdom of having an additonal layer of abstraction the hard way.
To implement its semantics of action names and actions, Marpa must do three things:
Determine the action name.
Resolve the action name to an action. An action is a Perl closure.
Call the Perl closure to produce the actual result.
The action names and actions used to produce the node value are called value action names and value actions. The action names and actions used to rank nodes are called ranking action names and ranking actions.
An action name and action is also used to create the per-parse variable, as described below. The per-parse variable is a special case, but it is intended to be used as part of the value semantics.
NODES
Token Nodes
For every input token, there is an associated token node. In the default model of input, every token will become a leaf node in the parse tree. Tokens always have a token symbol. At lexing time, they can be assigned a token value. If no token value is assigned at lex time, the token value defaults to a Perl undef
.
Rule Nodes
Nodes which are ancestors of token nodes are called rule nodes. Rule nodes are always associated with a rule. The value of a rule node is computed at Node Evaluation Time. Applications can specify, on a per-rule basis, Perl closures to evaluate rule nodes. The Perl closures which produce the value of rule nodes are called value actions.
A value action's arguments will be a per-parse variable followed by the values of its child nodes in lexical order. The return value of a value action becomes the value of the node. A value action is always evaluated in scalar context. If there is no value action for a rule node, the value of the rule node is a Perl undef
.
Sequence Rule Nodes
Some rules are sequence rules. Sequence rule nodes are also rule nodes. Everything said above about rule nodes applies to sequence rule nodes. Specifically, the arguments to the value actions for sequence rules are the per-parse variable followed by the values of the child nodes in lexical order.
The difference (and it is a big one) is that in an ordinary rule, the right hand side is fixed in length, and that length is known when you are writing the code for the value action. In a sequence rule, the number of right hand side symbols is not known until node evaluation time. The Perl closure that is the value action of a sequence rule node is written in the same way that you write any Perl closure that handles a variable number of arguments.
Sequence semantics work best when every child node in the sequence has the same semantics. When that is not the case, writing the sequence using ordinary non-sequence rules should be considered as an alternative.
By default, if a sequence rule has separators, the separators are thrown away before the value action is called. This means that separators do not appear in the @_
array of the Perl closure which is the value action. If the value of the keep
rule property is a Perl true value, separators are kept, and do appear in the value action's @_
array.
Null Nodes
The Default Null Semantics
A null node is a node which derives the zero-length, or empty string. By default, the value of a null node is a Perl undef
. This is adequate for many applications.
More Complex Null Semantics
This document contains all that most users will need to know about implementing their null semantics. Users who need to implement a very complex null semantics, or who simply would like to read a more detailed account of how Marpa's null semantics works, can turn to the document on null semantics.
An application can assign null values to symbols. If the null node is for a symbol, its null value is the null value of its symbol. A null node can be for a rule. In that case, its null value is the null value of the rule's LHS.
An application can also change a grammar's default null value, which is undef
, to a constant, using the grammar's default_null_value
named argument.
Specifying null semantics on a per-symbol basis is a natural way to do it, and the result is essentially what the user would expect. Several special issues should be watched for.
Nulled Subtrees
Basing the null value on the node's symbol implies that, in a null subtree, the semantics of the topmost null node will be the only thing that matters. In computing its null value, a node never takes into account the values of its child nodes.
Nullable Rules
Nullable rules derive their values from two different sources, depending on whether they are nulled or not. If a nullable rule node is not nulled, its value comes from the semantics of its rule. If a nullable rule node is nulled, its value is the null value of the rule's LHS symbol.
It is up to the application to make sure that, for nullable rules, the rule semantics and the null value of the LHS symbol "make sense" when used together in an application. This can always be done, if necessary by making sure the LHS of the nullable rule is a dedicated symbol -- one that serves that purpose and that purpose only. A LHS symbol can be regarded as dedicated if it is not a terminal and is not the LHS of any other rule.
Nullable Sequence Rules
The remarks just made about nullable rules apply to nullable sequence rules. A sequence rule is nullable if the minimum number of items in it is zero. A nullable sequence which is nulled has the null value of its LHS symbol. A nullable sequence which is not nulled has the semantics of its rule.
SEMANTIC PHASES
Most applications will find that the order in which Marpa executes its semantics "just works". This section describes that order in detail. These details can matter in some applications, for example, those which exploit side effects.
When the semantics are applied to a parse, it produces a value called a parse result. Because Marpa allows ambiguous parsing, each parse can have zero or more parse results. A parse series is the series of zero or more parse results for a single evaluation. The first call to the the recognizer's value
method after the recognizer is created is the start of the first parse series. The first parse series continues until there is a call to the the reset_evaluation
method or until the recognizer is destroyed. Usually, an application is only interested in a single parse series.
When the reset_evaluation
method is called for a recognizer, it begins a new parse series. The new parse series continues until there is another call to the the reset_evaluation
method, or until the recognizer is destroyed.
Recognizer Setup Phase
In the Recognizer Setup Phase, the null values of the symbols are determined. The null values of symbols never change -- they are in effect properties of the grammar.
Exactly one Recognizer Setup Phase occurs for every recognizer. For every Marpa Recognizer object, the Recognizer Setup Phase occurs before any Series Setup Phase.
Series Setup Phase
During the Series Setup Phase all value action names are resolved to Perl closures -- the value actions. The value actions are never called in the Series Setup Phase. Value actions are called in the Tree Traversal Phase.
Exactly one Series Setup Phase occurs for each parse series. A Series Setup Phase occurs in the first call of the recognizer's value
method of each parse series.
Ranking Phase
In the Ranking Phase, all ranking action names are resolved to Perl closures (ranking actions), and those ranking actions are called. The entire ranking semantics is carried out in the Ranking Phase. The Ranking Phase is the only phase in which ranking action names are resolved and the only phase in which ranking actions are called.
Ranking Phases do not occur if recognizer's ranking_method is "none
", which is the default. If the recognizer's ranking method is set to a value other than "none
", a Ranking Phase occurs during the first call of the recognizer's value
method for each parse series.
Result Setup Phase
In the Result Setup Phase, the per-parse variable is created. If a constructor was found for the action_object
, it is run at this point, and the per-parse variable is its return value. Exactly one Result Setup Phase occurs for each parse result.
In every parse series, all the Result Setup Phases occur after the Series Setup Phase. In every parse series which has a Ranking Phase, all the Result Setup Phases occur after the Ranking Phase. For each parse result, the Result Setup Phase occurs before its Tree Traversal Phase.
Tree Traversal Phase
During the Tree Traversal Phase, the semantic Perl closures are called. For each parse result, there is exactly one Tree Traversal Phase, which follows the Result Setup Phase.
During parse tree traversal, Node Evaluation Time occurs for every node in the parse. Tree Traversal Phases occur during calls of the recognizer's value
method.
Node Evaluation Time
Node Evaluation Time is not really a separate phase. Node Evaluation Time is the Tree Traversal Phase, seen from the point of view of the individual nodes of the parse tree. If a node has a semantic Perl closure, it is called at Node Evaluation Time.
Summary of the Phases
While processing a recognizer, we have
A Recognizer Setup Phase, followed by
the processing of zero or more parse series.
While processing a parse series, we have:
A Series Setup Phase, followed by
an optional Ranking Phase, followed by
the processing of zero or more parse results.
While processing a parse result, we have:
A Result Setup Phase, followed by
a Tree Traveral Phase.
Within the Tree Traversal Phase, from the point of view of each rule node, we have Node Evaluation Time.
FINDING THE ACTION FOR A RULE
Marpa finds the action for each rule based on rule and symbol properties and on the grammar named arguments. Specifically, Marpa attempts the following, in order:
Resolving an action based on the
action
property of the rule, if one is defined.Resolving an action based on the
lhs
property of the rule.Resolving an action based on the
default_action
named argument of the grammar, if one is defined.Defaulting to a virtual action which returns a Perl
undef
.
Resolution of action names is described below. If the action
property or the default_action
named argument is defined, but does not resolve successfully, Marpa throws an exception. Marpa prefers to "fast fail" in these cases, because they often indicate a mistake in writing the application.
RESOLVING ACTION NAMES
Action names come from these sources:
The
default_action
named argument of Marpa's grammar.The
action
property of Marpa's rules.The
ranking_action
property of Marpa's rules.The
new
constructor in the package specified by theaction_object
named argument of the Marpa grammar.The
lhs
property of Marpa's rules.
Explicit Resolution
The recognizer's closures
named argument allows the user to directly control the mapping from action names to actions. The value of the closures
named argument is a reference to a hash whose keys are action names and whose hash values are CODE refs.
If an action name is the key of an entry in the closures
hash, it resolves to the closure referenced by the value part of that hash entry. Resolution via the closures
named argument is called explicit resolution.
When explicit resolution is the only kind of resolution that is wanted, it is best to pick a name that is very unlikely to be the name of a Perl closure. Many of Marpa::HTML's action names are intended for explicit resolution only. In Marpa::HTML those action names begin with an exclamation mark ("!"), and that convention is recommended.
Fully Qualified Action Names
If explicit resolution fails, Marpa transforms the action name into a fully qualified Perl name. An action name that contains a double colon ("::
") or a single quote ("'
") is considered to be a fully qualified name. Any other action name is considered to be a bare action name.
If the action name to be resolved is already a fully qualified name, it is not further transformed. It will be resolved in the form it was received, or not at all.
For bare action names, Marpa tries to qualify them by adding a package name. If the actions
grammar named argument is defined, Marpa uses it as the package name. Otherwise, if the action_object
grammar named argument is defined, Marpa uses it as the package name. Once Marpa has fully qualified the action name, Marpa looks for a Perl closure with that name.
Marpa will not attempt to resolve an action name that it cannot fully qualify. This implies that, for an action name to resolve successfully, one of these four things must be the case:
The action name resolves explicitly.
The action name is fully qualified to begin with.
The
actions
named argument is defined.The
action_object
named argument is defined.
In all but one circumstance, failure to resolve an action name is thrown as an exception. Marpa is more lenient when a rule attempts to use the lhs
rule property as an action name. That is the one case in which Marpa will look at other alternatives.
Marpa's philosophy asks the programmer to be specific about action names. This can be an inconvenience, but Marpa prefers this to silently executing unintended code.
Generally it is a good practice to keep the semantic Perl closures into their own namespace. But if, for example, the user wants to leave the semantic closures in the main
namespace, she can specify "main"
as the value of the actions
named argument.
THE PER-PARSE VARIABLE
In the Result Setup Phase, Marpa creates a per-parse variable. This becomes the first argument of the semantic Perl closures for the rule nodes. If the grammar's action_object
named argument is not defined, the per-parse variable is initialized to an empty hash ref.
Most data for the value actions of the rules will be passed up the parse tree. The actions will see the values of the rule node's child nodes as arguments, and will return their own value to be seen as an argument by their parent node. The per-parse variable can be used for data which does not conveniently fit this model.
The lifetime of the per-parse variable extends into the Tree Traversal Phase. During the Tree Traversal Phase, Marpa's internals never alter the per-parse variable -- it is reserved for use by the application.
Action Object Constructor
If the grammar's action_object
named argument has a defined value, that value is treated as the name of a class. The action object constructor is the new
method in the action_object
class.
The action object constructor is called in the Result Setup Phase. The return value of the action object constructor becomes the per-parse variable. It is a fatal error if the grammar's action_object
named argument is defined, but does not name a class with a new
method.
Resolution of the action object constructor is resolution of the action object constructor name. The action object constructor name is formed by concatenating literal string "::new
" to the value of the action_object
named argument.
All standard rules apply when resolving the action object constructor name. In particular, bypass via explicit resolution applies to the action object constructor. If the action object constructor name is a hash key in the evaluator's closures
named argument, then the Perl closure referred to by the value of that hash entry becomes the action object constructor.
If a grammar has both the action
and the action_object
named arguments defined, all action names except for the action object constructor will be resolved in the action
package or not at all. The action object constructor is always in the action_object
class, if it is anywhere.
PARSE ORDER
If a parse is ambiguous, all parses are returned, with no duplication. By default, the order is arbitrary, but it is also possible to control the order. Details are in the document on parse order.
CONTEXT-AWARE STATIC METHODS
sub rank_null_a {
return \( ( $MyTest::MAXIMAL ? -1 : 1 )
* 10**( 3 - Marpa::token_location() ) );
}
Ranking Perl closures have available a set of context-aware static methods. A ranking action can use these methods to learn about the context in which it is called. As of this writing, the context-aware static methods are available for the ranking actions only. This may change in the future.
Marpa::PP::location
Returns the earleme location of the origin (or start) of the rule.
Marpa::PP::token_location
Returns the earleme location of the token. Intended for use in callbacks associated with empty rules and nulling symbols.
INFINITE LOOPS
Marpa allows grammar with infinite loops. These are universally considered useless in practical applications, and Marpa supports all the semantics for these that should be necessary and more. Marpa allows infinite loops because doing so allows it to accurately claim to support every grammar that can be written in BNF.
Marpa applies semantics to infinite loops by breaking the loop at some convenient point, so that an infinite regress is prevented. Where the break occurs varies among Marpa implementations. If an infinite loop is given a null semantics, which is the default, the location of the break will not matter.
More might be done. In particular, a precise definition of where loops are broken would allow applications to depend on the details of the structure of infinite loops. But practical interest in this seems non-existent. For those curious, further details can be found in a separate document.
COPYRIGHT AND LICENSE
Copyright 2011 Jeffrey Kegler
This file is part of Marpa::PP. Marpa::PP is free software: you can
redistribute it and/or modify it under the terms of the GNU Lesser
General Public License as published by the Free Software Foundation,
either version 3 of the License, or (at your option) any later version.
Marpa::PP is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser
General Public License along with Marpa::PP. If not, see
http://www.gnu.org/licenses/.