NAME

Marpa::R2::Semantics - How Marpa evaluates parses

Synopsis

my $grammar = Marpa::R2::Grammar->new(
    {   start          => 'Expression',
        actions        => 'My_Actions',
        default_action => 'first_arg',
        rules          => [
            { lhs => 'Expression', rhs => [qw/Term/] },
            { lhs => 'Term',       rhs => [qw/Factor/] },
            { lhs => 'Factor',     rhs => [qw/Number/] },
            { lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
            {   lhs    => 'Factor',
                rhs    => [qw/Factor Multiply Factor/],
                action => 'do_multiply'
            },
        ],
    }
);
sub My_Actions::do_add {
    my ( undef, $t1, undef, $t2 ) = @_;
    return $t1 + $t2;
}

sub My_Actions::do_multiply {
    my ( undef, $t1, undef, $t2 ) = @_;
    return $t1 * $t2;
}

sub My_Actions::first_arg { shift; return shift; }

my $value_ref = $recce->value;
my $value = $value_ref ? ${$value_ref} : 'No Parse';

Overview

Marpa's semantics will be familiar to those who have used traditional methods to evaluate parses. A parse is seen as a parse tree. Nodes on the tree are evaluated recursively, bottom-up. Once the values of all its child nodes are known, a parent node is ready to be evaluated. The value of a parse is the value of the top node of the parse tree.

When a Marpa grammar is created, its semantics is specified indirectly, as action names. To produce the values used in the semantics, Marpa must do three things:

  • Determine the action name.

  • Resolve the action name to an action.

  • If the action is a rule evaluation closure, call it to produce the actual result.

An action name is either reserved, or resolves to a Perl object. When it resolves to a Perl object, that object is usually a Perl closure -- code. If the Perl object action is code, it is a rule evaluation closure, and will be called to produce the result. When a Perl object action is some other kind of object, its treatment is as described below.

An action name and action is also used to create the per-parse-tree variable, as described below.

Nodes

Token nodes

For every input token, there is an associated token node. Token nodes are leaf nodes in the parse tree. Tokens always have a token symbol. At lexing time, they can be assigned a token value. If no token value is assigned at lex time, the value of the token is a Perl undef.

Rule nodes

If a node is not a token node, then it is a rule node. Rule nodes are always associated with a rule. It a rule's action is a rule evaluation closure, it is called at Node Evaluation Time.

The rule evaluation closures's arguments will be a per-parse-tree variable followed, if the rule is not nulled, by the values of its child nodes in lexical order. If the rule is nulled, the child node values will be omitted. A rule evaluation closure action is always called in scalar context.

If the action is a constant, it becomes the value of the rule node. If the action is a rule evaluation closure, its return value becomes the value of the node. If there is no action for a rule node, the value of the rule node is a Perl undef.

Sequence rule nodes

Some rules are sequence rules. Sequence rule nodes are also rule nodes. Everything said above about rule nodes applies to sequence rule nodes. Specifically, the arguments to the value actions for sequence rules are the per-parse-tree variable followed by the values of the child nodes in lexical order.

The difference (and it is a big one) is that in an ordinary rule, the right hand side is fixed in length, and that length is known when you are writing the code for the value action. In a sequence rule, the number of right hand side symbols is not known until node evaluation time. The rule evaluation closure of a sequence rule must be capable of dealing with a variable number of arguments.

Sequence semantics work best when every child node in the sequence has the same semantics. When that is not the case, writing the sequence using ordinary non-sequence rules should be considered as an alternative.

By default, if a sequence rule has separators, the separators are thrown away before the value action is called. This means that separators do not appear in the @_ array of the rule evaluation closure which is the value action. If the value of the keep rule property is a Perl true value, separators are kept, and do appear in the value action's @_ array.

Null nodes

A null node is a special case of a rule node, one where the rule derives the zero-length, or empty string. When the rule node is a null node, the rule evaluation closure will be called with no child value arguments.

When a node is nulled, it must be as a result of a nullable rule, and the action name and action are those associated with that rule. An ambiguity can arise if there is more than one nullable rule with the same LHS, but a different action name. In that case the action name for the null nodes is that of the empty rule.

The remaining case is that of a set of nullable rules with the same LHS, where two or more of the rules have different action names, but none of the rules in the set is an empty rule. When this happens, Marpa throws an exception. To fix the issue, the user can add an empty rule. For more details, see the document on null semantics.

Action context

sub do_S {
    my ($action_object) = @_;
    my $rule_id         = $Marpa::R2::Context::rule;
    my $grammar         = $Marpa::R2::Context::grammar;
    my ( $lhs, @rhs ) = $grammar->rule($rule_id);
    $action_object->{text} =
          "rule $rule_id: $lhs ::= "
        . ( join q{ }, @rhs ) . "\n"
        . "locations: "
        . ( join q{-}, Marpa::R2::Context::location() ) . "\n";
    return $action_object;
} ## end sub do_S

In addition to the per-parse-tree variable and their child arguments, rule evaluation closures also have access to context variables.

  • $Marpa::R2::Context::grammar is set to the grammar being parsed.

  • $Marpa::R2::Context::rule is the ID of the current rule. Given the rule ID, an application can find its LHS and RHS symbols using the grammar's rule() method.

  • Marpa::R2::Context::location() returns the start and end locations of the current rule.

Bailing out of parse evaluation

my $bail_message = "This is a bail out message!";

sub do_bail_with_message_if_A {
    my ($action_object, $terminal) = @_;
    Marpa::R2::Context::bail($bail_message) if $terminal eq 'A';
}

sub do_bail_with_object_if_A {
    my ($action_object, $terminal) = @_;
    Marpa::R2::Context::bail([$bail_message]) if $terminal eq 'A';
}

The Marpa::R2::Context::bail() static method is used to "bail out" of the evaluation of a parse tree. It will cause an exception to be thrown. If its first and only argument is a reference, that reference is the exception object. Otherwise, an exception message is created by converting the method's arguments to strings, concatenating them, and prepending them with a message indicating the file and line number at which the Marpa::R2::Context::bail() method was called.

Parse trees, parse results and parse series

When the semantics are applied to a parse tree, it produces a value called a parse result. Because Marpa allows ambiguous parsing, each parse can produce a parse series -- a series of zero or more parse trees, each with its own parse result. The first call to the the recognizer's value method after the recognizer is created is the start of the first parse series. The first parse series continues until there is a call to the the reset_evaluation method or until the recognizer is destroyed. Usually, an application is only interested in a single parse series.

When the reset_evaluation method is called for a recognizer, it begins a new parse series. The new parse series continues until there is another call to the the reset_evaluation method, or until the recognizer is destroyed.

Most applications will find that the order in which Marpa executes its semantics "just works". A separate document describes that order in detail. The details can matter in some applications, for example, those which exploit side effects.

Finding the action for a rule

Marpa finds the action for each rule based on rule and symbol properties and on the grammar named arguments. Specifically, Marpa attempts the following, in order:

  • Resolving an action based on the action property of the rule, if one is defined.

  • If the rule is empty, and the default_empty_action named argument of the grammar is defined, resolving an action based on that named argument.

  • Resolving an action based on the default_action named argument of the grammar, if one is defined.

  • Defaulting to a Perl undef value.

Resolution of action names is described below. If the action property, the default_action named argument, or the default_empty_action named argument is defined, but does not resolve successfully, Marpa throws an exception. Marpa prefers to "fast fail" in these cases, because they usually indicate a mistake that the application's author will want to correct.

Resolving action names

Action names come from these sources:

  • The default_action named argument of Marpa's grammar.

  • The default_empty_action named argument of Marpa's grammar.

  • The action property of Marpa's rules.

  • The new constructor in the package specified by the action_object named argument of the Marpa grammar.

Reserved action names

Action names that begin with a double colon ("::") are reserved. At present only the ::undef reserved action is documented for use outside of the DSL-based interfaces.

::undef

A constant whose value is a Perl undef. Perl is unable to distinguish reliably between a non-existent value and scalars with an undef value. This makes it impossible to reliably distinguish resolutions to a Perl undef from resolution problems. The "::undef" reserved action name should be preferred for indicating a constant whose value is a Perl undef.

Explicit resolution

The recognizer's closures named argument allows the user to directly control the mapping from action names to actions. The value of the closures named argument is a reference to a hash whose keys are action names and whose hash values are references. Typically (but not always) these will be CODE refs.

If an action name is the key of an entry in the closures hash, it resolves to the closure referenced by the value part of that hash entry. Resolution via the closures named argument is called explicit resolution.

When explicit resolution is the only kind of resolution that is wanted, it is best to pick a name that is very unlikely to be the name of a Perl object. Many of Marpa::HTML's action names are intended for explicit resolution only. In Marpa::HTML those action names begin with an exclamation mark ("!"), and that convention is recommended.

Fully qualified action names

If explicit resolution fails, Marpa transforms the action name into a fully qualified Perl name. An action name that contains a double colon ("::") or a single quote ("'") is considered to be a fully qualified name. Any other action name is considered to be a bare action name.

If the action name to be resolved is already a fully qualified name, it is not further transformed. It will be resolved in the form it was received, or not at all.

For bare action names, Marpa tries to qualify them by adding a package name. If the actions grammar named argument is defined, Marpa uses it as the package name. Otherwise, if the action_object grammar named argument is defined, Marpa uses it as the package name. Once Marpa has fully qualified the action name, Marpa looks for a Perl object with that name.

Marpa will not attempt to resolve an action name that it cannot fully qualify. This implies that, for an action name to resolve successfully, one of these five things must be the case:

  • The action name is one of the reserved action names.

  • The action name resolves explicitly.

  • The action name is fully qualified to begin with.

  • The actions named argument is defined.

  • The action_object named argument is defined.

Marpa's philosophy is to require that the programmer be specific about action names. This can be an inconvenience, but Marpa prefers this to silently executing unintended code.

If the user wants to leave the rule evaluation closures in the main namespace, she can specify "main" as the value of the actions named argument. But it can be good practice to keep the rule evaluation closures in their own namespace, particularly if the application is not small.

Perl object actions

Actions divide into two types: reserved actions and Perl object actions. A Perl object action is a actual Perl object of some type: either code or a reference.

A Perl object action may come either from resolution of an action name, or from explicit resolution. If it came from explicit resolution, it will always be a reference.

In simplified terms, if a Perl object action comes from resolution of an action name, and there is a Perl closure of that name, the Perl closure is the resolution. If there is a reference with that name it is resolved as described for Perl reference actions. If there is some other scalar with that name, it is a fatal error. All other Perl objects with that name are ignored.

More precisely, the resolution depends the lookup of the action name in the Perl symbol table:

  • If there is a CODE entry, that entry is the resolution.

  • If there is a SCALAR entry, and it is a reference, it is resolved as described for Perl reference actions.

  • If there is a SCALAR entry, and it is not a reference, it is a fatal error.

  • If there is no SCALAR entry and no CODE entry, it will be considered a failure to resolve.

Reference actions

A Marpa action can be a Perl object which is a reference. A reference action may have come via name resolution, or it might be the result of explicit resolution.

Referents should be treated as read-only constants. It is possible to modify the values pointed to by a reference, but the effect will be global, and this is deprecated as bad practice. The referents of reference actions should be treated as read-only, and never modified.

In the following, let $action_ref be the action reference. Reference actions are handled based on type, ignoring any blessing as class objects. This would be the reftype as returned by Scalar::Util::reftype $action_ref.

If the reftype is CODE, the resolution is to the Perl closure. The effect is the same as if an action name had resolved directly to a closure with the same name. The Perl closure, also called a rule evaluation closure, will be called to produce the value of the action.

If the reftype is SCALAR, the resolution is to its de-referenced value, ${$action_ref}. The effect is roughly the same as if an action name had resolved to a closure which returned ${$action_ref}.

If the reftype is ARRAY, the resolution is to the original reference, $action_ref. The effect is roughly the same as if an action name had resolved to a closure which returned $action_ref.

If the reftype is HASH, the resolution is to the original reference, $action_ref. The effect is roughly the same as if an action name had resolved to a closure which returned $action_ref.

Modifying Perl object actions

It is important to note multiple invocations of $action_ref will always point to the same contents, and that any modification will be seen by other invocations. In other words, the modification will have global effect. For this reason modifying the referents of reference actions is deprecated.

For example, assume that actions are in a package named My_Nodes, which contains a hash reference named empty_hash,

package My_Nodes;
our $empty_hash = {};

It can be tempting, in building objects which are hashes, to start with a left node whose action is empty_hash and to add contents to it as the object is passed up the evaluation tree. But $empty_hash points to a single hash object. This single hash object will shared by all nodes, with all nodes seeing each other's changes. Worse, all Marpa parsers which use the same My_Nodes namespace will share the same hash object.

When not simply a mistake, this is usually bad programming practice. Before modifying the referent of a reference action, rule evaluation closures should copy it. Safer, if less efficient, is to define the empty_hash action as a closure which returns {}.

Visibility of Perl object actions

It is important to note that, when Perl closures are used for the semantics, they must be visible in the scope where the semantics are resolved. The action names are usually specified with the grammar, but action resolution takes place in the recognizer's value and reset_evaluation methods. This can sometimes be a source of confusion, because if a Perl closure is visible when the action is specified, but goes out of scope before the action name is resolved, resolution will fail.

The per-parse-tree variable

In the Tree Setup Phase, Marpa creates a per-parse-tree variable. This becomes the first argument of the rule evaluation closures for the rule nodes. If the grammar's action_object named argument is not defined, the per-parse-tree variable is initialized to an empty hash ref.

Most data for the value actions of the rules will be passed up the parse tree. The actions will see the values of the rule node's child nodes as arguments, and will return their own value to be seen as an argument by their parent node. The per-parse-tree variable can be used for data which does not conveniently fit this model.

The lifetime of the per-parse-tree variable extends into the Tree Traversal Phase. During the Tree Traversal Phase, Marpa's internals never alter the per-parse-tree variable -- it is reserved for use by the application.

Action object constructor

If the grammar's action_object named argument has a defined value, that value is treated as the name of a class. The action object constructor is the new method in the action_object class.

The action object constructor is called in the Tree Setup Phase. The return value of the action object constructor becomes the per-parse-tree variable. It is a fatal error if the grammar's action_object named argument is defined, but does not name a class with a new method.

Resolution of the action object constructor is resolution of the action object constructor name. The action object constructor name is formed by concatenating the literal string "::new" to the value of the action_object named argument.

All standard rules apply when resolving the action object constructor name. In particular, bypass via explicit resolution applies to the action object constructor. If the action object constructor name is a hash key in the evaluator's closures named argument, then the value referred to by that hash entry becomes the action object constructor.

If a grammar has both the actions and the action_object named arguments defined, all action names except for the action object constructor will be resolved in the actions package or not at all. The action object constructor is always in the action_object class, if it is anywhere.

Parse order

If a parse is ambiguous, all parses are returned, with no duplication. By default, the order is arbitrary, but it is also possible to control the order. Details are in the document on parse order.

Infinite loops

Grammars with infinite loops (cycles) are generally regarded as useless in practical applications, but Marpa allows them. Marpa can accurately claim to support every grammar that can be written in BNF.

If a grammar with cycles is ambiguous, it can produce cycle-free parses and parses with finite-length cycles, as well as parses with infinite length cycles. Marpa will parse with grammars that contain cycles, and Marpa's evaluator will iterate through the values from the grammar's cycle-free parses. For those who want to know more, the details are in a separate document.

Copyright and License

Copyright 2013 Jeffrey Kegler
This file is part of Marpa::R2.  Marpa::R2 is free software: you can
redistribute it and/or modify it under the terms of the GNU Lesser
General Public License as published by the Free Software Foundation,
either version 3 of the License, or (at your option) any later version.

Marpa::R2 is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser
General Public License along with Marpa::R2.  If not, see
http://www.gnu.org/licenses/.