NAME

Marpa::XS::Semantics - How Marpa Evaluates Parses

SYNOPSIS

my $grammar = Marpa::XS::Grammar->new(
    {   start          => 'Expression',
        actions        => 'My_Actions',
        default_action => 'first_arg',
        rules          => [
            { lhs => 'Expression', rhs => [qw/Term/] },
            { lhs => 'Term',       rhs => [qw/Factor/] },
            { lhs => 'Factor',     rhs => [qw/Number/] },
            { lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
            {   lhs    => 'Factor',
                rhs    => [qw/Factor Multiply Factor/],
                action => 'do_multiply'
            },
        ],
    }
);
sub My_Actions::do_add {
    my ( undef, $t1, undef, $t2 ) = @_;
    return $t1 + $t2;
}

sub My_Actions::do_multiply {
    my ( undef, $t1, undef, $t2 ) = @_;
    return $t1 * $t2;
}

sub My_Actions::first_arg { shift; return shift; }

my $value_ref = $recce->value;
my $value = $value_ref ? ${$value_ref} : 'No Parse';

OVERVIEW

Marpa's semantics will be familiar to those who have used traditional methods to evaluate parses. A parse is seen as a parse tree. Nodes on the tree are evaluated recursively, bottom-up. Once the values of all its child nodes are known, a parent node is ready to be evaluated. The value of a parse is the value of the top node of the parse tree.

Marpa's semantics see a virtual parse tree with virtual nodes. These virtual structures keep ambiguity and rule rewrites invisible to the semantics. Nodes in Marpa's virtual parse trees are of three kinds:

  • Token Nodes

  • Rule Nodes

  • Null Nodes

Action Names and Semantic Closures

During grammar generation and input recognition, Marpa does not allow semantics to be specified directly as Perl closures. This approach to semantics follows Perl 6, the wisdom of whose example I discovered the hard way.

When a Marpa grammar is created, all non-constant semantics take the form of strings which express actions. Actions are semantics in an indirect form. Actions are given meaning later by the evaluator.

At evaluation time, actions are resolved to semantic Perl closures, and the semantic Perl closures are run to produce values. (For clarity, I often call the Perl closures which implement semantics, semantic closures or semantic Perl closures.)

NODES

Token Nodes

For every input token, there is an associated token node. In the usual, token-stream, model of input, every token will become a leaf node in the parse tree. Tokens always have a token symbol. At lexing time, they can be assigned a token value. If no token value is assigned at lex time, the token value defaults to a Perl undef.

Rule Nodes

Nodes which are ancestors of token nodes are called rule nodes. Rule nodes are always associated with a rule. The value of a rule node is computed at node evaluation time. Applications can specify, on a per-rule basis, semantic Perl closures to evaluate rule nodes.

The semantic closure's arguments will be a per-parse variable followed by the values of its child nodes in lexical order. The value returned by the semantic Perl closure becomes the value of the node. If there is no semantic closure for a rule node, the value of the rule node is a Perl undef.

Sequence Rule Nodes

Some rules are sequence rules. Everything said above about rule nodes, also applies to sequence rule nodes. Specifically, the arguments to the semantic Perl closures for sequence rules are the per-parse variable followed by the values of the child nodes in lexical order.

The difference (and it is a big one) is that in an ordinary rule, the right hand side is fixed in length, and that length is known when you are writing the semantic Perl closure. In a sequence rule, the number of right hand side symbols is not known until node evaluation time. A semantic Perl closure for a sequence rule node is written in the same way that you write any Perl closure that handles a variable number of arguments.

Sequence semantics work best for sequences of items all of which have the same semantics. When that is not the case, writing the sequence using ordinary non-sequence rules should be considered as an alternative.

By default, if a sequence rule has separators, the separators are thrown away before the semantics are applied. They do not appear in the semantic Perl closure's @_ array. If the value of the keep rule property is a Perl true, separators are kept, and do appear in the @_ array.

Null Nodes

A null node is a node which derives the zero-length, or empty string. This means that a null node cannot be the ancestor of any token nodes. In Marpa, null nodes are always leaf nodes.

Null nodes are of two kinds. A nulling symbol node corresponds to a nulling symbol. A nulled rule node represents a nulled rule.

For every null node there is a null node symbol, which is used to the determine the value of the null node. For a nulled rule node, the null node symbol is the nulled rule's left hand side symbol. For a nulling symbol node, the null node symbol is the nulling symbol.

The value of a null node is the null value of the null node symbol. The null value of a symbol comes from that symbol's null_value property, if one is defined. Otherwise, the null value of the symbol comes from the grammar's default null value, as defined by the grammar's default_null_value named argument. If neither the symbol null_value property or the grammar's default_null_value named argument is defined, a symbol's null value is a Perl undef.

A null subtree is a subtree all of whose nodes are null. Marpa prunes all null subtrees back to their topmost null node. This means that all null nodes that remain in Marpa's virtual parse tree will be leaf nodes.

The "lost" semantics of the non-root nodes of null subtrees are usually not missed. Null subtrees cannot contain token nodes, so no token nodes are lost when null subtrees are pruned. As bushy as a null subtree might be, all of its nodes are null nodes.

All null nodes correspond to zero-length strings, so we are literally dealing here with the "semantics of nothing". In theory the semantics of nothing can be arbitrarily complex. In practice it should be possible to keep them simple.

If any application ever actually needs it, Marpa could implement a complex, and even dynamic, "semantics of nothing". For details see below.

Null Sequence Nodes

Rule nodes for sequences were mentioned above. Sequence nodes can also be null nodes. This happens with sequence rules which have a min rule property of 0. Such a sequence rule can contain any number of sequence items, including zero items. When a sequence contains zero items, it must derive the zero-length string, and its node is a null node.

Sequence null nodes follow the rules for null nodes. Their value is that of a symbol -- the left hand side symbol of the nulled sequence rule.

Nullable sequence rules do have a potential to surprise. When the node for a nullable sequence rule is a null node, its semantics comes from the null value for its left hand side symbol. When the node for a nullable sequence rule is not a null node, then it is a rule node and its semantics come from the rule. It's up to the application to see that these two -- the null value and the sequence rule -- "make sense" together. One way to do this is to create a special symbol which is dedicated to service as the left hand side symbol for the sequence rule. That dedicated symbol can then be given the correct semantics.

The rules for nodes in null subtrees apply with equal force to nodes for sequence rules. In a nulled subtree, the only node whose semantics matters is the root node of that subtree. If a zero-length sequence is in a nulled subtree, and that zero-length sequence is not the root node of that subtree, its semantics will be completely ignored.

SEMANTIC PHASES

This section explains the order in which the semantics specified by an application are implemented. As a reminder, the implementation of an action requires two steps. In the first step the action is resolved to a Perl closure. In the second step that Perl closure is called.

When the semantics are applied to a parse, it produces a value called a parse result. Because Marpa allows ambiguous parsing, each parse can have zero or more parse results.

A parse series is the series of zero or more parse results for a single evaluation. The first call to the Marpa::XS::Recognizer::value method after the recognizer is created is the start of the first parse series. The first parse series continues until there is a call to the Marpa::XS::Recognizer::reset_evaluation method or until the recognizer is destroyed. Usually, an application is only interested in a single parse series.

When the Marpa::XS::Recognizer::reset_evaluation method is called for a recognizer, it begins a new parse series. The new parse series continues until there is another call to the Marpa::XS::Recognizer::reset_evaluation method, or until the recognizer is destroyed.

Recognizer Setup Phase

In the Recognizer Setup Phase, the null values of the symbols are determined. The null values of symbols never change -- they are in effect properties of the grammar.

For every Marpa Recognizer object, the Recognizer Setup Phase occurs before any Parse Series Setup Phase. The Recognizer Setup Phase occurs only once in the life of a recognizer object.

Parse Series Setup Phase

During the Parse Series Setup Phase all semantic action names are resolved to Perl closures. The semantic Perl closures are only resolved in this phase -- they will not be called until the Parse Tree Traversal Phase.

A Parse Series Setup Phase occurs in the first call of the Marpa::XS::Recognizer::value method of each parse series. Ranking action names are not resolved in the Parse Series Setup Phase.

Ranking Phase

In the Ranking Phase, all ranking actions are resolved to Perl closures, and those Perl closures are called. The Ranking Phase is the only time during which ranking actions are resolved or called.

Ranking Phases do not occur if the ranking method is left as the default. If the ranking method is set to a value other than "none", a Ranking Phase occurs in the first call of the Marpa::XS::Recognizer::value method of each parse series.

Parse Result Setup Phase

In the Parse Result Setup Phase, the per-parse variable is created. If a constructor was found for the action_object, it is run at this point, and the per-parse variable is its return value. A Parse Result Setup Phase occurs once for each parse result or, in other words, once for each call to the Marpa::XS::Recognizer::value method.

In every parse series, all the Parse Result Setup Phases occur after the Parse Series Setup Phase. In every parse series which has a Ranking Phase, all the Parse Result Setup Phases occur after the Ranking Phase. For each parse result, the Parse Result Setup Phase occurs before any of the Parse Tree Traversal Phases.

Parse Tree Traversal Phase

During the Parse Tree Traversal Phase, the semantic Perl closures are called. For each parse result, the Parse Tree Traversal Phase follows the Parse Result Setup Phase.

During parse tree traversal, Node Evaluation Time occurs for every node in the parse. A Parse Tree Traversal Phase occurs once for each call to the Marpa::XS::Recognizer::value method.

Node Evaluation Time

Node Evaluation Time is not really a separate phase. Node Evaluation Time is the Parse Tree Traversal Phase, seen from the point of view of the individual nodes of the parse tree. If a node has a semantic Perl closure, it is called at Node Evaluation Time.

SEARCHING FOR RULE SEMANTIC CLOSURES

Marpa finds the semantic Perl closure for each rule based on rule and symbol properties and on the grammar named arguments. The search for a semantic Perl closure is equivalent to following these steps:

  • Resolve the action property

    If no action property is defined for a rule, the search proceeds to the next step.

    If the action property for a rule is defined and if it resolves to a Perl closure, that Perl closure becomes the rule's semantic Perl closure, and the search ends. The resolution of action names is described below.

    If a defined action property does not successfully resolve to a closure, no further attempt to find the semantic Perl closure is made. The search ends, and an exception is thrown.

  • Resolve the lhs property

    If the lhs property for a rule resolves to a Perl closure, that Perl closure becomes the rule's semantic Perl closure, and the search ends.

    If the lhs property fails to resolve to a Perl closure, no exception is thrown. This is the one case where failure of an action name to resolve is not treated as a fatal error. The search proceeds to the next step.

  • Resolve the default_action Named Argument

    If the grammar does not have a default_action defined, the search proceeds to the next step.

    If the grammar does have a default_action defined, and if it resolves to a Perl closure, that Perl closure becomes the rule's semantic Perl closure, and the search ends.

    If a defined default_action grammar named argument does not successfully resolve to a closure, no further attempt to find the semantic Perl closure is made. The search ends, and an exception is thrown.

  • Evaluate to undef

    If no semantic Perl closure is found, the value of the rule is always a Perl undef. Marpa optimizes for this case, but the effect is the same as if the rule's semantic Perl closure was the following:

    sub { return }

RESOLVING ACTION NAMES

Action names are resolved to semantic Perl closures in the evaluator setup phase. Candidates for resolution as action names include

  • The default_action named argument of Marpa's grammar.

  • The action property of Marpa's rules.

  • The ranking_action property of Marpa's rules.

  • The new constructor in the package specified by the action_object named argument of the Marpa grammar;

  • The lhs property of Marpa's rules.

Resolution of the action object constructor is explained below as a special case.

Explicit Resolution

The closures named argument allows the user to directly control the mapping from action names to semantic Perl closures. The value of the closures named argument is a reference to a hash whose keys are action names and whose hash values are CODE refs.

If an action name is the key of an entry in the closures hash, it resolves to the closure referenced by the value part of that hash entry. Resolution via the closures named argument is called explicit resolution.

When explicit resolution is the only kind of resolution that is wanted, it is best to pick a name that is very unlikely to be the name of a Perl closure. Many of Marpa::HTML's action names are intended for explicit resolution only. In Marpa::HTML those action names begin with an exclamation mark ("!"), and that convention is recommended.

Fully Qualified Action Names

If explicit resolution fails, Marpa transforms the action name into a fully qualified Perl name. An action name that contains a double colon ("::") or a single quote ("'") is considered to be a fully qualified name. Any other action name is considered to be a bare action name.

If the action name to be resolved is already a fully qualified name, it is not further transformed. It will be resolved in the form it was received, or not at all.

For bare action names, Marpa tries to qualify them by adding a package name. If the actions grammar named argument is defined, Marpa uses it as the package name. Otherwise, if the action_object grammar named argument is defined, Marpa uses it as the package name. Once Marpa has fully qualified the action name, Marpa looks for a Perl closure with that name in the namespace.

If Marpa cannot fully qualify an action name, it will not attempt to resolve it. This means that for an action name to resolve successfully, one of these four things must be the case:

  • The actions named argument is defined.

  • The action_object named argument is defined.

  • The action name resolves explicitly.

  • The action name is fully qualified to begin with.

In all but one circumstance, failure to resolve an action name is thrown as an exception. Marpa is more lenient when it attempts to use the lhs rule property as an action name. That is the one case in which Marpa will look at other alternaties. See the section on finding rule semantic closures.

Marpa's philosophy is that asking the programmer to be specific about action names can be a slight inconvenience, but silently executing unintended code might be a major disaster. Marpa prefers action name resolution to fail when the application's intent is not clear.

Generally it's best practice to put the semantic Perl closures into their own namespace. But if, for example, the user wants to leave the semantic closures in the main namespace, she can specify "main" as the value of the actions named argument.

THE PER-PARSE VARIABLE

In the Parse Result Setup Phase, Marpa creates a per-parse variable. This becomes the first argument of the semantic Perl closures for the rule nodes. If the grammar's action_object named argument is not defined, the per-parse variable is initialized to an empty hash ref.

Most data for the rule node semantic Perl closures will be passed up the parse tree. The semantic Perl closures will see the values of their child nodes as arguments, and will return their own value to be seen as an argument by their parent node. The per-parse variable can be used for data which does not conveniently fit this model.

The lifetime of the per-parse variable extends into the Parse Tree Traversal Phase. During the Parse Tree Traversal Phase, Marpa's internals never alter the per-parse variable -- it is reserved for use by the semantic Perl closures implementing rule node semantics.

Action Object Constructor

If the grammar's action_object named argument has a defined value, that value is treated as the name of a class. The action object constructor is the new method in the action_object class.

The action object constructor is run at parse setup time. The return value of the action object constructor becomes the per-parse variable. It is a fatal error if the grammar's action_object named argument is defined, but does not name a class with a new method.

The fully qualified name of the action object constructor is the value of the action_object named argument followed by the literal string "::new". Resolution of the action object constructor is done as resolution of this fully qualified action name.

If a grammar has both the action and the action_object named arguments defined, all action names except for the action object constructor will be resolved in the action package or not at all. Only the action object constructor name will be resolved using the action_object class.

Bypass via explicit resolution applies to the action object constructor. If the fully qualified name of the action object constructor is a hash key in the evaluator's closures named argument, then the Perl closure referred to by the value of that hash entry becomes the action object constructor.

PARSE ORDER

Duplicate Parses

The same parse result is never returned twice. A parse result is considered to be a duplicate of another if both parse results apply the same rules in the same order at the same earleme locations.

Default Parse Order

By calling the Marpa::XS::Recognizer::value method repeatedly, Marpa can produce all the parse results for a given parse. The default is for the parse results to be returned in an arbitrary order. This corresponds to the "none" value of the Recognizer's ranking_method named argument.

A General Approach to Sorting Parses

The most general way to sort Marpa parses is for the application to take control. The application can set up the Marpa semantic actions so that the value of every parse result is a <rank, true_value> duple. These duples can be implemented as references to arrays. The duples can then be sorted and the rank discarded.

In the worst case, producing and sorting all the parses can take a very long time. The Marpa interface has no special features to deal with this general case. Putting this logic inside Marpa would clutter the interface, and there would be no gain in efficiency to compensate.

The "Constant" Ranking Method

Marpa does support a simplified approach to sorting parses. It is a compromise between generality and efficiency. The Constant Ranking Method is general enough to handle many, perhaps even most, practical applications, and simple enough that Marpa is able to optimize it.

The Constant Ranking Method is specified by giving the Marpa Recognizer's ranking method named argument a value of "constant".

The basic idea is to allow the user to specify constant values for rules, and to rank all other nodes according to the sum of the values of their children. Leaf nodes default to a value of 0.

The value of a rule must be "constant" in the sense that it cannot depend on the values of its children. This is a major limitation, but it greatly simplifies the logic for re-ranking parses as they are iterated. And it is less of a limitation than it may appear, because rules without ranking closures will take into account the values of their children. By strategically mixing rules which have no ranking action, but which do take into account child values, and rules which can have a user-specified rank, but which are not allowed to take into account child values, most real-life applications can accomplish what they need to.

The ranking action of a token leaf node is specified using the token symbol's ranking_action property. The ranking action of a nulled leaf node is specified using the null node symbol's ranking_action property. The ranking action of a rule is specified using the rule's ranking_action property.

Ranking actions must return a reference to the rank. Ranks must be Perl numbers. Negative values and non-integer values are allowed. The highest numeric value is considered to be the highest rank, and the lowest numeric value is considered to be the lowest rank.

As a special case, if a ranking action returns a Perl undef, Marpa will skip that possibility, rather than ranking it. Note that any "skipped" node in a parse result causes that whole parse result to be skipped. A consequence of this is that any skipped node in an unambiguous parse will result in no parse results being found.

This behavior may seem to be draconian, but in fact skipping the entire tree is probably the easiest and most natural way to deal with skipped nodes. The traditional semantics requires trees to have a full set of nodes. It is unclear at this point what purposes a semantics of partial trees would be expected to serve.

An instance of a rule is a rule, a start location, and an end location. Ranking actions are called once for each rule instance. While ranking actions return constants in the sense that they cannot be aware of the ranks of their child nodes, the rank returned can vary based on the rule's start and end location. The ranking actions can determine their location using the context-aware static methods.

For the rank of a node to be calculated, the ranking action must first be resolved to a ranking Perl closure. Ranking actions are resolved to ranking Perl closures in the Ranking Phase, using the same logic that resolves semantics actions to Perl semantic closures. The logic that resolves action names to closures is described above. The ranking actions are resolved to Perl closures, which are also called during the Ranking Phase.

Using Ranking for Side Effects

For every parse series, ranking actions are guaranteed to be called once and only once for each rule instance. As a reminder, a rule instance is a rule, together with a start and end location. This guarantee makes ranking actions useful for their side effects, even when there is no interest in changing the order of the parse results. In fact, ranking actions can be used in cases when there is no interest in evaluating the actual parse results.

For example, an application may wants to know all the instances of a particular rule that occur in any parse result. In particular, an application might be looking for ambiguities -- cases in the parse series where two different rules derive the same input string. The application can create ranking actions which have the side effect of tracking instances of these two rules by location, and use this information to detect ambiguities. If there is no interest in an actual parse, ranking actions which return undef can cause all parses to be discarded.

This is much faster than the alternative of producing all the parse results. The number of parse results grows much faster than the number of ambiguities -- it can be exponential in the length of the input.

CONTEXT-AWARE STATIC METHODS

sub rank_null_a {
    return \( ( $MyTest::MAXIMAL ? -1 : 1 )
        * 10**( 3 - Marpa::XS::token_location() ) );
}

Ranking Perl closures have available a set of context-aware static methods. A closure can use these methods to learn about the context in which it is called. As of this writing, the context-aware static methods are tested for the ranking Perl closures only. This may change in the future.

Marpa::XS::location

Returns the earleme location of the origin (or start) of the rule.

Marpa::XS::token_location

Returns the earleme location of the token. Intended for use in callbacks associated with empty rules and nulling symbols.

NULL SUBTREES

In Marpa, a null node must be leaf node. Because Marpa prunes every null subtree back to its topmost null node, none of the non-root nodes in a null subtree are represented in Marpa's virtual parse tree. Here's an example:

sub L {
    shift;
    return 'L(' . ( join q{;}, @_ ) . ')';
}

sub R {
    shift;
    return 'R(' . ( join q{;}, @_ ) . ')';
}

sub S {
    shift;
    return 'S(' . ( join q{;}, @_ ) . ')';
}

my $grammar = Marpa::XS::Grammar->new(
    {   start   => 'S',
        actions => 'main',
        rules   => [
            [ 'S', [qw/L R/] ],
            [ 'L', [qw/A B X/] ],
            [ 'L', [] ],
            [ 'R', [qw/A B Y/] ],
            [ 'R', [] ],
            [ 'A', [] ],
            [ 'B', [] ],
            [ 'X', [] ],
            [ 'Y', [] ],
        ],
        symbols        => {
            L => { null_value => 'null L' },
            R => { null_value => 'null R' },
            A => { null_value => 'null A' },
            B => { null_value => 'null B' },
            X => { null_value => 'null X', terminal => 1 },
            Y => { null_value => 'null Y', terminal => 1 },
        },
    }
);

$grammar->precompute();

my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );

$recce->tokens( [ [ 'X', 'x' ], ] );

If we write the unpruned parse tree one node per line in pre-order, depth-first, indenting children below their parents, we get something like this:

0: Rule Node, Rule: S := L R
     1: Rule Node, Rule L := A B X
         1.1: Null Node, Symbol A
         1.2: Null Node, Symbol B
         1.3: Token Node, Token value is 'x'
     2: Rule Node, Rule R := A B Y
         2.1: Null Node, Symbol A
         2.2: Null Node, Symbol B
         2.3: Null Node, Symbol Y

In this example, six nodes are nulled. Four of them are in a single subtree: 2, 2.1, 2.2 and 2.3. Marpa prunes every null subtree back to its null root node, which in this case is the node numbered 2.

The pruned tree looks like this

0: Rule Node, Rule: S := L R
     1: Rule Node, Rule L := A B X
         1.1: Null Node, Symbol A
         1.2: Null Node, Symbol B
         1.3: Token Node, Token value is 'x'
     2: Null Node, Symbol R

Here is the output:

S(L(null A;null B;x);null R)

In the output we see

  • The null value for node 1.1: "null A".

  • The null value for node 1.2: "null B".

  • The token value for node 1.3: "x".

  • An application of the semantic Perl closure for node 1.

  • The null value for node 2: "null R".

  • An application of the semantic Perl closure for rule node 0.

We do not see any output for nodes 2.1, 2.2, or 2.3 because they were non-root nodes in the pruned subtree. We do see the null value for node 2, because after pruning it is a leaf node. We do not see an application of the semantic Perl closure for node 2, because after pruning, node 2 is not a rule node.

The Semantics of Nothing

Rarely, your application may call for a complex semantics of nothing. If the semantics of nothing, while complex, remains constant, you can handle it setting every nullable symbol's null_value property to the value which your semantics produces when that nullable symbol is the root symbol of a null subtree.

If the values in your "semantics of nothing" are not constants, Marpa can still calculate them. Determine which of your nullable symbols have a dynamic semantics. Call these your dynamic nullables. Let the null_value property of every dynamic nullable be a hash key. For every rule with a dynamic nullable on its right hand side, write the rule's semantic Perl closure so that it maps that hash key to a Perl closure, which it then runs to calculate the value of the dynamic nullable.

INFINITELY AMBIGUOUS GRAMMARS

Marpa will parse using an infinitely ambiguous grammar. (In the technical literature, an infinite ambiguity is more usually called a cycle.)

An example of an infinitely ambiguous grammar is the following:

S ::= A
A ::= B
B ::= A
B :: 'x'

Given the input 'x', this grammar will produce these parses

S -> A -> B -> x
S -> A -> B -> A -> B -> x
S -> A -> B -> A -> B -> A -> B -> x
.
.
.

Because of the two rules A ::= B and B ::= A, this list of parses could go on forever. The two rules A ::= B and B ::= A form what is called a cycle.

Typically, if a user has written an grammar with an infinite cycle, it was a mistake and he wants to rewrite it before proceeding. By default, an infinitely ambiguous grammar is a fatal error. This is the behavior most users will want.

To proceed to producing parse results from an infinitely ambiguous grammar, the user must set the grammar's infinite action named argument to a value other than "fatal". The other choices are "warn" and "quiet".

Cycle Length

Obviously, Marpa cannot list all of an infinite number of parse results. Marpa deals with potentially infinite parses by limiting the cycle length. Cycle length is the number of times a parse derivation goes around a potentially infinite cycle.

Marpa limits all cycles to a length of 1. There will always be a finite number of these parse results.

Above I showed a set of parses from an example of an infinitely ambiguous grammar. Here are those parses again, this time labeled with their cycle length.

Cycle length 1: S -> A -> B -> x
Cycle length 2: S -> A -> B -> A -> B -> x
Cycle length 3: S -> A -> B -> A -> B -> A -> B -> x

Of the parse results in the above list, Marpa would return a value only for the first, the one whose cycle length is 1.

Caveats

The precise behavior of Marpa's cycle detection is, at this point, defined somewhat loosely. In Marpa cycle is a return to the "same rule", with the same start and end earlemes. But as of this writing the "same rule" means the same rule after Marpa's rewriting of the grammar, not the same rule as in the original grammar.

Since, in all other cases, the semantics hides Marpa's rewritings, the precise cutoff points of cycles can seem arbitrary and unnatural. The CHAF rewrite, for example, breaks rules up, and a CHAF rewrite can cause a cycle to end before it would end in terms of the rules of the original grammar.

Marpa's exact definition of a cycle is experimental and subject to change. In particular, it would be unwise to have an application's semantics rely in detail on Marpa's present behavior in determining the content and length of cycles.

Marpa's cycle detection could be more carefully defined, and/or made to follow the original grammar. But as far as I know nobody cares about these details. The present implementation I believe matches or exceeds the extent of present interest.

LICENSE AND COPYRIGHT

Copyright 2007-2010 Jeffrey Kegler, all rights reserved. Marpa is free software under the Perl license. For details see the LICENSE file in the Marpa distribution.