NAME

Parse::Marpa::Evaluator - Marpa Evaluator Objects

SYNOPSIS

    my $grammar = new Parse::Marpa::Grammar({ mdl_source => \$mdl });
    my $recce = new Parse::Marpa::Recognizer({ grammar => $grammar });
    my $fail_offset = $recce->text(\("2-0*3+1"));
    croak("Parse failed at offset $fail_offset") if $fail_offset >= 0;

    my $evaler = new Parse::Marpa::Evaluator($recce);

    for (my $i = 0; defined(my $value = $evaler->next()); $i++) {
        croak("Ambiguous parse has extra value: ", $$value, "\n")
	    if $i > $expected;
	say "Ambiguous Equation Value $i: ", $$value;
    }

DESCRIPTION

Parses are found and evaluated by Marpa's evaluator objects. Evaluators are created with the new constructor, which requires a Marpa recognizer object.

Marpa allows ambiguous parses, so evaluator objects are iterators. Iteration is performed with the next method, which returns a reference to the value of the next parse. Often only one parse is needed, in which case the next method is called only once.

The evaluator does its work in tables kept in the recognizer object. Each Marpa recognizer should have only one evaluator using it at any one time. If multiple evaluators use the same recognizer at the same time, they may produce incorrect results.

Node Memoization

If you accept Marpa's default behavior, there will be no node memoization and you can safely ignore the question of opacity. But when multiple parses are evaluated in a parse with expensive rule actions, the boost in efficiency from node value memoization can be major.

If the defaults are used, Marpa will mark all its evaluators opaque. That is always a safe choice. Grammars and recognizers can be marked opaque or transparent.

A recognizer created from an opaque grammar is marked opaque. A recognizer created from a grammar marked transparent is marked transparent, unless an opaque named argument marks the recognizer opaque at recognizer creation time. A recognizer created from a grammar without a opacity marking is marked opaque, unless an opaque named argument marks the recognizer transparent at recognizer creation time. Once a recognizer object has been created, its opacity setting cannot be changed.

If a user decides to mark an object transparent, it is up to her to make sure all the evaluators created from that object are safe for node memoization. An evaluator is safe for node memoization only if all of its nodes are safe for memoization. Node values are computed by Perl 5 code, and node memoization follows the same principles as function memoization. For a node to be safe for memoization, it must be transparent. A node is transparent only if it is always safe to substitute a constant value for the node's action.

Here are guidelines for making actions transparent: The code must have no side effects. The return value should not be a reference. The return value must depend completely on the return values of the child nodes. All child nodes must be transparent as well. Any subroutines or functions must be transparent.

Exceptions to these guidelines can be made, but you have to know what you're doing. There's an excellent discussion of memoization in Mark Jason Dominus's Higher Order Perl. If you're not sure whether your semantics are opaque or not, just accept Marpa's default behavior. Also, it is always safe to mark a grammar or a recognizer opaque.

Marpa will sometimes mark grammars opaque on its own. Marpa often optimizes the evaluation of sequence productions by passing a reference to an array between the nodes of the sequence. This elminates the need to repeatedly copy the array of sequence values as it is built. That's a big saving, especially if the sequence is long, but the reference to the array is shared data, and the changes to the array are side effects.

Once an object has been marked opaque, whether by Marpa itself or the user, Marpa throws an exception if there is an attempt to mark it transparent. Resetting a grammar to transparent will almost always be an oversight, and one that would be very hard to debug.

Null Values

A "null value" is the value used for a symbol's value when it is nulled in a parse. By default, the null value is a Perl undefined. If you want something else, the default null value is a Marpa option (default_null_value) and can be reset.

Each symbol can have its own null symbol value. The null symbol value for any symbol is calculated using the action specified for the empty rule which has that symbol as its left hand side. The null symbol action is not a rule action. It's a property of the symbol, and applies whenever the symbol is nulled, even when the symbol's empty rule is not involved.

For example, in MDL, the following says that whenever the symbol A is nulled, its value should be a string which expresses surprise.

A: . q{ 'Oops!  Where did I go!' }.

Null symbol actions are different from rule actions in another important way. Null symbol actions are run at recognizer creation time and the value of the result at that point becomes fixed as the null symbol value. This is different from rule actions. During the creation of the recognizer object, rule actions are compiled into closures. During parse evaluation, whenever a node for that rule needs its value recalculated, the compiled rule closure is run. A compiled rule closure can produce a different value every time it runs.

I treat null symbol actions differently for efficiency. They have no child values, and a fixed value is usually what is wanted. If you want to calculate a symbol's null value with a closure run at parse evaluation time, the null symbol action can return a reference to a closure. Rules with that nullable symbol in their right hand side can then be set up so that they run that closure.

Evaluating Null Derivations

A null derivation may consist of many steps and may contain many symbols. Marpa's rule is that the value of a null derivation is the null symbol value of the highest null symbol in that derivation. This section describes in detail how a parse is evaluated, focusing on what happens when nulled symbols are involved.

The first step in evaluating a parse is to determine which nodes count for the purpose of evaluation, and which do not. Marpa follows these principles:

  1. The start node always counts.

  2. Nodes count if they derive a non-empty sentence.

  3. All other nodes do not count.

  4. In evaluating a parse, Marpa uses only nodes that count.

These are all consequences of the principles above:

  1. The value of null derivation is the value of the highest null symbol in it.

  2. A nulled node counts only if it is start node.

  3. The value of a null parse is the null value of the start symbol.

If you think some of the rules or symbols represented by nodes that don't count are important in your grammar, Marpa can probably accommodate your ideas. First, for every nullable symbol, determine how to calculate the value which your semantics produces when that nullable symbol is a "highest null symbol". If it's a constant, write a null action for that symbol which returns that constant. If your semantics do not produce a constant value by recognizer creation time, write a null action which returns a reference to a closure and arrange to have that closure run by the parent node.

Example

Suppose a grammar has these rules

S: A, Y. q{ $_->[0] . ", but " . $_->[1] }. # Call me the start rule
note: you can also call me Rule 0.

A: . q{'A is missing'}. # Call me Rule 1

A: B, C. q{"I'm sometimes null and sometimes not"}. # Call me Rule 2

B: . q{'B is missing'}. # Call me Rule 3

C: . q{'C is missing'}. # Call me Rule 4

C: Y.  q{'C matches Y'}. # Call me Rule 5

Y: /Z/. q{'Zorro was here'}. # Call me Rule 6

In the above MDL, the Perl 5 regex "/Z/" occurs on the rhs of Rule 6. Where a regex is on the rhs of a rule, MDL internally creates a terminal symbol to match that regex in the input text. In this example, the MDL internal terminal symbol that matches input text using the regex /Z/ will be called Z.

If the input text is the Perl 5 string "Z", the derivation is as follows:

S -> A Y        (Rule 0)
  -> A Z      (Y produces Z, by Rule 6)
  -> B C Z    (A produces B C, by Rule 2)
  -> B Z      (C produces the empty string, by Rule 4)
  -> Z        (B produces the empty string, by Rule 3)

The parse tree can be described as follows:

    Node 0 (root): S (2 children, nodes 1 and 4)
        Node 1: A (2 children, nodes 2 and 3)
	    Node 2: B (matches empty string)
	    Node 3: C (matches empty string)
	Node 4: Y (1 child, node 5)
	    Node 5: Z (terminal node)

Here's a table showing, for each node, its lhs symbol, the sentence it derives, and its value.

                      Symbol      Sentence     Value
                                  Derived

    Node 0:              S         Z           "A is missing, but Zorro is here"
        Node 1:          A         empty       "A is missing"
	    Node 2:      B         empty       No value
	    Node 3:      C         empty       No value
	Node 4:          Y         Z           "Zorro was here"
	    Node 5:      Z         Z           "Z"

In this derivation, nodes 1, 2 and 3 derive the empty sentence. None of them are the start node so that none of them count.

Nodes 0, 4 and 5 all derive the same non-empty sentence, Z, so they all count. Node 0 is the start node, so it would have counted in any case.

Since node 5 is a terminal node, it's value comes from the lexer. Where the lexing is done with a Perl 5 regex, the value will be the Perl 5 string that the regex matched. In this case it's the string "Z".

Node 4 is not nulled, so it is evaluated normally, using the rule it represents. That is rule 6. The action for rule 6 returns "Zorro was here", so that is the value of node 4. Node 4 has a child node, node 5, but rule 6's action pays no attention to child values. The action for each rule is free to use or not use child values.

Nodes 1, 2 and 3 don't count and will all remain unevaluated. The only rule left to be evaluated is node 0, the start node. It is not nulled, so its value is calculated using the action for the rule it represents (rule 0).

Rule 0's action uses the values of its child nodes. There are two child nodes and their values are elements 0 and 1 in the @$_ array of the action. The child value represented by the symbol Y, $_->[1], comes from node 4. From the table above, we can see that that value was "Zorro was here".

The first child value is represented by the symbol A, which is nulled. For nulled symbols, we must use the null symbol value. Null symbol values for each symbol can be explicitly set by specifying an rule action for an empty rule with that symbol as its lhs. For symbol A, this was done in Rule 1. Rule 1's action evaluates to the Perl 5 string "A is missing".

Even though rule 1's action plays a role in calculating the value of this parse, rule 1 is not actually used in the derivation. No node in the derivation represents rule 1. Rule 1 is used because it defines the null symbol value for the symbol A.

Now that we have both child values, we can use rule 0's action to calculate the value of node 0. That value is "A is missing, but Zorro was here", This becomes the value of S, rule 0's left hand side symbol and the start symbol of the grammar. A parse has the value of its start symbol, so "A is missing, but Zorro was here" is also the value of the parse.

METHODS

new

my $evaler = new Parse::Marpa::Evaluator($recce);

my $evaler = new Parse::Marpa::Evaluator($recce, $location);

Creates an evaluator object. On success, returns the evaluator object. Failures are thrown as exceptions.

The first, required, argument is a recognizer object. The second, optional, argument will be used as the number of the earleme at which to end parsing. If there is no second argument, parsing ends at the default end of parsing, which was set in the recognizer.

next

my $value = $evaler->next();

Iterates the evaluator object, returning a reference to the value of the next parse. If there are no more parses, returns undefined. Successful parses may evaluate to a Perl 5 undefined, which the next method will return as a reference to an undefined. Failures are thrown as exceptions.

Parses are iterated from rightmost to leftmost. The parse order may be manipulated by assigning priorities to the rules and terminals.

A failed parse does not always show up as an exhausted parse in the recognizer. Just because the recognizer was active when it was used to create the evaluator, does not mean that the input matches the grammar. If it does not match, there will be no parses and the next method will return undefined the first time it is called.

IMPLEMENTATION

Marpa, if a grammar optimizes sequences by returning references, marks the grammar opaque. Marpa gives optimization of sequences priority over memoization of node values because node value memoization has no payoff unless multiple parses are evaluated, which is not the usual case. Optimization of sequence evaluation almost always pays off quickly and handsomely.

A possible future extension is to enable the user to label only particular rules opaque, so that node memoization can take place on a rule by rule basis. But there's something to be said for keeping things simple. If a grammar writer is really looking for speed, she can let the grammar default to opaque, then use side effects and targeted caching and memoization.

SUPPORT

See the support section in the main module.

AUTHOR

Jeffrey Kegler

COPYRIGHT

Copyright 2007 - 2008 Jeffrey Kegler

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.