NAME
Parse::Marpa::Evaluator - Marpa Evaluator Objects
SYNOPSIS
my $fail_offset = $recce->text( \('2-0*3+1') );
if ( $fail_offset >= 0 ) {
die("Parse failed at offset $fail_offset");
}
my $evaler = new Parse::Marpa::Evaluator($recce);
die("Parse failed") unless $evaler;
for ( my $i = 0; defined( my $value = $evaler->value() ); $i++ ) {
if ( $i > $#expected ) {
fail( "Ambiguous equation has extra value: " . $$value . "\n" );
}
else {
is( $$value, $expected[$i], "Ambiguous Equation Value $i" );
}
}
DESCRIPTION
Parses are found and evaluated by Marpa's evaluator objects. Evaluators are created with the new
constructor, which requires a Marpa recognizer object.
Marpa allows ambiguous parses, so evaluator objects are iterators. Iteration is performed with the next
method, which returns a reference to the value of the next parse. Often only one parse is needed, in which case the next
method is called only once.
Each Marpa recognizer should have only one evaluator using it at any one time. If multiple evaluators use the same recognizer at the same time, they may produce incorrect results.
Null Values
A "null value" is the value used for a symbol's value when it is nulled in a parse. By default, the null value is a Perl undefined. If you want something else, the default null value is a Marpa option (default_null_value
) and can be reset.
Each symbol can have its own null symbol value. The null symbol value for any symbol is calculated using the action specified for the empty rule which has that symbol as its left hand side. The null symbol action is not a rule action. It's a property of the symbol, and applies whenever the symbol is nulled, even when the symbol's empty rule is not involved.
For example, in MDL, the following says that whenever the symbol A
is nulled, its value should be a string that says it is missing.
A: . q{'A is missing'}.
Null symbol actions are different from rule actions in another important way. Null symbol actions are run at recognizer creation time and the value of the result at that point becomes fixed as the null symbol value. This is different from rule actions. During the creation of the recognizer object, rule actions are compiled into closures. During parse evaluation, whenever a node for that rule needs its value recalculated, the compiled rule closure is run. A compiled rule closure can produce a different value every time it runs.
I treat null symbol actions differently for efficiency. They have no child values, and a fixed value is usually what is wanted. If you want to calculate a symbol's null value with a closure run at parse evaluation time, the null symbol action can return a reference to a closure. Rules with that nullable symbol in their right hand side can then be set up so that they run that closure.
Evaluating Null Derivations
A null derivation may consist of many steps and may contain many symbols. Marpa's rule is that the value of a null derivation is the null symbol value of the highest null symbol in that derivation. This section describes in detail how a parse is evaluated, focusing on what happens when nulled symbols are involved.
The first step in evaluating a parse is to determine which nodes count for the purpose of evaluation, and which do not. Marpa follows these principles:
The start node always counts.
Nodes count if they derive a non-empty sentence.
All other nodes do not count.
In evaluating a parse, Marpa uses only nodes that count.
These are all consequences of the principles above:
The value of null derivation is the value of the highest null symbol in it.
A nulled node counts only if it is start node.
The value of a null parse is the null value of the start symbol.
If you think some of the rules or symbols represented by nodes that don't count are important in your grammar, Marpa can probably accommodate your ideas. First, for every nullable symbol, determine how to calculate the value which your semantics produces when that nullable symbol is a "highest null symbol". If it's a constant, write a null action for that symbol which returns that constant. If your semantics do not produce a constant value by recognizer creation time, write a null action which returns a reference to a closure and arrange to have that closure run by the parent node.
Example
Suppose a grammar has these rules
S: A, Y. q{ $_[0] . ", but " . $_[1] }. # Call me the start rule
note: you can also call me Rule 0.
A: . q{'A is missing'}. # Call me Rule 1
A: B, C. q{"I'm sometimes null and sometimes not"}. # Call me Rule 2
B: . q{'B is missing'}. # Call me Rule 3
C: . q{'C is missing'}. # Call me Rule 4
C: Y. q{'C matches Y'}. # Call me Rule 5
Y: /Z/. q{'Zorro was here'}. # Call me Rule 6
In the above MDL, the Perl 5 regex "/Z/
" occurs on the rhs of Rule 6. Where a regex is on the rhs of a rule, MDL internally creates a terminal symbol to match that regex in the input text. In this example, the MDL internal terminal symbol that matches input text using the regex /Z/
will be called Z
.
If the input text is the Perl 5 string "Z
", the derivation is as follows:
S -> A Y (Rule 0)
-> A Z (Y produces Z, by Rule 6)
-> B C Z (A produces B C, by Rule 2)
-> B Z (C produces the empty string, by Rule 4)
-> Z (B produces the empty string, by Rule 3)
The parse tree can be described as follows:
Node 0 (root): S (2 children, nodes 1 and 4)
Node 1: A (2 children, nodes 2 and 3)
Node 2: B (matches empty string)
Node 3: C (matches empty string)
Node 4: Y (1 child, node 5)
Node 5: Z (terminal node)
Here's a table showing, for each node, its lhs symbol, the sentence it derives, and its value.
Symbol Sentence Value
Derived
Node 0: S Z "A is missing, but Zorro is here"
Node 1: A empty "A is missing"
Node 2: B empty No value
Node 3: C empty No value
Node 4: Y Z "Zorro was here"
Node 5: Z Z "Z"
In this derivation, nodes 1, 2 and 3 derive the empty sentence. None of them are the start node so that none of them count.
Nodes 0, 4 and 5 all derive the same non-empty sentence, Z
, so they all count. Node 0 is the start node, so it would have counted in any case.
Since node 5 is a terminal node, it's value comes from the lexer. Where the lexing is done with a Perl 5 regex, the value will be the Perl 5 string that the regex matched. In this case it's the string "Z
".
Node 4 is not nulled, so it is evaluated normally, using the rule it represents. That is rule 6. The action for rule 6 returns "Zorro was here
", so that is the value of node 4. Node 4 has a child node, node 5, but rule 6's action pays no attention to child values. The action for each rule is free to use or not use child values.
Nodes 1, 2 and 3 don't count and will all remain unevaluated. The only rule left to be evaluated is node 0, the start node. It is not nulled, so its value is calculated using the action for the rule it represents (rule 0).
Rule 0's action uses the values of its child nodes. There are two child nodes and their values are elements 0 and 1 in the @$_
array of the action. The child value represented by the symbol Y
, $_->[1]
, comes from node 4. From the table above, we can see that that value was "Zorro was here
".
The first child value is represented by the symbol A
, which is nulled. For nulled symbols, we must use the null symbol value. Null symbol values for each symbol can be explicitly set by specifying an rule action for an empty rule with that symbol as its lhs. For symbol A
, this was done in Rule 1. Rule 1's action evaluates to the Perl 5 string "A is missing
".
Even though rule 1's action plays a role in calculating the value of this parse, rule 1 is not actually used in the derivation. No node in the derivation represents rule 1. Rule 1 is used because it defines the null symbol value for the symbol A
.
Now that we have both child values, we can use rule 0's action to calculate the value of node 0. That value is "A is missing, but Zorro was here
", This becomes the value of S
, rule 0's left hand side symbol and the start symbol of the grammar. A parse has the value of its start symbol, so "A is missing, but Zorro was here
" is also the value of the parse.
METHODS
new
my $evaler = new Parse::Marpa::Evaluator($recce);
my $evaler = new Parse::Marpa::Evaluator($recce, $location);
Creates an evaluator object. On success, returns the evaluator object. Failures are thrown as exceptions.
The first, required, argument is a recognizer object. The second, optional, argument will be used as the number of the earleme at which to end parsing. If there is no second argument, parsing ends at the default end of parsing, which was set in the recognizer.
next
my $result = $evaler->value();
Iterates the evaluator object, returning a reference to the value of the next parse. If there are no more parses, returns undefined. Successful parses may evaluate to a Perl 5 undefined, which the next
method will return as a reference to an undefined. Failures are thrown as exceptions.
Parses are iterated in postorder. If a symbol can both match a token and derive a rule, the token match takes priority. Other alternatives are taken in implementation dependent order. When the order is important, it may be manipulated by assigning priorities to the rules and terminals.
A failed parse does not always show up as an exhausted parse in the recognizer. Just because the recognizer was active when it was used to create the evaluator, does not mean that the input matches the grammar. If it does not match, there will be no parses and the next
method will return undefined the first time it is called.
SUPPORT
See the support section in the main module.
AUTHOR
Jeffrey Kegler
LICENSE AND COPYRIGHT
Copyright 2007 - 2008 Jeffrey Kegler
This program is free software; you can redistribute it and/or modify it under the same terms as Perl 5.10.0.