NAME
Marpa::Evaluator - Marpa Evaluator Objects
SYNOPSIS
my $evaler = Marpa::Evaluator->new( { recce => $ambiguous_recce } );
my @values = ();
if ($evaler) {
while ( defined( my $ambiguous_value_ref = $evaler->value() ) ) {
push @values, ${$ambiguous_value_ref};
}
}
DEPRECATED
This interface is now deprecated in favor of the 3rd Generation evaluator which is part of the <Marpa Recognizer|Marpa::Recognizer>. It may be removed in the future and, when XS support is added to Marpa.
The 3rd Generation interface benefits from the design lessons learned from the previous two generations. It is faster, more powerful and less complicated, both internally and externally.
Marpa is alpha software, and at this point is experimental, so hopefully no major or production applications depended on the 2nd Generation interface.
CONVERTING TO THE 3rd GENERATION EVALUATOR
Conversion of existing code should be simple.
Ranking Parses
During the life of the 2nd Generation evaluator, the default parse order varied. In the 3rd Generation evaluator, it is 'none'. This means that the parse_order
named argument of the <Marpa Recognizer|Marpa::Recognizer> must be set to a value other than "none
", or all ranking actions will be silently ignored.
The ranking actions must return a reference to the rank, not the rank itself. This change from the 2nd Generation is to allow the ranking actions to specify that a choice is to be ignored by returning an undefined.
DESCRIPTION
Marpa::Evaluator
objects implement Marpa's Multi-parse Evaluator. Marpa has another Evaluator, its Single Parse Evaluator, which can handle all unambiguous and many ambiguous parses, and is simpler and faster. If you're not sure which to use, see "THE SINGLE PARSE EVALUATOR" in Marpa.
A Multi-parse Evaluator is created with the Marpa::Evaluator::new method. The user can then iterate through the parse results, using the Marpa::Evaluator::value method.
Parse Order
The Multi-parse Evaluator will produce all the parse results for a given parse, and allows the user to control the parse order. The method used to order parse results is specified with Marpa::Evaluator
's "parse_order" named argument. As of this writing, two values are supported for the "parse_order" named argument: none
and numeric
.
Parses results are returned in an order determined by node rank. The rules are
Rule 1: In determining the order of parse results, the parse tree is traversed in pre-order, left to right.
Rule 2: The same parse result is never returned twice. Two parse results are considered "the same" if they apply the same rules in the same order at the same earleme locations.
Rule 3: At each point of ambiguity, the choice with the highest rank is made, unless that choice would violate Rule 2.
"none"
When the parse order is "none", the node rank is arbitrary and the parse results will be iterated in arbitrary order. This is the most efficient alternative, and is the default.
"numeric"
The "numeric
" parse order allows user control of the order in which the value method iterates through the parse results. Numeric parse order assigns a Perl numeric value to each leaf node and rule node of the parse. This numeric value becomes the node's rank. Negative values are allowed. The highest numeric value is the highest rank, the lowest numeric value is the lowest rank, and so forth.
By default,
The rank of every leaf node is 0.
The rank of every rule node is the sum of the ranks of its child nodes.
The defaults can be changed by assigning ranking actions. The ranking action of a token leaf node is specified using the token symbol's ranking_action
property. The ranking action of a nulled leaf node is specified using the null node symbol's ranking_action
property. The ranking action of a rule is specified using the rule's ranking_action
property.
For the rank of a node to be calculated, the ranking action must first be resolved to a ranking Perl closure. Ranking actions are resolved to ranking Perl closures in the evaluator setup phase, using the same logic that resolves semantics actions to Perl semantic closures. The logic that resolves action names to closures is described in detail in the semantics document.
The ranking closures for token leaf nodes and the ranking closures for nulled leaf nodes are called in the bocage setup phase. The ranking closures for rule nodes are called during the parse setup phase.
Infinitely Ambiguous Grammars
The fast evaluator does not allow an infinitely ambigious grammar, but the Multi-parse Evaluator does. An example of an infinitely ambiguous grammar is the following:
S ::= A
A ::= B
B ::= A
B :: 'x'
Given the input 'x', this grammar will produce these parses
S -> A -> B -> x
S -> A -> B -> A -> B -> x
S -> A -> B -> A -> B -> x -> A -> B -> x
.
.
.
Because of the two rules A ::= B
and B ::= A
, this list of parses could go on forever. The two rules A ::= B
and B ::= A
form what is called a cycle.
Typically, if a user has written an grammar with an infinite cycle, it was a mistake and he wants to rewrite it before proceeding. By default, an infinitely ambiguous grammar is a fatal error. This is the behavior most users will want.
If a user does want to parse with an infinitely ambiguous grammar, Marpa allows it. The details are described below.
CONSTRUCTOR
new
my $evaler = Marpa::Evaluator->new( { recce => $ambiguous_recce } );
The new
method's arguments are one or more hashes of named arguments. On success, the new
method returns a new evaluator object. If the new
method finds that there is no parse of the input according to the grammar, it returns undefined. For other failures, the new
method throws an exception.
A unsuccessful return from the new
method means that there is no parse of the input according to the grammar, but the opposite is not necessarily true. It is possible that the new
method will successfully return an evaluator object, only for the value
method to return undefined the first time it is called, indicating that there are no parse results. The number of parse results, including whether there are any at all, is not known until they are iterated by the value
method.
The recognizer
named argument is required, and its value must be a recognizer object which has finished recognizing a text. The recce
named argument is a synonym for the recognizer
named argument. Other named arguments are described below.
MUTATORS
set
$evaler->set( { max_parses => 300 } );
The set
allows Marpa named arguments to be specified for an evaluator object after its creation. The arguments to the set
method are zero or more hashes of named arguments.
value
my @values = ();
if ($evaler) {
while ( defined( my $ambiguous_value_ref = $evaler->value() ) ) {
push @values, ${$ambiguous_value_ref};
}
}
Iterates the evaluator object, returning a reference to the value of the next parse result. A Perl undef
is a valid parse result value, and is returned as a reference to an undef
.
If there are no more parse results, returns undefined. If the input did not match the grammar, the value
method will return undefined the first time it is called. Failures are thrown as exceptions.
TRACE ACCESSORS
show_bocage
my $show_bocage_output = $evaler->show_bocage(2);
Returns a multi-line string describing the bocage for an evaluator. The first line contains the count of the parse results produced up to this point from the bocage. The bocage follows in the form of a parse bocage grammar, listed in pre-order.
Parse bocage grammars are similar to parse forest grammars. Parse bocage grammars are described at length in a separate document, using an example output from show_bocage
.
The optional verbosity argument must be an integer greater than or equal to zero. For each parse bocage and-production, if verbosity is a number greater than zero, Marpa shows the LR(0) item corresponding to the and-production's and-node.
If verbosity is less than 2, Marpa shows only the and-productions. If verbosity is 2 or more, Marpa also shows the or-productions. Or-productions contain only redundant information, but the parse bocage grammar is not formally complete without them.
CONTEXT-AWARE STATIC METHODS
sub rank_null_a {
return ( $MyTest::MAXIMAL ? -1 : 1 )
* 10**( 3 - Marpa::token_location() );
}
The ranking Perl closures that run at node evaluation time have available a set of context-aware static methods. A closure can use these methods to learn about the context in which it is called. As of this writing, the context-aware static methods are available to the ranking Perl closures only -- the semantic Perl closures cannot use them. This is an arbitrary restriction, and may be lifted in the future.
Marpa::location
Returns the earleme location of the origin (or start) of the rule.
Marpa::token_location
Returns the earleme location of the token. Intended for use in callbacks associated with empty rules and nulling symbols.
NAMED ARGUMENTS
closure
The Multi-parse Evaluator's closure
named argument is a reference to a hash. In the key/value pairs of this hash, the key must be an action name. The hash value must be a CODE ref.
Sources of action names include
The
action
properties of rulesThe
default_action
named argument of grammarsThe
lhs
properties of rulesThe
ranking_action
properties of rulesThe
ranking_action
properties of symbolsFor its
new
method, theaction_object
named argument of grammars
When an action name is a key in the closure
named argument, the usual action resolution mechanism of the semantics is bypassed. A common use of the closure
named argument is to allow anonymous subroutines to be semantic actions. For more details, see the document on semantics.
end
The Multi-parse Evaluator's end
named argument specifies the parse end location. The default is for the parse to end where the input did, so that the parse returned is of the entire input.
infinite_rewrite
Marpa handles infinitely ambiguous grammars, and allows two strategies for dealing with them. If the infinite_rewrite
named argument is true, when Marpa detects an infinitely ambiguous grammar, Marpa will rewrite the parse bocage to eliminate infinite cycles. The rewritten parse bocage will be semantically equivalent to the original. This rewrite happens in the evaluator setup phase, during the call to Marpa::Evaluator::new
. Rewriting to eliminate infinite cycles is the default behavior.
If the infinite_rewrite
named argument is false, there will be no rewrite to eliminate infinite cycles. Instead, infinite cycles will be dealt with dynamically, as they arise in the parse setup phase during the calls to Marpa::Evaluator::value
. For more detail, see the section on infinitely ambiguous grammars.
max_parses
The value must be an integer. If it is greater than zero, the evaluator will return no more than that number of parse results. If it is zero, there will be no limit on the number of parse results returned. The default is for there to be no limit.
Marpa allows extremely ambiguous grammars. The user may write a highly ambiguous grammar by mistake, or as a deliberate choice of parsing strategy for an application. When the ambiguity is a mistake, specifying max_parses
is useful to limit CPU usage and output length while the issue is being debugged. When ambiguity is a deliberate strategy, it is often the case that the user only wants to see the first few parse results.
parse_order
The value must be a string: either "none
" or "numeric
". When the value is "none
", Marpa returns the parse results in arbitrary order. When the value is "numeric
", Marpa allows the user to control the order in which parse results are returned by specifying ranking actions. The default is for parse results to be returned in arbitrary order. For details, see the section on parse order.
recce
A named argument whose value is the recognizer that the Evaluator is to be created from. This named argument is required for the Marpa::Evaluator::new call. The recce
named argument is not allowed for the Marpa::Evaluator::set call.
recognizer
A synonym for the "recce" named argument.
trace_actions
The Multi-parse Evaluator's trace_actions
named argument is a boolean. If the boolean value is true, Marpa traces the resolution of action names to Perl closures. A boolean value of false turns tracing off, which is the default. Traces are written to the trace file handle.
trace_evaluation
The Multi-parse Evaluator's trace_actions
named argument is a boolean. If the boolean value is true, Marpa prints messages that trace the rewriting of the parse bocage in the evaluator setup phase. A boolean value of false turns tracing off, which is the default. Traces are written to the trace file handle.
trace_file_handle
The value is a file handle. Traces and warning messages go to the trace file handle. By default the trace file handle is inherited from the recognizer used to create the evaluator.
trace_values
The Multi-parse Evaluator's trace_values
named argument is a numeric trace level. If the numeric trace level is 1, Marpa traces values as they are computed in the evaluation stack. A trace level of 0 turns value tracing off, which is the default. Traces are written to the trace file handle.
INFINITELY AMBIGUOUS GRAMMARS: THE DETAILS
This section describes the details of parsing with infinitely ambiguous grammars. Most readers will want to skip this section.
Experts in parsing theory will notice that my terminology is non-standard. What I am calling a potentially infinite cycle or an infinite cycle is usually called just a cycle in the literature, because all cycles allow a potentially infinite number of parse results from a single input. I use the more verbose terms because most readers will not be familiar with parsing theory, and might confuse cycles with ordinary recursion. Cycles result in infinite ambiguity, and to date have not found use in grammars which are of practical interest. Ordinary non-cyclical recursions, on the other hand, occur frequently in grammars of practical interest, do not result in infinite ambiguity, and are handled by Marpa without the need to take special measures.
By default, Marpa treats an infinitely ambigious grammar as a fatal error. This is because infinite ambiguity is usually a mistake, and one that is easily fixed. To proceed to producing parse results from an infinitely ambiguous grammar, the user must set the grammar's infinite action named argument to a value other than "fatal
". The other choices are "warn
" and "quiet
".
Cycle Length
Obviously, Marpa cannot evaluate an infinitely long parse, and Marpa cannot list all of an infinite number of parse results. For grammars with potentially infinite cycles, Marpa's Multi-parse Evaluator returns only those parse results where the cycle length is 1. There will always be a finite number of these parse results.
Cycle length is the number of times a parse derivation goes around a potentially infinite cycle. Above, a set of parses from an example of an infinitely ambiguous grammar was shown. Here are those parses again, this time labeled with their cycle length.
Cycle length 1: S -> A -> B -> x
Cycle length 2: S -> A -> B -> A -> B -> x
Cycle length 3: S -> A -> B -> A -> B -> x -> A -> B -> x
Of the parse results in the above list, Marpa would return a value only for the first, the one whose cycle length is 1.
When iterating parse results for an infinitely ambiguous grammar, two strategies are available to the user. Which is used depends on the setting of the infinite_rewrite named argument. If the infinite_rewrite named argument is true, the rewrite strategy is used. If the infinite_rewrite named argument is false, the dynamic strategy is used. The rewrite strategy is the default.
The Rewrite Strategy
In the rewrite strategy, the parse bocage is rewritten during the evaluator setup phase, in the Marpa::Evaluator::new
method. Potentially infinite cycles are eliminated in the rewrite, but all derivation paths for cycles of length one are preserved. This is the most powerful method and it works extremely well when the parse only contains a limited number of infinite cycles.
A disadvantage of the rewrite strategy is that the rewrite is done, and its full cost in CPU time incurred, before any parses are returned. Marpa typically tries to amortize the cost of ambiguity over the parse results.
Another disadvantage of the rewrite strategy is that there are grammars for which the rewrite is so CPU intensive that it is simply impractical for a parse of any size. These cases are probably only theoretical, but they are not very hard to create.
The extreme worst cases are plex grammars. In a plex grammar, every symbol directly produces every other symbol. That is, in a plex grammar, for every ordered pair of symbols X and Y, there is a rule of the form
X ::= Y
I cannot think that anyone will be using plex grammars in a practical application. On the other hand, the same thing might be said of infinite ambiguity itself.
The Dynamic Strategy
The rewriting of infinite cycles can be turned off by setting the Multi-parse Evaluator's infinite_rewrite
named argument to false. This totally eliminates the cost of the infinite cycle rewrite. The disadvantage of the dynamic strategy is that, in Marpa's current implementation, it skips some parse results which the rewrite strategy includes.
The rewrite strategy includes all the parse results that are possible when cycle length is limited to 1. Where there are multiple, connected cycles, the rewrite strategy will iterate every possible way of reaching every possible cycle.
When multiple cycles interconnect, many parse results will simply be different paths through the tangle of cycles. The dynamic strategy prunes some of these paths. If an application does not need a precise enumeration of all possible ways of reaching every cycle at every location, this pruning may be acceptable or even preferred.
LICENSE AND COPYRIGHT
Copyright 2007-2010 Jeffrey Kegler, all rights reserved. Marpa is free software under the Perl license. For details see the LICENSE file in the Marpa distribution.
1 POD Error
The following errors were encountered while parsing the POD:
- Around line 211:
L<> starts or ends with whitespace