NAME
Marpa::XS::Recognizer - Marpa Recognizer Objects
SYNOPSIS
my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );
$recce->read( 'Number', 42 );
$recce->read( 'Multiply', );
$recce->read( 'Number', 1 );
$recce->read( 'Add', );
$recce->read( 'Number', 7 );
DESCRIPTION
The Marpa::XS::Recognizer::new constructor creates a recognizer object from a precomputed Marpa grammar object. The Marpa::XS::Recognizer::read call reads input.
In Marpa, recognition is the phase that reads input. Evaluation is the phase that produces the parse results. Marpa::XS::Recognizer objects handle both recognition and evaluation.
Location
By default, Marpa treats its input as a token stream. Each Marpa::XS::Recognizer::read call adds a token at the current location, then advances the current location by one. For the first Marpa::XS::Recognizer::read call, the current location is 0. Where confusion is possible, the current location is also called the current parse location. Marpa internally tracks location in terms of earlemes. The current earleme is always exactly the same as the current location.
In almost all compilers, parser generators, and textbooks, the input model described is the token-stream model. This document always assumes that the default, token-stream, model of input is the one being used. Marpa supports other input models, and these are described in the document on alternative input models.
CONSTRUCTOR
new
The new method's arguments are references to hashes of named arguments. In each key/value pair of these hashes, the key is the argument name, and the hash value is the value of the argument. The new method either returns a new recognizer object or throws an exception. The named arguments are described in a section below.
ACCESSORS
current_earleme
Returns the current location. Rarely used, since the current location is easily tracked by incrementing a counter after every Marpa::XS::Recognizer::read call.
expected_terminals
Returns a reference to a list of strings. The strings are the names of the terminals that are acceptable at the current location. The expected_terminals call is the only way to find out which terminals are acceptable at location 0. Since this is usually not necessary, and since, for current locations after location 0, the list of expected terminals is returned by the Marpa::XS::Recognizer::read call, expected_terminals rarely needs to be used.
MUTATORS
set
The set method's arguments are references to hashes of named arguments. The set method can be used to set or change named arguments after the recognizer has been created. Details of the named arguments are below.
strip
In XS mode, this call is a no-op. In Pure Perl mode, when called after input is finished, this call cleans up internal data not needed for the evaluation phase.
In Pure Perl mode, a considerable amount of data is used only while building the recognizer's tables, and is not needed once they are complete. Stripping a recognizer greatly reduces the amount of memory it uses. Attempting to strip a recognizer before input is finished will cause an exception.
read
$recce->read( 'Number', 42 );
$recce->read( 'Multiply', );
$recce->read( 'Number', 1 );
$recce->read( 'Add', );
$recce->read( 'Number', 7 );
The read method takes two arguments: a token name and a token value. The token name is required. It must be the name of a valid terminal symbol. The token value is optional. It defaults to a Perl undef when not specified. For details about terminal symbols, see "Terminals" in Marpa::XS::Grammar.
If the parser accepted the token, the read method returns a reference to an array of strings which lists the terminals that will be accepted by the next call to read. The strings are the terminal names. Applications can use this list to change the input "on the fly".
If the parser rejected the token, read throws an exception, unless the recognizer is in interactive mode. If the parser rejected the token in interactive mode, read returns a Perl undef.
The read method implements the "token stream" interface. This is the standard model for input to compilers and parsers. The token stream model is all that most users will be interested in, at least at first. Marpa, however, allows non-traditional models of the input. The adventurous will find these described in a separate document.
The list of expected tokens returned by the read method can be used for "Ruby Slippers" parsing. In Ruby Slippers parsing, when the parser does not accept its input, the application can change it on the fly. This is called Ruby Slippers parsing because all the parser has to do is wish, and whatever it wishes for happens. For details on how to use the list of expected tokens, see the section on Ruby Slippers parsing.
It is possible for the list of expected tokens to be empty. When this happens in the default input model, the recognizer is said to be exhausted.
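The return value of read described above can be put to work in a small driver loop. The following is a minimal sketch, not part of Marpa: the helper name read_until_exhausted is hypothetical, and it assumes a recognizer in the default (non-interactive) mode, so that a rejected token throws an exception rather than returning undef. It uses only the read method as documented in this section.

```perl
use strict;
use warnings;

# Hypothetical helper: feed [ name, value ] pairs to a recognizer.
# Returns the number of tokens accepted, which in the default input
# model is also the current location afterwards.
sub read_until_exhausted {
    my ( $recce, @tokens ) = @_;
    my $location = 0;    # the first read() takes place at location 0
    for my $token (@tokens) {
        # In the default mode, read() throws if the token is rejected,
        # so $expected is always an array reference here.
        my $expected = $recce->read( @{$token} );
        $location++;     # each accepted token advances the location by one
        # An empty list of expected terminals means the recognizer
        # is exhausted; stop feeding it input.
        last if not @{$expected};
    }
    return $location;
}
```

A caller would pass an existing recognizer and a list of tokens, for example read_until_exhausted( $recce, [ 'Number', 42 ], ['Multiply'], [ 'Number', 1 ] ).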
value
The value mutator evaluates and returns a parse result. It is described in its own section.
ACCESSOR
check_terminal
Returns a Perl true when its argument is the name of a terminal symbol. Otherwise, returns a Perl false. Not often needed, but in special situations a lexer may find this the most convenient way to determine if a symbol is a terminal.
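As an illustration of how a lexer might use check_terminal, here is a minimal sketch. The helper name terminal_names is hypothetical; the only recognizer method it calls is check_terminal, as described above.

```perl
use strict;
use warnings;

# Hypothetical helper: from a list of candidate symbol names, keep
# only those that the recognizer reports as terminal symbols.
sub terminal_names {
    my ( $recce, @candidates ) = @_;
    return grep { $recce->check_terminal($_) } @candidates;
}
```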
TRACE ACCESSORS
show_earley_sets
print $recce->show_earley_sets()
or die "print failed: $ERRNO";
Returns a multi-line string listing every Earley item in every Earley set. show_earley_sets requires knowledge of Marpa internals to interpret.
For debugging grammars, users will want to use show_progress instead. show_progress contains the information necessary for debugging grammars and interpreting parse progress.
show_progress
print $recce->show_progress()
or die "print failed: $ERRNO";
Returns a string describing the progress of the parse. With no arguments, the string contains reports for the current location. With a non-negative argument N, the string contains reports for location N.
With two numeric arguments, N and M, the arguments are interpreted as a range of locations and the returned string contains reports for all locations from N to M. The first argument must be non-negative. If the second argument is a negative integer, "-M", it indicates the Mth location from the last. In other words, -1 is the last location, -2 the next to last, etc. The call $recce->show_progress(0, -1) returns progress reports for the entire parse.
show_progress is an important tool for debugging application grammars. It can also be used to track the progress of a parse or to investigate how a parse works. A much fuller description, with an example, is in the document on debugging Marpa grammars.
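The negative-index convention for the second argument can be spelled out as arithmetic. The sketch below is illustrative only: the helper resolve_progress_range is hypothetical, is not part of Marpa, and does not call Marpa at all. It assumes the caller already knows the last location of the parse, and simply shows which locations a pair of arguments refers to under the rule stated above.

```perl
use strict;
use warnings;

# Hypothetical helper: list the locations covered by a (start, end)
# pair, where a negative end counts back from the last location.
sub resolve_progress_range {
    my ( $start, $end, $last_location ) = @_;
    die "start location must be non-negative\n" if $start < 0;
    # -1 is the last location, -2 the next to last, and so on.
    $end = $last_location + 1 + $end if $end < 0;
    return ( $start .. $end );
}
```

For a parse whose last location is 5, the arguments (0, -1) cover locations 0 through 5, and (2, -2) cover locations 2 through 4.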
NAMED ARGUMENTS
grammar
The grammar named argument is required. Its value must be a precomputed Marpa grammar object.
interactive
A boolean. Unset by default. When set, the Marpa::XS::Recognizer::read call returns a Perl undef when a token is rejected. The default is for Marpa::XS::Recognizer::read to throw an exception when the token name provided is not one that the parser will accept.
ranking_method
The value must be a string: either "none" or "constant". When the value is "none", Marpa returns the parse results in arbitrary order. When the value is "constant", Marpa allows the user to control the order in which parse results are returned by specifying ranking actions which assign values to rules and tokens.
The default is for parse results to be returned in arbitrary order. For details, see the section on parse order in the semantics document.
too_many_earley_items
The too_many_earley_items argument is optional. If specified, it sets the Earley item warning threshold. If an Earley set becomes larger than the Earley item warning threshold, a warning is printed to the trace file handle.
Marpa parses from any BNF, and can handle grammars and inputs which produce large Earley sets. But parsing that involves large Earley sets can be slow. Large Earley sets are something most applications can, and will wish to, avoid.
By default, Marpa calculates an Earley item warning threshold based on the size of the grammar. The default threshold will never be less than 100. If the Earley item warning threshold is set to 0, warnings about large Earley sets are turned off. For details about Earley sets, see the implementation document.
trace_earley_sets
A boolean. If true, causes each Earley set to be written to the trace file handle as it is completed. For details about Earley sets, see the implementation document.
trace_file_handle
The value is a file handle. Traces and warning messages go to the trace file handle. By default the trace file handle is inherited from the grammar used to create the recognizer.
trace_terminals
Very handy in debugging, and often useful even when the problem is not in the lexing. The value is a trace level. When the trace level is 0, tracing of terminals is off. This is the default.
At a trace level of 1 or higher, Marpa traces terminals as they are accepted or rejected by the recognizer. At a trace level of 2 or higher, Marpa traces the terminals expected at every location. Practical grammars often expect a large number of different terminals at many locations, so the output from a trace level of 2 can be voluminous.
warnings
The value is a boolean. Warnings are written to the trace file handle. By default, the recognizer's warnings are on. Usually, an application will want to leave them on.
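Several of the named arguments above are typically combined when debugging. The following sketch builds a hash of named arguments suitable for passing to the new or set methods. The helper name debugging_arguments and the particular choice of values are assumptions made for illustration; the argument names themselves are the ones documented in this section.

```perl
use strict;
use warnings;

# Hypothetical helper: named arguments for a recognizer that traces
# verbosely to STDERR. $grammar must be a precomputed Marpa grammar.
sub debugging_arguments {
    my ($grammar) = @_;
    return {
        grammar               => $grammar,
        trace_terminals       => 2,           # also trace expected terminals
        trace_file_handle     => \*STDERR,    # where traces and warnings go
        too_many_earley_items => 0,           # 0 turns off large-set warnings
        warnings              => 1,           # the default, stated explicitly
    };
}

# Intended use, given a precomputed grammar object in $grammar:
#   my $recce = Marpa::XS::Recognizer->new( debugging_arguments($grammar) );
```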
RUBY SLIPPERS PARSING
$recce =
    Marpa::XS::Recognizer->new( { grammar => $grammar, interactive => 1 } );
my @tokens = (
    [ 'Number', 42 ],
    ['Multiply'], [ 'Number', 1 ],
    ['Add'],      [ 'Number', 7 ],
);
TOKEN: for ( my $token_ix = 0; $token_ix <= $#tokens; $token_ix++ ) {
    defined $recce->read( @{ $tokens[$token_ix] } )
        or fix_things( $recce, \@tokens )
        or die q{Don't know how to fix things};
}
Marpa tells the application which symbols are acceptable as tokens at the next location in the parse. This can be very useful. The application can use this information to change the input so that it is acceptable to the parser.
An application does not have to anticipate problems. If a Marpa::XS::Recognizer::read call fails, the application can simply retry it, changing the input.
By default, when a token is rejected, Marpa throws an exception. But if an application is doing Ruby Slippers parsing, it may be more convenient to set the recognizer's interactive option. When the interactive option is set, if a token is rejected, the Marpa::XS::Recognizer::read call returns a Perl undef. The list of acceptable tokens will be that returned by the previous Marpa::XS::Recognizer::read call. The list of acceptable tokens may also be explicitly requested with the Marpa::XS::Recognizer::expected_terminals call.
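One way an application might react to a rejected token is to substitute a token that the parser does expect. The sketch below is one possible policy, not Marpa's own: the helper name read_with_slippers and the table of substitute values are hypothetical. It assumes a recognizer created with the interactive option set, so that read returns undef on rejection, and it uses only the read and expected_terminals methods described in this document.

```perl
use strict;
use warnings;

# Hypothetical helper: try to read a token; if it is rejected, read
# in its place the first expected terminal for which the caller has
# supplied a substitute value.
sub read_with_slippers {
    my ( $recce, $token, $substitute_value_for ) = @_;

    # In interactive mode, read() returns undef when it rejects a token.
    my $expected = $recce->read( @{$token} );
    return $expected if defined $expected;

    # Ask the recognizer what it would accept at this location.
    for my $name ( @{ $recce->expected_terminals() } ) {
        next if not exists $substitute_value_for->{$name};
        return $recce->read( $name, $substitute_value_for->{$name} );
    }

    # Nothing acceptable could be supplied; the caller must decide.
    return;
}
```

The return value follows the convention of read itself: a reference to the list of terminals expected next on success, and undef when no fix was found.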
An Example
Marpa's HTML parser, Marpa::HTML, is an example of how Ruby Slippers parsing can help with a non-trivial, real-life application. When Marpa::HTML rejects a token, it tries to fix things using two techniques. In the first technique, Marpa::HTML sometimes changes the token next in the input stream to match the parser's expectations.
For HTML start and end tags, Marpa::HTML uses the second technique: "virtual" tokens. A major complexity of liberal HTML parsing is dealing with omitted start and end tags. Marpa::HTML deals with these by parsing with a grammar that takes a simple view of the world -- it assumes, contrary to fact, that start and end tags are always present. Ruby Slippers parsing is then used to make the grammar's simplistic view of the world come true.
When a token is rejected, Marpa::HTML looks at the expected tokens list. If it sees that a start or end tag is wanted, Marpa::HTML creates a "virtual" tag. Marpa::HTML then resumes input where it left off. This very simple solution to a difficult problem is made possible by the Ruby Slippers feature of the Marpa parse engine.
EVALUATION
my $value_ref = $recce->value;
my $value = $value_ref ? ${$value_ref} : 'No Parse';
The value method call evaluates and returns a parse result. Its arguments are zero or more hashes of named arguments. It returns a reference to the value of the next parse result, or undef if there are no more parse results.
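Because value returns undef once the parse results run out, an application can collect every result with a simple loop. This is a minimal sketch; the helper name all_parse_values is hypothetical, and it calls value with no named arguments. For a highly ambiguous grammar the loop may run for a long time unless the number of results is limited.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the values of all parse results.
sub all_parse_values {
    my ($recce) = @_;
    my @values;
    # value() returns a reference to the next parse result, or undef
    # when there are no more. The reference is needed because a parse
    # result may itself legitimately be undef or false.
    while ( defined( my $value_ref = $recce->value() ) ) {
        push @values, ${$value_ref};
    }
    return @values;
}
```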
These are the named arguments available to the value method call:
end
The value method's end named argument specifies the parse end location. The default is for the parse to end where the input did, so that the parse returned is of the entire input.
closures
The value method's closures named argument is a reference to a hash. In each key/value pair of this hash, the key must be an action name. The hash value must be a CODE ref.
Sources of action names include
* The action properties of rules
* The default_action named argument of grammars
* The lhs properties of rules
* The ranking_action properties of rules
* For its new method, the action_object named argument of grammars
When an action name is a key in the closures named argument, the usual action resolution mechanism of the semantics is bypassed. A common use of the closures named argument is to allow anonymous subroutines to be semantic actions. For more details, see the document on semantics.
max_parses
The value must be an integer. If it is greater than zero, the evaluator will return no more than that number of parse results. If it is zero, there will be no limit on the number of parse results returned. The default is for there to be no limit.
Marpa allows extremely ambiguous grammars. max_parses can be used if the user only wants to see the first few parse results of an ambiguous parse. max_parses is also useful to limit CPU usage and output length when testing and debugging.
trace_actions
The value method's trace_actions named argument is a boolean. If the boolean value is true, Marpa traces the resolution of action names to Perl closures. A boolean value of false turns tracing off, which is the default. Traces are written to the trace file handle.
trace_values
The value method's trace_values named argument is a numeric trace level. If the numeric trace level is 1, Marpa traces values as they are computed in the evaluation stack. A trace level of 0 turns value tracing off, which is the default. Traces are written to the trace file handle.
COPYRIGHT AND LICENSE
Copyright 2010 Jeffrey Kegler
This file is part of Marpa::XS. Marpa::XS is free software: you can
redistribute it and/or modify it under the terms of the GNU Lesser
General Public License as published by the Free Software Foundation,
either version 3 of the License, or (at your option) any later version.
Marpa::XS is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser
General Public License along with Marpa::XS. If not, see
http://www.gnu.org/licenses/.