NAME

Marpa::XS::Recognizer - Marpa Recognizer Objects

SYNOPSIS

my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );
$recce->read( 'Number', 42 );
$recce->read( 'Multiply', );
$recce->read( 'Number', 1 );
$recce->read( 'Add', );
$recce->read( 'Number', 7 );

DESCRIPTION

The Marpa::XS::Recognizer::new constructor creates a recognizer object from a precomputed Marpa grammar object. The Marpa::XS::Recognizer::read call reads input.

In Marpa, recognition is the phase that reads input. Evaluation is the phase that produces the parse results. Marpa::XS::Recognizer objects handle both recognition and evaluation.

Location

By default, Marpa treats its input as a token stream. Each Marpa::XS::Recognizer::read call adds a token at the current location, then advances the current location by one. For the first Marpa::XS::Recognizer::read call, the current location is 0. Where confusion is possible, the current location is also called the current parse location. Marpa internally tracks location in terms of earlemes. In the default input model, the current earleme is always exactly the same as the current location.

Almost all compilers, parser generators, and textbooks use the token-stream model of input. This document always assumes that the default, token-stream, model of input is the one being used. Marpa supports other input models, and these are described in the document on alternative input models.

CONSTRUCTOR

new

The new method's arguments are references to hashes of named arguments. In each key/value pair of these hashes, the key is the argument name, and the hash value is the value of the argument. The new method either returns a new recognizer object or throws an exception. The named arguments are described in a section below.

ACCESSORS

current_earleme

Returns the current location. Rarely used, since the current location is easily tracked by incrementing a counter after every Marpa::XS::Recognizer::read call.
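
The counter technique can be sketched as follows. This is a minimal illustration, not part of the Marpa test suite. The calculator grammar is hypothetical and is assumed only so that the example is self-contained.

```perl
use strict;
use warnings;
use Marpa::XS;

# Hypothetical calculator grammar, assumed for illustration only.
my $grammar = Marpa::XS::Grammar->new(
    {   start => 'Expression',
        rules => [
            { lhs => 'Expression', rhs => [qw/Term/] },
            { lhs => 'Term',       rhs => [qw/Factor/] },
            { lhs => 'Factor',     rhs => [qw/Number/] },
            { lhs => 'Term',       rhs => [qw/Term Add Term/] },
            { lhs => 'Factor',     rhs => [qw/Factor Multiply Factor/] },
        ],
    }
);
$grammar->precompute();

my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );

# The application's own counter starts at location 0,
# and is incremented once for every successful read.
my $location = 0;
for my $token ( [ 'Number', 42 ], ['Multiply'], [ 'Number', 1 ] ) {
    $recce->read( @{$token} );
    $location++;
}

# In the default input model the counter and the accessor should agree.
printf "counter=%d current_earleme=%d\n", $location,
    $recce->current_earleme();
```

In the default token-stream model, the two numbers printed should be the same.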

expected_terminals

Returns a reference to a list of strings. The strings will be the names of the terminals which are acceptable at the current location. The expected_terminals call is the only way to find out which terminals are acceptable at location 0. Since this is usually not necessary, and since, for current locations after location 0, the list of expected terminals is returned by the Marpa::XS::Recognizer::read call, expected_terminals rarely needs to be used.
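
As a sketch of the one case where expected_terminals is needed, the following asks which terminals are acceptable at location 0, before any token has been read. The calculator grammar is hypothetical and assumed for illustration. The exact contents and order of the list depend on the grammar and on which of its symbols are terminals, so the example only tests for individual names.

```perl
use strict;
use warnings;
use Marpa::XS;

# Hypothetical calculator grammar, assumed for illustration only.
my $grammar = Marpa::XS::Grammar->new(
    {   start => 'Expression',
        rules => [
            { lhs => 'Expression', rhs => [qw/Term/] },
            { lhs => 'Term',       rhs => [qw/Factor/] },
            { lhs => 'Factor',     rhs => [qw/Number/] },
            { lhs => 'Term',       rhs => [qw/Term Add Term/] },
            { lhs => 'Factor',     rhs => [qw/Factor Multiply Factor/] },
        ],
    }
);
$grammar->precompute();

my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );

# No read call has been made yet, so the current location is 0.
my $expected = $recce->expected_terminals();

# Turn the list into a hash for convenient lookup by terminal name.
my %is_expected = map { $_ => 1 } @{$expected};

for my $name (qw/Number Add Multiply/) {
    printf "%-8s is %s at location 0\n", $name,
        ( $is_expected{$name} ? 'expected' : 'not expected' );
}
```

An expression in this grammar must begin with a number, so Number should be reported as expected, and the two operators should not.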

MUTATORS

set

The set method's arguments are references to hashes of named arguments. The set method can be used to set or change named arguments after the recognizer has been created. Details of the named arguments are below.

strip

In XS mode, this call is a no-op. In Pure Perl mode, when called after input is finished, this call cleans up internal data not needed for the evaluation phase.

In Pure Perl mode, a considerable amount of data is only used while building the recognizer's tables, and is not needed once they are complete. Stripping a recognizer greatly reduces the amount of memory it uses. Attempting to strip a recognizer before input is finished will cause an exception.

read

$recce->read( 'Number', 42 );
$recce->read( 'Multiply', );
$recce->read( 'Number', 1 );
$recce->read( 'Add', );
$recce->read( 'Number', 7 );

The read method takes two arguments: a token name and a token value. The token name is required. It must be the name of a valid terminal symbol. The token value is optional. It defaults to a Perl undef when not specified. For details about terminal symbols, see "Terminals" in Marpa::XS::Grammar.

If the parser accepted the token, the read method returns a reference to an array of strings which lists the terminals that will be accepted by the next call to read. The strings are the terminal names. Applications can use this list to change the input "on the fly".

If the parser rejected the token, read throws an exception, unless the recognizer is in interactive mode. If the parser rejected the token in interactive mode, read returns a Perl undef.
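
The two ways of reporting a rejected token can be sketched as follows. The example assumes the behavior described in this document. The calculator grammar and the make_grammar helper are hypothetical. A fresh grammar is built for each recognizer only to keep the two cases independent of each other.

```perl
use strict;
use warnings;
use Marpa::XS;

# Hypothetical helper: builds and precomputes a small calculator grammar.
sub make_grammar {
    my $grammar = Marpa::XS::Grammar->new(
        {   start => 'Expression',
            rules => [
                { lhs => 'Expression', rhs => [qw/Term/] },
                { lhs => 'Term',       rhs => [qw/Factor/] },
                { lhs => 'Factor',     rhs => [qw/Number/] },
                { lhs => 'Term',       rhs => [qw/Term Add Term/] },
                { lhs => 'Factor', rhs => [qw/Factor Multiply Factor/] },
            ],
        }
    );
    $grammar->precompute();
    return $grammar;
}

# Default mode: an expression cannot start with 'Add',
# so the token is rejected and read throws an exception.
my $default_recce =
    Marpa::XS::Recognizer->new( { grammar => make_grammar() } );
my $default_ok = eval { $default_recce->read('Add'); 1 };
print "default mode: read threw an exception\n" if not $default_ok;

# Interactive mode: the same rejection is reported by returning undef.
my $interactive_recce = Marpa::XS::Recognizer->new(
    { grammar => make_grammar(), interactive => 1 } );
my $interactive_result = $interactive_recce->read('Add');
print "interactive mode: read returned undef\n"
    if not defined $interactive_result;
```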

The read method implements the "token stream" interface. This is the standard model for input to compilers and parsers. The token stream model is all that most users will be interested in, at least at first. Marpa, however, allows non-traditional models of the input. The adventurous will find these described in a separate document.

The list of expected tokens returned by the read method can be used for "Ruby Slippers" parsing. In Ruby Slippers parsing, when the parser does not accept its input, the application can change it on the fly. This is called Ruby Slippers parsing, because all the parser has to do is wish, and whatever it wishes for happens. For details on how to use the list of expected tokens, see the section on Ruby Slippers parsing.

It is possible for the list of expected tokens to be empty. When this happens in the default input model, the recognizer is said to be exhausted.

value

The value mutator evaluates and returns a parse result. It is described in its own section.

ACCESSOR

check_terminal

Returns a Perl true when its argument is the name of a terminal symbol. Otherwise, returns a Perl false. Not often needed, but in special situations a lexer may find this the most convenient way to determine if a symbol is a terminal.

TRACE ACCESSORS

show_earley_sets

use English qw( -no_match_vars );    # provides $ERRNO
print $recce->show_earley_sets()
    or die "print failed: $ERRNO";

Returns a multi-line string listing every Earley item in every Earley set. show_earley_sets requires knowledge of Marpa internals to interpret.

For debugging grammars, users will want to use show_progress instead. show_progress contains the information necessary for debugging grammars and interpreting parse progress.

show_progress

use English qw( -no_match_vars );    # provides $ERRNO
print $recce->show_progress()
    or die "print failed: $ERRNO";

Returns a string describing the progress of the parse. With no arguments, the string contains reports for the current location. With a non-negative argument N, the string contains reports for location N.

With two numeric arguments, N and M, the arguments are interpreted as a range of locations and the returned string contains reports for all locations from N to M. The first argument must be non-negative. If the second argument is a negative integer, "-M", it indicates the Mth location from the last. In other words, -1 is the last location, -2 the next to last, etc. The call $recce->show_progress(0, -1) will return progress reports for the entire parse.
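
The arithmetic for a negative second argument can be sketched in plain Perl. The resolve_location helper below is hypothetical. It is not Marpa's implementation, and only illustrates how "-M" maps to an absolute location when the last location of the parse is known.

```perl
use strict;
use warnings;

# Hypothetical helper: map a show_progress-style location argument to an
# absolute location. $last_location is the last location of the parse.
sub resolve_location {
    my ( $arg, $last_location ) = @_;
    return $arg if $arg >= 0;            # non-negative: already absolute
    return $last_location + 1 + $arg;    # -1 is the last, -2 the next to last
}

# After five successful read calls, the locations run from 0 to 5.
my $last_location = 5;

for my $arg ( 0, 3, -1, -2 ) {
    printf "argument %2d is location %d\n", $arg,
        resolve_location( $arg, $last_location );
}
```

With a last location of 5, the range (0, -1) covers locations 0 through 5, which is the entire parse.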

show_progress is an important tool for debugging application grammars. It can also be used to track the progress of a parse or to investigate how a parse works. A much fuller description, with an example, is in the document on debugging Marpa grammars.

NAMED ARGUMENTS

grammar

The grammar named argument is required. Its value must be a precomputed Marpa grammar object.

interactive

A boolean. Unset by default. When set, the Marpa::XS::Recognizer::read call returns a Perl undef when a token is rejected. The default is for Marpa::XS::Recognizer::read to throw an exception when the token name provided is not one that the parser will accept.

ranking_method

The value must be a string: either "none" or "constant". When the value is "none", Marpa returns the parse results in arbitrary order. When the value is "constant", Marpa allows the user to control the order in which parse results are returned by specifying ranking actions which assign values to rules and tokens.

The default is for parse results to be returned in arbitrary order. For details, see the section on parse order in the semantics document.

too_many_earley_items

The too_many_earley_items argument is optional. If specified, it sets the Earley item warning threshold. If an Earley set becomes larger than the Earley item warning threshold, a warning is printed to the trace file handle.

Marpa parses from any BNF, and can handle grammars and inputs which produce large Earley sets. But parsing that involves large Earley sets can be slow. Large Earley sets are something most applications can, and will wish to, avoid.

By default, Marpa calculates an Earley item warning threshold based on the size of the grammar. The default threshold will never be less than 100. If the Earley item warning threshold is set to 0, warnings about large Earley sets are turned off. For details about Earley sets, see the implementation document.

trace_earley_sets

A boolean. If true, causes each Earley set to be written to the trace file handle as it is completed. For details about Earley sets, see the implementation document.

trace_file_handle

The value is a file handle. Traces and warning messages go to the trace file handle. By default the trace file handle is inherited from the grammar used to create the recognizer.

trace_terminals

Very handy in debugging, and often useful even when the problem is not in the lexing. The value is a trace level. When the trace level is 0, tracing of terminals is off. This is the default.

At a trace level of 1 or higher, Marpa traces terminals as they are accepted or rejected by the recognizer. At a trace level of 2 or higher, Marpa traces the terminals expected at every location. Practical grammars often expect a large number of different terminals at many locations, so the output from a trace level of 2 can be voluminous.

warnings

The value is a boolean. Warnings are written to the trace file handle. By default, the recognizer's warnings are on. Usually, an application will want to leave them on.

RUBY SLIPPERS PARSING

$recce =
    Marpa::XS::Recognizer->new( { grammar => $grammar, interactive => 1 } );

my @tokens = (
    [ 'Number', 42 ],
    ['Multiply'], [ 'Number', 1 ],
    ['Add'],      [ 'Number', 7 ],
);

TOKEN: for ( my $token_ix = 0; $token_ix <= $#tokens; $token_ix++ ) {
    defined $recce->read( @{ $tokens[$token_ix] } )
        or fix_things( $recce, \@tokens )
        or die q{Don't know how to fix things};
}

Marpa tells the application which symbols are acceptable as tokens at the next location in the parse. This can be very useful. The application can use this information to change the input so that it is acceptable to the parser.

An application does not have to anticipate problems. If a Marpa::XS::Recognizer::read call fails, the application can simply retry it, changing the input.

By default, when a token is rejected, Marpa throws an exception. But if an application is doing Ruby Slippers parsing, it may be more convenient to set the recognizer's interactive option. When the interactive option is set, if a token is rejected, the Marpa::XS::Recognizer::read call returns a Perl undef. The list of acceptable tokens will be that returned by the previous Marpa::XS::Recognizer::read call. The list of acceptable tokens may be explicitly requested with the Marpa::XS::Recognizer::expected_terminals call.
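
A complete, if small, round trip might look like the sketch below. It assumes the interactive behavior described above. The calculator grammar, the My_Actions package, and the repair policy are all hypothetical: when a token is rejected and the parser would accept a Multiply, the application supplies a "virtual" Multiply token and retries, so that two adjacent numbers are treated as a product.

```perl
use strict;
use warnings;
use Marpa::XS;

# Hypothetical calculator grammar and semantics, assumed for illustration.
my $grammar = Marpa::XS::Grammar->new(
    {   start          => 'Expression',
        actions        => 'My_Actions',
        default_action => 'first_arg',
        rules          => [
            { lhs => 'Expression', rhs => [qw/Term/] },
            { lhs => 'Term',       rhs => [qw/Factor/] },
            { lhs => 'Factor',     rhs => [qw/Number/] },
            { lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
            {   lhs    => 'Factor',
                rhs    => [qw/Factor Multiply Factor/],
                action => 'do_multiply'
            },
        ],
    }
);
$grammar->precompute();

sub My_Actions::first_arg { my ( undef, $first ) = @_; return $first }

sub My_Actions::do_add {
    my ( undef, $left, undef, $right ) = @_;
    return $left + $right;
}

sub My_Actions::do_multiply {
    my ( undef, $left, undef, $right ) = @_;
    return $left * $right;
}

my $recce = Marpa::XS::Recognizer->new(
    { grammar => $grammar, interactive => 1 } );

# The Multiply between 42 and 1 is missing from the input.
my @tokens = ( [ 'Number', 42 ], [ 'Number', 1 ], ['Add'], [ 'Number', 7 ] );

TOKEN: for my $token (@tokens) {
    next TOKEN if defined $recce->read( @{$token} );

    # The token was rejected. Ask the parser what it would accept instead.
    my %wanted = map { $_ => 1 } @{ $recce->expected_terminals() };
    $wanted{Multiply} or die q{Don't know how to fix things};

    # Supply the virtual token, then retry the token that was rejected.
    defined $recce->read('Multiply')    or die 'Virtual token rejected';
    defined $recce->read( @{$token} ) or die 'Retry failed';
}

my $value_ref = $recce->value;
my $value = $value_ref ? ${$value_ref} : 'No Parse';
print "$value\n";
```

The repaired input reads as 42 * 1 + 7. The grammar itself never mentions implicit multiplication. The application makes the input match the grammar, instead of the other way around.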

An Example

Marpa's HTML parser, Marpa::HTML, is an example of how Ruby Slippers parsing can help with a non-trivial, real-life application. When Marpa::HTML rejects a token, it tries to fix things using two techniques. In the first technique, Marpa::HTML sometimes changes the next token in the input stream to match the parser's expectations.

For HTML start and end tags, Marpa::HTML uses the second technique: "virtual" tokens. A major complexity of liberal HTML parsing is dealing with omitted start and end tags. Marpa::HTML deals with these by parsing with a grammar that takes a simple view of the world -- it assumes, contrary to fact, that start and end tags are always present. Ruby Slippers parsing is then used to make the grammar's simplistic view of the world come true.

When a token is rejected, Marpa::HTML looks at the expected tokens list. If it sees that a start or end tag is wanted, Marpa::HTML creates a "virtual" tag. Marpa::HTML then resumes input where it left off. This very simple solution to a difficult problem is made possible by the Ruby Slippers feature of the Marpa parse engine.

EVALUATION

my $value_ref = $recce->value;
my $value = $value_ref ? ${$value_ref} : 'No Parse';

The value method call evaluates and returns a parse result. Its arguments are zero or more hashes of named arguments. It returns a reference to the value of the next parse result, or undef if there are no more parse results.

These are the named arguments available to the value method call:

end

The value method's end named argument specifies the parse end location. The default is for the parse to end where the input did, so that the parse returned is of the entire input.

closures

The value method's closures named argument is a reference to a hash. In each key/value pair of this hash, the key must be an action name. The hash value must be a CODE ref.

Sources of action names include

  • The action properties of rules

  • The default_action named argument of grammars

  • The lhs properties of rules

  • The ranking_action properties of rules

  • For its new method, the action_object named argument of grammars

When an action name is a key in the closures named argument, the usual action resolution mechanism of the semantics is bypassed. A common use of the closures named argument is to allow anonymous subroutines to be semantic actions. For more details, see the document on semantics.
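
As a sketch of that common use, the example below supplies every semantic action as an anonymous subroutine. It assumes that the value method accepts the closures named argument as described in this section. The calculator grammar and its action names are hypothetical. Note that the grammar has no actions named argument, so the action names have nowhere to resolve except the closures hash.

```perl
use strict;
use warnings;
use Marpa::XS;

# Hypothetical calculator grammar. The action names are plain strings,
# which are resolved through the closures hash passed to value().
my $grammar = Marpa::XS::Grammar->new(
    {   start          => 'Expression',
        default_action => 'first_arg',
        rules          => [
            { lhs => 'Expression', rhs => [qw/Term/] },
            { lhs => 'Term',       rhs => [qw/Factor/] },
            { lhs => 'Factor',     rhs => [qw/Number/] },
            { lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
            {   lhs    => 'Factor',
                rhs    => [qw/Factor Multiply Factor/],
                action => 'do_multiply'
            },
        ],
    }
);
$grammar->precompute();

my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );
$recce->read( 'Number', 42 );
$recce->read( 'Multiply', );
$recce->read( 'Number', 1 );
$recce->read( 'Add', );
$recce->read( 'Number', 7 );

# Each key is an action name. Each value is a CODE ref.
# The first argument to an action is the per-parse variable, unused here.
my %closures = (
    first_arg   => sub { my ( undef, $first ) = @_; return $first },
    do_add      => sub { my ( undef, $l, undef, $r ) = @_; return $l + $r },
    do_multiply => sub { my ( undef, $l, undef, $r ) = @_; return $l * $r },
);

my $value_ref = $recce->value( { closures => \%closures } );
my $value = $value_ref ? ${$value_ref} : 'No Parse';
print "$value\n";
```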

max_parses

The value must be an integer. If it is greater than zero, the evaluator will return no more than that number of parse results. If it is zero, there will be no limit on the number of parse results returned. The default is for there to be no limit.

Marpa allows extremely ambiguous grammars. max_parses can be used if the user only wants to see the first few parse results of an ambiguous parse. max_parses is also useful to limit CPU usage and output length when testing and debugging.

trace_actions

The value method's trace_actions named argument is a boolean. If the boolean value is true, Marpa traces the resolution of action names to Perl closures. A boolean value of false turns tracing off, which is the default. Traces are written to the trace file handle.

trace_values

The value method's trace_values named argument is a numeric trace level. If the numeric trace level is 1, Marpa traces values as they are computed in the evaluation stack. A trace level of 0 turns value tracing off, which is the default. Traces are written to the trace file handle.

COPYRIGHT AND LICENSE

Copyright 2010 Jeffrey Kegler
This file is part of Marpa::XS.  Marpa::XS is free software: you can
redistribute it and/or modify it under the terms of the GNU Lesser
General Public License as published by the Free Software Foundation,
either version 3 of the License, or (at your option) any later version.

Marpa::XS is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser
General Public License along with Marpa::XS.  If not, see
http://www.gnu.org/licenses/.