NAME

Parse::Marpa::Recognizer - A Marpa Recognizer Object

SYNOPSIS

    my $recce = new Parse::Marpa::Recognizer({
        grammar => $grammar,
    });

    my $fail_offset = $recce->text(\("2-0*3+1"));
    if ($fail_offset >= 0) {
        die("Parse failed at offset $fail_offset");
    }

my $recce2 = new Parse::Marpa::Recognizer({grammar => $grammar});

my $op = $grammar->get_symbol("op");
my $number = $grammar->get_symbol("number");
$recce2->earleme([$number, 2, 1]);
$recce2->earleme([$op, "-", 1]);
$recce2->earleme([$number, 0, 1]);
$recce2->earleme([$op, "*", 1]);
$recce2->earleme([$number, 3, 1]);
$recce2->earleme([$op, "+", 1]);
$recce2->earleme([$number, 1, 1]);
$recce2->end_input();

DESCRIPTION

Marpa parsing takes place in three major phases: grammar creation, input recognition and parse evaluation. Once a grammar has rules and is precomputed, a recognizer can be created from it. The recognizer accepts input and can be used to create a Marpa evaluator object.

Tokens and Earlemes

Marpa allows ambiguous tokens. Several Marpa tokens can start at a single parsing location. Marpa tokens can be of various lengths. Marpa tokens can even overlap.

Most parsers treat position as a location in a token stream. To deal with variable-length and overlapping tokens, Marpa needs a more flexible idea of location. This flexibility is provided by tracking parse position in earlemes, which are named in honor of Jay Earley, the inventor of the algorithm on which Marpa is based.

If you do your lexing with the text method, you will use a one-character-per-earleme model. That is, input will be treated as a string, and each earleme will be a character location in that string. If you set up your terminals using MDL, Marpa assumes that you will be using the text method and one-character-per-earleme matching.

Marpa is not restricted to the one-character-per-earleme model. With the earleme method, you can structure your input in almost any way you like. You could, for example, create a token stream and use a one-token-per-earleme model, which would be equivalent to the conventional way of parsing a token stream. You can also structure your input in other, special ways to suit your application.

There are only two restrictions in mapping tokens to earlemes:

  1. Tokens must be scanned in earleme order. That is, all the tokens at earleme N must be recognized before any token at earleme N+1.

  2. Tokens cannot be zero or negative in earleme length.

A parse is said to start at earleme 0, and "earleme N" means the location N earlemes after earleme 0. Length in earlemes is the difference between two earleme locations. The length from earleme 3 to earleme 6, for instance, is 3 earlemes.
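The two restrictions and the length arithmetic above can be checked mechanically. What follows is a minimal sketch and not part of Marpa: check_tokens and end_earleme are hypothetical helpers, and modelling a token as a [start earleme, length] pair is an assumption made only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::Marpa.
# Each token is modelled as [ $start_earleme, $length_in_earlemes ].
# Returns a list of error strings; an empty list means both
# restrictions hold.
sub check_tokens {
    my @tokens = @_;
    my @errors;
    my $latest_start = 0;
    for my $ix ( 0 .. $#tokens ) {
        my ( $start, $length ) = @{ $tokens[$ix] };

        # Restriction 1: tokens must be scanned in earleme order.
        if ( $start < $latest_start ) {
            push @errors,
                "token $ix starts at earleme $start, before earleme $latest_start";
        }

        # Restriction 2: zero and negative lengths are not allowed.
        if ( $length <= 0 ) {
            push @errors, "token $ix has non-positive length $length";
        }

        $latest_start = $start if $start > $latest_start;
    }
    return @errors;
}

# A token that starts at earleme 3 and is 3 earlemes long ends at earleme 6.
sub end_earleme {
    my ( $start, $length ) = @_;
    return $start + $length;
}
```

Note that the tokens [0, 3] and [1, 2] overlap, yet pass both checks: overlap is allowed, only ordering and positive length are required.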

The tokens that the text method recognizes are fed to the Marpa parse engine. In the one-character-per-earleme model, the earleme length of each token is its length in characters. (If a token has a "lex prefix", the length of the lex prefix counts as part of the token length.)

Parse Exhaustion

In conventional Earley parsing, a parse is exhausted as soon as the parser reaches a "location" without a token. Because Marpa parses in terms of earlemes and tokens can span many earlemes, parses in Marpa remain active even if they reach an "empty earleme". In fact, Marpa parses often contain many stretches of empty earlemes, and some of these stretches can be quite long.

In Marpa, a parse remains active if some token has been recognized which ends at or after the current earleme. A Marpa parse is not exhausted until

  • No token starts at the current earleme, and

  • No token ends at or after the current earleme.
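The exhaustion condition above can be written as a small predicate. This is a simplified sketch under stated assumptions, not Marpa's implementation: is_exhausted is a hypothetical helper, tokens are modelled as [start earleme, length] pairs, and the list is assumed to hold only tokens the recognizer has actually accepted.

```perl
use strict;
use warnings;

# Hypothetical model, not part of Parse::Marpa.
# $tokens is a reference to an array of [ $start_earleme, $length ]
# pairs for the tokens recognized so far.
# Returns 1 if the modelled parse is exhausted at $current, 0 otherwise.
sub is_exhausted {
    my ( $current, $tokens ) = @_;
    for my $token ( @{$tokens} ) {
        my ( $start, $length ) = @{$token};

        # A token starts at the current earleme.
        return 0 if $start == $current;

        # A token ends at or after the current earleme.
        return 0 if $start + $length >= $current;
    }
    return 1;
}
```

With a single token of length 5 starting at earleme 0, this model stays active through the empty earlemes 1 to 4 and at earleme 5, and reports exhaustion only at earleme 6.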

Note to Experts

Those of you already familiar with Earley parsing and its standard terminology may find the following helpful:

  • Each "earleme" corresponds to an Earley set in the usual terminology.

  • In the usual terminology, an "empty earleme" would be an Earley set with no Earley items.

METHODS

new

my $recce = new Parse::Marpa::Recognizer({
   grammar=> $g,
   preamble => $new_preamble,
});

Parse::Marpa::Recognizer::new takes as its argument a hash reference containing named arguments. It returns a new recognizer object or throws an exception. Either the compiled_grammar or the grammar option must be specified, but not both. A recognizer is created with the default end of parsing set to earleme 0, which is before any input.

If the grammar option is specified, its value must be a grammar object with rules defined. If it is not precomputed, new will precompute it. A deep copy of the grammar is then made to be used in the recognizer.

If the compiled_grammar option is specified, its value must be a Perl 5 string containing a compiled Marpa grammar, as produced by Parse::Marpa::Grammar::compile. It will be decompiled for use in the recognizer.

Marpa options are also valid named arguments. For these, see "OPTIONS" in Parse::Marpa.

text

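use English qw( -no_match_vars );    # provides $RS, the input record separator

local($RS) = undef;                  # slurp the whole file in one read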
my $spec = <FH>;
my $fail_offset = $recce->text(\$spec);
if ($fail_offset >= 0) {
   die("Parse failed at offset $fail_offset");
}

Extends the parse in the one-character-per-earleme model. Its one required argument must be a reference to a string containing the text to be parsed. If the parse is active after the text has been processed, the default end of parsing is set to the end of the text, and -1 is returned.

If the parse is exhausted by the input, that is, if processing reaches a point where no successful parse is possible, the default end of parsing is set to the earleme at which the parse was exhausted, and the character offset at which the parse was exhausted is returned. A zero return means that the parse was exhausted at character offset zero. Failures, other than exhausted parses, are thrown as exceptions.
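Because zero is a valid failure offset, the return value of text has to be compared against zero rather than tested for truth. The sketch below shows one way to turn that return value into a message. It is illustrative only: describe_text_result is a hypothetical helper, not part of Marpa, and the 20-character context window is an arbitrary choice.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::Marpa.
# $fail_offset is the value returned by the text method;
# $text_ref is the same string reference that was passed to it.
sub describe_text_result {
    my ( $fail_offset, $text_ref ) = @_;

    # A negative return (-1) means the parse is still active.
    return 'parse still active at end of text' if $fail_offset < 0;

    # Show up to 20 characters starting at the point of exhaustion.
    my $context =
        $fail_offset < length ${$text_ref}
        ? substr ${$text_ref}, $fail_offset, 20
        : q{};
    return "parse exhausted at offset $fail_offset, before: '$context'";
}
```

A caller would pass the value returned by the text method together with the string reference it gave to that method.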

When you use the text method for input, earlemes correspond one-to-one to characters in the text. The earleme number is always one more than the character offset from the start of text. The first character is at earleme one and offset zero. Terminals are recognized in the text using the lexers that were specified in the porcelain or the plumbing.

earleme

my $op = $grammar->get_symbol("op");
$recce2->earleme([$op, "-", 1]);

The earleme method takes zero or more arguments. Each argument is a token which starts at the current earleme. Every call to the earleme method moves the current earleme forward by one earleme.

More than one token may be added at each earleme, because ambiguous lexing is allowed. Each token is a reference to a three-element array. The first element is a "cookie" for the token's symbol, as returned by the Parse::Marpa::get_symbol method, or the get_symbol method of a porcelain interface. The second element is the token's value in the parse. The third is the token's length in earlemes.

The earleme method first checks to see if the parse is still active, that is, if it is still possible for the parse to succeed. If the parse is active, the tokens are added. The default end of parsing is set to the current earleme, after which the current earleme is advanced by one. If the earleme method is called without any arguments, both the current earleme and the default end of parsing will be incremented one earleme, but no new tokens are added.

An earleme remains the current earleme during only one call of the earleme method. All tokens starting at that earleme must be added in that call. The first time that the earleme method is called in a recognizer, the current earleme is at earleme 0.

If no parses are possible, the parse is said to be exhausted. If the earleme method is called on an exhausted parse, it returns 0. The default end of parsing remains where it was, at the last earleme at which the parse was active. The earleme method throws an exception on other failures.

This is the low-level token input method, and allows maximum control over the form and interrelationship of tokens. No model of the relationship between the tokens and the earlemes is assumed. The user is free to invent her own.
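Since all tokens starting at an earleme must be passed in a single call, and an earleme where nothing starts still needs a call with no arguments, it can help to group tokens by start location before feeding them in. The following is a minimal sketch of that grouping. tokens_by_earleme is a hypothetical helper, not part of Marpa, and the four-element input record with an explicit start earleme is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::Marpa.
# Input: a list of [ $start_earleme, $symbol, $value, $length ] records,
# where $symbol stands for the cookie returned by get_symbol.
# Output: a reference to an array with one entry per earleme; each entry
# is a reference to the list of [ $symbol, $value, $length ] tokens
# that start at that earleme.
sub tokens_by_earleme {
    my @records = @_;
    my @calls;
    for my $record (@records) {
        my ( $start, $symbol, $value, $length ) = @{$record};
        push @{ $calls[$start] }, [ $symbol, $value, $length ];
    }

    # Earlemes where no token starts become empty argument lists.
    for my $ix ( 0 .. $#calls ) {
        $calls[$ix] = [] unless defined $calls[$ix];
    }
    return \@calls;
}

# With a recognizer $recce, each entry would become one call, in order:
#     $recce->earleme( @{$_} ) for @{ tokens_by_earleme(@records) };
```

Two records with the same start earleme end up in the same call, which is how ambiguous tokens are presented to the recognizer.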

end_input

$recce2->end_input();

This method takes no arguments. It is used with the earleme method in offline mode, to signal the end of input. The input is processed out to the last earleme at which a token ends, and the default end of parsing is set to that earleme.

SUPPORT

See the support section in the main module.

AUTHOR

Jeffrey Kegler

COPYRIGHT

Copyright 2007 - 2008 Jeffrey Kegler

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.