NAME
Parse::Marpa::Recognizer - Marpa Recognizer Objects
SYNOPSIS
my $recce = new Parse::Marpa::Recognizer({
    grammar => $grammar,
});
my $fail_offset = $recce->text(\('2-0*3+1'));
if ($fail_offset >= 0) {
    die("Parse failed at offset $fail_offset");
}
my $recce2 = new Parse::Marpa::Recognizer({grammar => $grammar});
my $op = $grammar->get_symbol('op');
my $number = $grammar->get_symbol('number');
$recce2->earleme([$number, 2, 1]);
$recce2->earleme([$op, '-', 1]);
$recce2->earleme([$number, 0, 1]);
$recce2->earleme([$op, '*', 1]);
$recce2->earleme([$number, 3, 1]);
$recce2->earleme([$op, '+', 1]);
$recce2->earleme([$number, 1, 1]);
$recce2->end_input();
DESCRIPTION
Marpa parsing takes place in three major phases: grammar creation, input recognition and parse evaluation. Once a grammar has rules, a recognizer can be created from it. The recognizer accepts input and can be used to create a Marpa evaluator object.
Tokens and Earlemes
Marpa allows ambiguous tokens. Several Marpa tokens can start at a single parsing location. Marpa tokens can be of various lengths. Marpa tokens can even overlap.
For most parsers, position is location in a token stream. To deal with variable-length and overlapping tokens, Marpa needs a more flexible idea of location. This flexibility is provided by tracking parse position in earlemes. Earlemes are named after Jay Earley, the inventor of the first algorithm in Marpa's lineage.
If you do your lexing with the text method, you will use a one-character-per-earleme model. The raw input to the text method is a Perl 5 string, and each earleme is a character location in that string.
Marpa is not restricted to the one-character-per-earleme model. With the earleme method, you can structure your input in almost any way you like. You can, for example, create a token stream and use a one-token-per-earleme model, and this would be equivalent to the standard way of doing things. You can also structure your input in other, special ways to suit your application.
There are three restrictions on mapping tokens to earlemes:
Scanning always starts at earleme 0.
Tokens must be scanned in earleme order. That is, all the tokens at earleme N must be scanned before any token at earleme N+1.
Tokens cannot be zero or negative in earleme length.
"Earleme N" means the location N earlemes after earleme 0. Length in earlemes probably means what you expect it does. The length from earleme 3 to earleme 6, for instance, is 3 earlemes.
When a token is scanned, the start of the token is put at the current earleme. Where the token ends depends on its length, which must be greater than zero. The default end of parsing is tracked by each recognizer. If the user does not explicitly specify where an evaluator should end its parse, the evaluator uses the default end of parsing that it inherited from the recognizer.
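The restrictions and the length arithmetic above can be sketched in a few lines of plain Perl. This is only an illustration: check_token_placement and earleme_length are hypothetical helpers, not part of Parse::Marpa, and the sketch assumes each token is described by its start earleme and its length in earlemes, listed in the order in which it would be scanned.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::Marpa.
# Takes a list of [ start_earleme, length_in_earlemes ] pairs in scan
# order and returns a list of complaints (empty if all is well).
sub check_token_placement {
    my @tokens = @_;
    my @problems;
    my $latest_start = 0;    # scanning always starts at earleme 0
    for my $ix ( 0 .. $#tokens ) {
        my ( $start, $length ) = @{ $tokens[$ix] };

        # Tokens must be scanned in earleme order.
        if ( $start < $latest_start ) {
            push @problems,
                "token $ix starts at earleme $start, "
                . "but scanning has already reached earleme $latest_start";
        }

        # Tokens cannot be zero or negative in earleme length.
        if ( $length < 1 ) {
            push @problems, "token $ix has non-positive length $length";
        }
        $latest_start = $start if $start > $latest_start;
    }
    return @problems;
}

# Length in earlemes between two locations, as in the example above:
# from earleme 3 to earleme 6 is 3 earlemes.
sub earleme_length {
    my ( $from, $to ) = @_;
    return $to - $from;
}
```

Several tokens may share a start earleme, so equal start values are accepted; only a step backwards is reported.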
Parse Exhaustion
In recognizing input, a point may come where it is clear that a successful parse is no longer possible. At this point, both the parse and the recognizer are said to be exhausted. A parse or a recognizer is active if and only if it is not exhausted.
Because tokens can span earlemes, parses in Marpa can remain active even if no token either ends or begins at the current earleme. In fact, Marpa parses often contain long stretches of earlemes with no token boundaries.
METHODS
new
my $recce = new Parse::Marpa::Recognizer({
    grammar => $g,
    lex_preamble => $new_lex_preamble,
});
The new method's one, required, argument is a hash reference of named arguments. The new method either returns a new recognizer object or throws an exception. Either the compiled_grammar or the grammar named argument must be specified, but not both. A recognizer is created with the current earleme and the default end of parsing both set at earleme 0.
If the grammar option is specified, its value must be a grammar object with rules defined. If it is not precomputed, new will precompute it. A deep copy of the grammar is then made to be used in the recognizer.
If the compiled_grammar option is specified, its value must be a Perl 5 string containing a compiled Marpa grammar, as produced by Parse::Marpa::Grammar::compile. It will be decompiled for use in the recognizer.
Marpa options can also be named arguments to new. For these, see "OPTIONS" in Parse::Marpa.
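The rule that exactly one of grammar and compiled_grammar must be given can be checked before calling new. The following is a minimal sketch, assuming you want to catch the mistake in your own code first; check_recognizer_args is a hypothetical helper, not part of Parse::Marpa, and it does not reproduce the checks or the error messages of new itself.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::Marpa.
# Dies unless $args is a hash reference naming exactly one of the
# grammar and compiled_grammar arguments. Other keys are left alone,
# because Marpa options may also be passed as named arguments.
sub check_recognizer_args {
    my ($args) = @_;
    die "argument must be a hash reference\n"
        if ref($args) ne 'HASH';
    my $given = grep { exists $args->{$_} } qw(grammar compiled_grammar);
    die "specify exactly one of grammar and compiled_grammar\n"
        if $given != 1;
    return 1;
}
```

A call such as check_recognizer_args({ grammar => $g, lex_preamble => $new_lex_preamble }) passes, while an empty hash or a hash with both grammar arguments dies.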
text
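use English qw( -no_match_vars );  # provides $RS, the input record separator
# Undefine the record separator so that the whole file is read at once.
local($RS) = undef;
my $spec = <FH>;
my $fail_offset = $recce->text(\$spec);
if ($fail_offset >= 0) {
    die("Parse failed at offset $fail_offset");
}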
Extends the parse using the one-character-per-earleme model. The one, required, argument must be a reference to a string containing text to be parsed. If the parse is active after the text has been processed, the default end of parsing is set to the end of the text, the current earleme is set to the earleme just after the end of text, and -1 is returned.
If the parse is exhausted by the input, the default end of parsing remains at the last earleme at which the parse was active, and the character offset at which the parse was exhausted is returned. A zero return means that the parse was exhausted at character offset zero. Failures, other than exhausted parses, are thrown as exceptions.
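Because zero is a valid failure offset, the return value of text has to be compared against zero rather than tested for truth. The sketch below shows one way to turn the return value into a message; describe_text_result is a hypothetical helper, not part of Parse::Marpa, and it assumes it is handed the value returned by text together with the same string reference that was passed to text.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::Marpa.
# $fail_offset is the value returned by text(); $text_ref is the
# string reference that was passed to text().
sub describe_text_result {
    my ( $fail_offset, $text_ref ) = @_;

    # A negative return (-1) means the parse is still active.
    return 'parse still active at end of text' if $fail_offset < 0;

    # Show up to ten characters of input from the point of exhaustion.
    my $context =
        $fail_offset <= length ${$text_ref}
        ? substr( ${$text_ref}, $fail_offset, 10 )
        : q{};
    return "parse exhausted at character offset $fail_offset, near '$context'";
}
```

A test such as "if ($fail_offset)" would treat exhaustion at offset zero as success and -1 as failure, which is the reverse of what is wanted.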
When you use the text method for input, earlemes correspond one-to-one to characters in the text. The earleme number is always one more than the character offset from the start of text. The first character is at earleme one and offset zero.
Terminals are recognized in the text using the lexers that were specified in the porcelain or the plumbing. The earleme length of each token is set to the length of the token in characters. (If a token has a "lex prefix", the length of the lex prefix counts as part of the token length.)
earleme
my $op = $grammar->get_symbol("op");
$recce2->earleme([$op, "-", 1]);
The earleme method adds tokens at the current earleme. Every call to the earleme method moves the current earleme forward by one earleme. Unlike text, the earleme method assumes no particular model of the input.
The earleme method takes zero or more arguments. Each argument is a token which starts at the current earleme. More than one token may be added at each earleme, because ambiguous lexing is allowed. Each token argument is a reference to a three-element array. The first element is a "cookie" for the token's symbol, as returned by the Parse::Marpa::Grammar::get_symbol method or the get_symbol method of a porcelain interface. The second element is the token's value in the parse, and may be any value legal in Perl 5, including undefined. The third is the token's length in earlemes.
The earleme method first adds the tokens in the arguments, if there were any. If, after all tokens have been added, the parse is still active, the default end of parsing is set to the current earleme. The current earleme is then advanced by one and the earleme method returns 1, indicating that the parse is still active.
The earleme method may be called without any arguments, and if tokens span multiple earlemes, as is often the case when the text method is being used, the parse might well remain active after such a call. Whether or not any tokens were added in a call to the earleme method, if the parse remains active, both the current earleme and the default end of parsing are incremented by one.
If the earleme method results in an exhausted parse, it returns 0. The default end of parsing remains at the last earleme at which the parse was active. The earleme method throws an exception on other failures.
An earleme remains the current earleme during only one call of the earleme method. All tokens starting at that earleme must be added in that call. The first time that the earleme method is called in a recognizer, the current earleme is at earleme 0.
This is the low-level token input method, and allows maximum control over scanning. No model of the input, or of the relationship between the tokens and the earlemes, is assumed. The user is free to invent her own.
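A driver loop over the earleme method might look like the sketch below. It relies only on the behavior described above: one call per earleme, all tokens for that earleme in the same call, a return of 1 while the parse is active and 0 once it is exhausted. The name feed_earlemes, the layout of its input, and its return convention (borrowed from the text method) are assumptions made for the example and are not part of Parse::Marpa.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::Marpa.
# $recce is assumed to be a recognizer. $earleme_tokens is a reference
# to an array with one entry per earleme; each entry is a reference to
# an array of zero or more tokens of the form
# [ $symbol_cookie, $value, $length_in_earlemes ].
# Returns -1 if all the input was accepted, otherwise the number of the
# earleme at which the parse was exhausted.
sub feed_earlemes {
    my ( $recce, $earleme_tokens ) = @_;
    for my $earleme ( 0 .. $#{$earleme_tokens} ) {

        # All tokens starting at this earleme go into a single call.
        # An empty entry produces a call with no arguments.
        my $active = $recce->earleme( @{ $earleme_tokens->[$earleme] } );
        return $earleme if not $active;
    }

    # Offline mode: tell the recognizer that there is no more input.
    $recce->end_input();
    return -1;
}
```

With the tokens of the synopsis, the input would be an array of seven entries, each holding a single token of length 1.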
end_input
$recce2->end_input();
This method takes no arguments. It is used with the earleme method in offline mode, to indicate the end of input. The input is processed out to the last earleme at which a token ends, and the default end of parsing is set to that earleme. The current earleme is then set to the earleme after the default end of parsing.
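The bookkeeping done by end_input can be modeled with simple arithmetic. The sketch below is a model only, not Marpa's implementation: earlemes_after_end_input is a hypothetical helper, it assumes that a token starting at earleme S with length L ends at earleme S+L, and it ignores the possibility that the parse is exhausted before the last token ends.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical model, not part of Parse::Marpa.
# Each token is a [ start_earleme, length_in_earlemes ] pair.
# Returns ( default_end_of_parsing, current_earleme ) as they would
# stand after end_input, under the assumptions stated above.
sub earlemes_after_end_input {
    my @tokens = @_;

    # The last earleme at which any token ends; 0 if there are no tokens.
    my $default_end = max( 0, map { $_->[0] + $_->[1] } @tokens );

    # The current earleme is the one after the default end of parsing.
    return ( $default_end, $default_end + 1 );
}
```

For the seven length-1 tokens of the synopsis, starting at earlemes 0 through 6, this model gives a default end of parsing of 7 and a current earleme of 8.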
SUPPORT
See the support section in the main module.
AUTHOR
Jeffrey Kegler
LICENSE AND COPYRIGHT
Copyright 2007 - 2008 Jeffrey Kegler
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.