NAME

Marpa::Advanced::Models - Other Input Models (Advanced)

ABOUT THIS DOCUMENT

The alternative input models desribed in this document are a very advanced technique. This document may safely be ignored by ordinary Marpa users.

DESCRIPTION

Token Streams

Marpa's default input model is the traditional one -- a token stream. Token streams are very standard in parsing applications -- so much so that most texts do not take the trouble of defining the term. A token stream is input structured as a sequence of tokens, where each token occupies one location and every location has a token. In the token stream model, all tokens are of the same length.

Conventionally, all tokens are of length 1, and the token stream starts at location 0. Following this convention, the Nth token would start at location N-1 and end at location N. For example, the first token would start at location 0 and end at location 1. The second token would start at location 1 and end at location 2.

Earlemes

The basic idea of location in Marpa is the earleme. Earlemes are named after Jay Earley. Internally, each earleme corresponds exactly and directly to an Earley set. Every token has a start earleme and an end earleme.

The token stream model may also be called the token-per-earleme model. In the token stream model, token location and earleme location directly correspond on an one-to-one basis. It can be useful to have the structure of the input relate to earleme location in other ways. One such alternative, which is useful and been tested, is the character-per-earleme model, discussed below.

INTERFACE

Alternative models are implemented using the optional third and fourth parameters of the token descriptors. Token descriptors are used in the arguments to the Marpa Recognizer's tokens method.

Token Length

Token length is the optional third element of the token descriptor. By default, it is 1, which is the correct value for the token stream model. Its value can be any integer greater than zero. Marpa does not allow zero length tokens in any input model.

Token Offset

The token offset is the fourth element of each token's descriptor. This is the offset to be added to the current earleme location. The current earleme location is the earleme which will be the start earleme of the next token to be added. When parsing begins, the current earleme location is earleme 0.

When token offset is left undefined, it defaults to 1. That means the next token will be added at another earleme location, one after the current one. This is the correct value for the token stream model.

Negative token offsets are not allowed. Zero token offset are allowed, and will cause multiple tokens to start at the same location. This in turn will cause lexing to be ambiguous. Marpa supports ambiguous lexing.

AMBIGUOUS LEXING

Ambiguous lexing occurs when several different sequences of tokens are possible. Potentially ambiguous lexing occurs in any parse where multiple tokens start at a single earleme. An actual ambiguity only occurs if more than one of the potential token choices is consistent with the grammar and its input. If there is no actual ambiguity, Marpa will use the only token choice which is consistent with the grammar and its inputs.

When lexing is actually ambiguous, Marpa will use all consistent alternatives. When the lexing in a parse is actually ambiguous, the parse will be ambiguous. From the point of view of Marpa's semantics, ambiguity caused by lexing is exactly the same as ambiguity caused by an ambiguous grammar.

It's a bit of a pedantic point, but in the rest of this documentation I follow the standard terminology in saying that grammar is ambiguous if it can produce more than one parse result, and vice versa. In Marpa this is not strictly true. An unambiguous grammar can produce than one parse in Marpa, if the lexing is ambiguous. Conversely, if Marpa returns more than one parse result for a grammar, you cannot conclude that the grammar itself is ambiguous.

THE CHARACTER-PER-EARLEME MODEL

Not yet written.

OTHER INPUT MODELS

Not yet written.

LICENSE AND COPYRIGHT

Copyright 2007-2010 Jeffrey Kegler, all rights reserved. Marpa is free software under the Perl license. For details see the LICENSE file in the Marpa distribution.