NAME

Parse::Marpa::Doc::Plumbing - The Plumbing Interface

DESCRIPTION

This document describes Marpa's plumbing interface. The plumbing is the low-level interface used by all the porcelain interfaces, and it can also be used directly. It consists of a short list of named arguments to the Parse::Marpa::Grammar::new(), Parse::Marpa::Grammar::set(), and Parse::Marpa::Recognizer::new() methods.

The start argument may be used in combination with a porcelain interface, subject to the symbol name conversion requirements described below. Other than that, plumbing and porcelain interfaces cannot be used to build the same grammar. Marpa throws an exception if the user attempts to use any of the plumbing's other named arguments with a porcelain interface.

Plumbing Symbol Names

Each interface has its own rules for symbol names. The plumbing's conventions are designed to allow flexibility for the porcelain. Any valid Perl string not ending in a right square bracket is an acceptable plumbing symbol name. Plumbing symbol names which end in right square brackets are reserved for Marpa internal use.
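The naming rule above can be checked mechanically. The following is a minimal sketch; the helper is_user_symbol_name is hypothetical and is not part of Parse::Marpa.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::Marpa: returns true if $name
# is acceptable as a user-supplied plumbing symbol name, that is, if it
# is a defined string that does not end in a right square bracket.
sub is_user_symbol_name {
    my ($name) = @_;
    return 0 if not defined $name;
    return $name =~ / \] \z /xms ? 0 : 1;
}

# Names ending in "]" are reported as reserved for Marpa internal use.
for my $name ( 'expression', 'a b', 'x[0]', 'E[]' ) {
    printf "%-12s %s\n", $name,
        is_user_symbol_name($name) ? 'ok' : 'reserved';
}
```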

Unlike MDL symbol names, plumbing symbol names are not considered identical unless they match exactly. Unless stated otherwise, any reference to a symbol name in this document means a plumbing symbol name.

METHOD

Parse::Marpa::Grammar::get_symbol

my $a = $grammar->get_symbol('a');

Given a symbol's plumbing name, get_symbol returns the symbol's cookie. It returns undef if no symbol with that name exists. If you are using MDL to define your grammar, use Parse::Marpa::MDL::get_symbol instead.

Symbol cookies are used primarily in calls to the Parse::Marpa::Recognizer::earleme method. To get the cookie for a symbol using its porcelain name, see the documentation for the individual porcelain interface.

NAMED ARGUMENTS

The rules Named Argument

The rules named argument is available with both the Parse::Marpa::Grammar::new and Parse::Marpa::Grammar::set methods. The rules named argument may be specified multiple times, adding new rules to the grammar each time. New rules may be added until the grammar is precomputed.

The value of the rules named argument must be a reference to an array, and each element of the array must be a reference to a description of a rule. Rule descriptions can be either arrays (the short form) or hashes (the long form).

Short Form

The short form description of a rule is an array with up to four elements, in this order: lhs, rhs, action, and priority. The last two are optional.

The lhs element must be the name of the left hand side symbol. The rhs element must be a reference to an array of right hand side symbol names. In the case of an empty rule, rhs must be a reference to a zero-length array.

The action element, if present, must be a string describing the rule's action in the current Marpa semantics. Right now, the only available semantics is Perl 5. If the action for a rule is not explicitly set, the rule uses the value of Marpa's default_action option.

The priority element, if present, must be an integer, which may be negative. It sets the priority of the rule. If undefined, priority defaults to zero.
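As an illustration of the short form, the sketch below builds an array reference of the kind that would be supplied as the value of the rules named argument. The symbol names (E, Op, Number, Minus, Empty) and the action strings are placeholders invented for this example, not taken from any real grammar.

```perl
use strict;
use warnings;

# Each element is a short form rule description:
#   [ lhs, rhs, action (optional), priority (optional) ]
# All symbol names and action strings here are illustrative assumptions.
my $rules = [
    [ 'E', [qw(E Op E)] ],                          # lhs and rhs only
    [ 'E', ['Number'], q{'a number'} ],             # with an action string
    [ 'E', [qw(Minus E)], q{'a negation'}, -1 ],    # with action and priority
    [ 'Empty', [] ],                                # empty rule: zero-length rhs
];

# Print each rule, showing which optional elements fall back to defaults.
for my $rule ( @{$rules} ) {
    my ( $lhs, $rhs, $action, $priority ) = @{$rule};
    printf "%s -> %s (action %s, priority %d)\n",
        $lhs,
        ( @{$rhs} ? join( q{ }, @{$rhs} ) : '(empty)' ),
        ( defined $action   ? 'given'   : 'default_action' ),
        ( defined $priority ? $priority : 0 );
}
```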

Long Form

The long form description of a rule is a hash of rule options, with the option names as the hash keys, and the option values as the hash values. The available rule options are:

lhs, rhs, action, and priority

The values of the lhs, rhs, action, and priority rule options are as described above for the corresponding elements of the short form.

min

min must be undefined, 0 or 1. If min is 0 or 1, the rule is a sequence production. If min is undefined, the rule is an ordinary, BNF production.

Only one symbol is allowed on the right hand side of a sequence production, and the right hand side symbol may not be a nullable symbol. The input will be required to match the rhs symbol at least min times and will be allowed to match an unlimited number of times. For an introduction to sequence productions, see the MDL document.

separator

Any sequence production may have a separator defined. The value must be a symbol name. Marpa allows trailing separators, Perl style. The separator must not be a nullable symbol.
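The long form options can be combined as in the sketch below. The symbol names and the action string are invented for illustration, and rule_problem is a hypothetical helper, not part of Parse::Marpa, that checks only the two constraints stated above: min must be undefined, 0 or 1, and a sequence production has exactly one right hand side symbol.

```perl
use strict;
use warnings;

# Long form rule descriptions. All names and the action string are
# illustrative placeholders.
my $rules = [
    # Ordinary BNF production: no min option.
    { lhs => 'list', rhs => [qw(open items close)] },

    # Sequence production: one or more 'item', separated by 'comma'.
    { lhs => 'items', rhs => ['item'], min => 1, separator => 'comma' },

    # Sequence production: zero or more 'statement', with no separator.
    {   lhs      => 'block',
        rhs      => ['statement'],
        min      => 0,
        action   => q{'a block'},
        priority => -1,
    },
];

# Hypothetical helper: returns a description of the first problem found,
# or undef if the rule passes the checks.
sub rule_problem {
    my ($rule) = @_;
    my $min = $rule->{min};
    return if not defined $min;    # ordinary BNF production
    return 'min must be undefined, 0 or 1' if $min !~ /\A[01]\z/xms;
    return 'a sequence production takes exactly one rhs symbol'
        if scalar @{ $rule->{rhs} } != 1;
    return;
}

for my $rule ( @{$rules} ) {
    my $problem = rule_problem($rule);
    printf "%-6s %s\n", $rule->{lhs}, defined($problem) ? $problem : 'ok';
}
```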

Duplicate Rules

Marpa throws an exception if a duplicate rule is added. For BNF productions, a rule is considered a duplicate if it has the same left hand side symbol, and the same symbols in the same order on the right hand side. For sequences, a rule is considered a duplicate if it has the same left hand side symbol, the same right hand side symbol, and the same separator.
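One way to picture the duplicate test is as a comparison of keys built only from the parts of a rule that the criteria mention. The sketch below is not Marpa's own code. The helper duplicate_key is hypothetical, it works on long form descriptions, and it assumes that a BNF production and a sequence production are never duplicates of each other, which the text above does not state either way.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a key from lhs and rhs (BNF productions) or
# from lhs, the single rhs symbol, and the separator (sequences). Action,
# priority and min do not take part. Each part is length-prefixed so that
# arbitrary symbol names cannot run together ambiguously.
sub duplicate_key {
    my ($rule) = @_;
    my @parts =
        defined $rule->{min}
        ? ( 'sequence', $rule->{lhs}, $rule->{rhs}->[0], $rule->{separator} )
        : ( 'bnf', $rule->{lhs}, @{ $rule->{rhs} } );
    return join q{,},
        map { defined($_) ? length($_) . q{:} . $_ : q{-} } @parts;
}

# Illustrative rules; the second and fourth repeat the first and third
# as far as the stated criteria are concerned.
my %seen;
for my $rule (
    { lhs => 'E', rhs => [qw(E Op E)] },
    { lhs => 'E', rhs => [qw(E Op E)], priority => 5 },
    { lhs => 'L', rhs => ['item'], min => 0, separator => 'comma' },
    { lhs => 'L', rhs => ['item'], min => 1, separator => 'comma' },
    )
{
    my $key = duplicate_key($rule);
    print $seen{$key}++ ? "duplicate: $key\n" : "new:       $key\n";
}
```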

The terminals Named Argument

The value of the terminals named argument must be a reference to an array of terminal descriptions. Terminal descriptions can be short form or long form. The short form is very short: it is the symbol name of the terminal as a scalar string.

A long form terminal description is a reference to an array of two elements. The first element is the symbol name of the terminal. The second element must be a reference to a hash of terminal options, with option names as hash keys and option values as hash values.

Terminal Options

regex

The value of the regex terminal option must be a regular expression. It is used when Marpa is asked to match the terminals in the input text. When the tokens are supplied directly, for example when using the earleme method, the terminal's regex value is ignored. Only one of the regex and action terminal options may be specified. See the MDL document for details on writing terminal regexes.

action

The value of the action terminal option must be a string with code in the current semantics. Right now, the only available semantics is Perl 5. The code will be interpreted as a lex action, which will be used to match the terminal in the input text. When the tokens are supplied directly, for example when using the earleme method, the terminal's action value is ignored. Only one of the regex and action terminal options may be specified. See the MDL document for details on writing lex actions.

prefix

The value of the prefix terminal option must be a regular expression. It will be used to match and discard text from the input before any attempt is made to match the terminal itself. The most common use is to discard leading whitespace. When the tokens are supplied directly, for example when using the earleme method, the terminal's prefix value is ignored.

priority

The value of the priority terminal option must be an integer. It can be negative. It will control the order in which terminal matches are attempted.
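The sketch below puts the short and long forms of a terminal description side by side, using the regex, prefix and priority options. The terminal names and patterns are invented for illustration, and the regular expressions are given as qr// objects, which is an assumption about the expected form. The helper terminal_name_and_options is hypothetical; it only normalizes a description and enforces the rule that regex and action may not both be specified.

```perl
use strict;
use warnings;

# Terminal names and patterns are illustrative assumptions.
my $terminals = [
    'Op',    # short form: just the symbol name
    [ 'Number'  => { regex => qr/\d+/xms, prefix => qr/\s*/xms } ],
    [ 'Keyword' => { regex => qr/if|else/xms, priority => 10 } ],
];

# Hypothetical helper: returns the terminal's name and a hash reference of
# its options (empty for the short form). Dies if both regex and action
# are present, since only one of them may be specified.
sub terminal_name_and_options {
    my ($description) = @_;
    return ( $description, {} ) if not ref $description;
    my ( $name, $options ) = @{$description};
    die "terminal $name: only one of regex and action may be specified\n"
        if exists $options->{regex} and exists $options->{action};
    return ( $name, $options );
}

for my $description ( @{$terminals} ) {
    my ( $name, $options ) = terminal_name_and_options($description);
    my @option_names = sort keys %{$options};
    printf "%-8s %s\n", $name,
        ( @option_names ? join( q{, }, @option_names ) : '(no options)' );
}
```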

The start Named Argument

The value of the start named argument must be a plumbing symbol name. It will be used as the start symbol for the grammar. Most of the plumbing named arguments may not be used in combination with a porcelain interface. The start named argument is an exception. It may be used to set the default for, or to override the choice of, the start symbol in the porcelain.

If you use the start named argument to specify a porcelain symbol, you must be careful to use the plumbing symbol name. The documentation for the porcelain should describe how to convert porcelain symbol names to plumbing symbol names.

SUPPORT

See the support section in the main module.

AUTHOR

Jeffrey Kegler

LICENSE AND COPYRIGHT

Copyright 2007 - 2009 Jeffrey Kegler

This program is free software; you can redistribute it and/or modify it under the same terms as Perl 5.10.0.