Name

Marpa::R2::BNF - BNF Interface

Synopsis

my $grammar = Marpa::R2::Grammar->new(
    {   start          => 'Script',
        actions        => 'My_Actions',
        default_action => 'do_first_arg',
        rules          => [ <<'END_OF_RULES' ]
Script ::= Expression+ separator => <op_comma> action => do_script
Expression ::=
    Number
    | op_lparen Expression op_rparen action => do_parens assoc => group
   || Expression op_pow Expression action => do_pow assoc => right
   || Expression op_times Expression action => do_multiply
    | Expression op_divide Expression action => do_divide
   || Expression op_add Expression action => do_add
    | Expression op_subtract Expression action => do_subtract
END_OF_RULES
    }
);

This interface is beta

This interface is beta. It is new, somewhat experimental, and therefore subject to change. Users who prize stability should prefer the hash and/or array rule descriptors.

Overview

This page is the reference for the Marpa's "BNF interface", also called its "Stuifzand interface". The Stuifzand interface allows strings as rule descriptors in the rules named arguments of Marpa grammars. These strings are called BNF source strings or Stuifzand source strings.

Stuifzand source strings can, and commonly do, describe multiple rules. Stuifzand source strings can be intermixed with the more established hash ("long form") and array ("short form") rule descriptors.

Users of the rule() method of Marpa::R2's grammars should be aware that the Stuifzand interface sometimes rewrites symbol names, and the names as returned by the rule() method reflect those rewrites. One example of a case where a rule is rewritten is when precedence parsing is implemented. Users who want to see symbol names as originally specified in the Stuifzand source should use the the bnf_rule() method instead.

The structure of the BNF source file

Whitespace separates tokens, but is otherwise ignored. Comments are equivalent to whitespace. A hash ("#") character starts a comment, which continues to the end of the line.

The Stuifzand format consists of production declarations. Each production declaraton consists of a left hand side symbol, followed by the declaration operator ("::=") and a right hand side. Right hand sides are described below.

Symbol names

Symbols names must consist entirely of Perl word characters (alphanumerics, plus the underscore). Optionally, symbol names can be enclosed in angle brackets.

Right hand sides

A right hand side may be empty, quantified, or prioritized. A empty right hand side is what the name suggests, no symbols, modifiers or punctuation. A production with an empry right hand side defines an nulling rule.

Quantified right hand sides

A quantified right hand side defines a sequence rule. It consists of a single symbol, followed by a "quantifier", and then followed, optionally, by modifiers. The quantifer is either a star ("*"), or a plus sign ("+") indicating, respectively, that the sequence rule has a minimum length of 0 or 1.

Prioritized right hand sides

All productions which are not empty or quantified, are considered prioritized, although the prioritization may be trivial, consisting only of a single right hand side alternative, at a single priority level. Each prioritized right hand side consists of one or more right hand alternatives ("alternatives" for brevity). An alternative is a series of one or more symbols, followed by zero or more modifiers.

The alternatives are separated by single or double "or" symbols. This means the alternatives in a prioritized right hand side proceed from tightest (highest) priority to loosest. The double "or" symbol ("||") is the "loosen" operator -- the alternatives after it have a looser (lower) priority than the alternatives before it. The single "or" symbol ("|") is the ordinary "alternative" operator -- alternatives on each side of it have the same priority. Associativity is specified using modifiers, as described below.

Modifiers

Modifiers are allowed for quantified productions, and for the right hand side alternatives in prioritized productions. Modifiers must occur after the symbols. Modifiers consist of a keyword, the modifier operator ("=>"), and the modifier's value. The keyword must be from the following list of reserved names. Values are as described for each keyword.

action

The value of the action keyword must be a Perl word (a string of alphanumerics or underscores). It will be interpreted as the action name, as described for the action named argument of rule descriptors.

assoc

The assoc modifier is only valid in a prioritized production. Its value must be one of left, right or group. Its effect will be as described below.

separator

The separator keyword is only valid for a quantified right side, and its value must be a symbol of the grammar. It will be used as the separator, as described for sequence rules.

How Marpa precedence works

Marpa's precedence is a generalization beyond the traditional ideas of precedence. Traditional precedence parsing required the the classification of operators as postfix, infix, etc. Marpa's precedence parsing is NOT based on the special treatment of operators. Marpa's precedence parsing, in fact, has no concept or definition of operator, except as something that is not an operand.

An occurrence of the left hand side in a right hand side alternative is called an operand. Alternatives are assigned an arity according to the number of operands they contain. All arities are allowed, from zero to an arbitrary number imposed by system limits.

An alternative with arity 0 is nullary. Precedence and associativity are meaningless in this case and will be ignored.

An alternative with arity 1 is unary. Precedence will have an effect, but left and right associativity will not.

An alternative with arity 2 is binary. Precedence will have an effect, and left and right associativity will behave in the traditional way, which is exactly as described for the N-ary case. For left associativity, where N is two or greater, only the leftmost operand associates -- operands after the first must be at a tighter priority level. For right associativity, where N is two or greater, only the rightmost operand associates -- all operands except the last must be at a tighter priority level.

Marpa also allows "group" associativity. In "group" associativity, all operands associate at the lowest priority. That is, in an alternative with group associativity, each operand may be a full expression of the kind defined by the prioritized production. "Group" associativity is used, for example, in implementing the traditional function of parentheses in Marpa. Group associativity is meaningless for nullary alternatives, and is ignored.

Caveats

Marpa's generalization of precedence works for all grammars that can be defined by prioritized productions. It is efficient for any grammars that could be parsed by the traditional precedence parsing methods. Marpa also allows you to defined alternatives not allowed by traditional methods. Many of these are useful, and most of those can also be parsed efficiently.

However, because of the many forms of recursion allowed, it is possible to define highly ambiguous grammars using the precedence mechanism. This can occur even by accident.

The user should especially be careful with right hand side alternatives in which all the symbols are operands. These can be useful. For example, an implicit operation can be defined using an binary alternative with no non-operands, and this could implement, for example, the standard notation for concatenation or multiplication. But to do this efficiently requires either avoiding ambiguity, or controlling its use carefully.

Marpa does catch the case where an alternative consists only of a single operand -- a "unit rule". This causes a fatal error. Unit rules are easy to define by accident in the Stuifzand interface, and the author knows of no practical use for them. Note that, in the event someone can find a use for such rules, Marpa can parse a grammar which includes them, via its other interfaces.

Copyright and License

Copyright 2012 Jeffrey Kegler
This file is part of Marpa::R2.  Marpa::R2 is free software: you can
redistribute it and/or modify it under the terms of the GNU Lesser
General Public License as published by the Free Software Foundation,
either version 3 of the License, or (at your option) any later version.

Marpa::R2 is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser
General Public License along with Marpa::R2.  If not, see
http://www.gnu.org/licenses/.