NAME

MarpaX::ESLIF::BNF - MarpaX::ESLIF's BNF

VERSION

version 1.0.53

DESCRIPTION

MarpaX::ESLIF is a scanless interface expressed in BNF format. It uses marpaWrapper, itself a thin interface on top of the libmarpa parser.

CONVENTIONS

The MarpaX::ESLIF BNF is composed of Unicode characters, in any encoding supported by the underlying converter (ICU or iconv, in order of preference). Insignificant whitespace, Perl-like comments and C++-like comments are discarded.

Symbol names

Symbol names are either bare names or names enclosed in angle brackets, the latter when whitespace is desired. They are case sensitive and can be composed only of ASCII characters. No attempt is made to discard leading, trailing, or repeated whitespace in the angle-bracket form, i.e. all of the following are different symbol names:

this
<this >
< this
  >

Levels

The grammar can contain multiple levels, the level syntax being:

::=         # Alias for level 0
  ~         # Alias for level 1
:[\d]+:=    # General form

Level 0 must exist. For convenience, the rest of this document uses only ::= and/or ~, though what is said applies to any level.
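As an illustration only, the following Python sketch maps a level operator to the grammar level it designates. The helper name level_of is hypothetical and not part of MarpaX::ESLIF. The pattern for the general form is taken from the <op declare any grammar> lexeme in the BNF section, which writes the number between literal brackets, for example :[2]:=.

```python
import re

# General form, as in the <op declare any grammar> lexeme: ':[' digits ']:='
_GENERAL_FORM = re.compile(r":\[(\d+)\]:=")


def level_of(operator: str) -> int:
    """Return the grammar level designated by a level operator (illustrative helper)."""
    if operator == "::=":
        return 0  # alias for level 0
    if operator == "~":
        return 1  # alias for level 1
    match = _GENERAL_FORM.fullmatch(operator)
    if match is None:
        raise ValueError(f"not a level operator: {operator!r}")
    return int(match.group(1))
```

Under this reading, :[0]:= and ::= designate the same level, as do :[1]:= and ~.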

Terminals

There are three types of explicit terminals.

Strings

Strings can be single- or double-quoted. The content is any valid Unicode character, and the \ character can be used to escape the quote or \ itself. The :i modifier forces a case-insensitive match:
```
'string'
'string':i
'string\'s'
"string\"s\\"
```

Character classes

They are always enclosed in left and right brackets []. Modifiers can start after a : character. A character class is nothing more than a lexically restricted regular expression.

Regular expression

They are always enclosed within slashes //, and the content must be valid as per the PCRE2 (Perl Compatible Regular Expressions) library. Modifiers can start after the slash on the right. Regular expression patterns are anchored by default. The slash character itself must be preceded by a backslash, i.e. \/ in the string seen by the parser (so, in practice, it is coded like this: "\\/").

The PCRE2 syntax is supported in its entirety, including any PCRE2 add-on. Character classes and regular expressions share the same set of modifiers, executed in order of appearance:

----------------------------------------------------------------
Modifiers   Explanation
----------------------------------------------------------------
e           Unset back-references in the pattern will match to empty strings
i           Case-insensitive
j           \u, \U and \x and unset back-references will act as JavaScript standard
m           Multi-line regex
n           Enable Unicode properties and extend meaning of meta-characters
s           A dot meta-character in the pattern matches all characters, including newlines
x           Enable comments. This has some limitations due to MarpaX::ESLIF semantics
D           A dollar meta-character matches only at the end of the subject string
J           Allow duplicate names for sub-patterns
U           Inverts the "greediness" of the quantifiers
a           Meta-characters will be limited to their ASCII equivalent
u           Forces support of large codepoints
b           Could mean "forced binary" mode
c           Could mean "forced unicode character" mode
A           Remove the systematic anchoring
----------------------------------------------------------------

Internally this corresponds to this set of options in PCRE2:

----------------------------------------------------------------
Modifiers         PCRE2 flag unset   PCRE2 flag set
----------------------------------------------------------------
e                                    PCRE2_MATCH_UNSET_BACKREF
i                                    PCRE2_CASELESS
j                                    PCRE2_ALT_BSUX|PCRE2_MATCH_UNSET_BACKREF
m                                    PCRE2_MULTILINE
n                                    PCRE2_UCP
s                                    PCRE2_DOTALL
x                                    PCRE2_EXTENDED
D                                    PCRE2_DOLLAR_ENDONLY
J                                    PCRE2_DUPNAMES
U                                    PCRE2_UNGREEDY
a                 PCRE2_UTF
N                 PCRE2_UCP
u                                    PCRE2_UTF
b                 PCRE2_UTF          PCRE2_NEVER_UTF
c                 PCRE2_NEVER_UTF    PCRE2_UTF
A                 PCRE2_ANCHORED
----------------------------------------------------------------
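Because modifiers are executed in order of appearance, the same letters can give different results depending on their order. The Python sketch below models this with plain flag names as strings. It is only a model of the table above, not MarpaX::ESLIF code: the function name effective_flags is hypothetical, and the starting option set (PCRE2_ANCHORED alone) is an assumption derived from the statement that patterns are anchored by default.

```python
# (flags to unset, flags to set) per modifier, transcribed from the table above.
MODIFIER_FLAGS = {
    "e": ((), ("PCRE2_MATCH_UNSET_BACKREF",)),
    "i": ((), ("PCRE2_CASELESS",)),
    "j": ((), ("PCRE2_ALT_BSUX", "PCRE2_MATCH_UNSET_BACKREF")),
    "m": ((), ("PCRE2_MULTILINE",)),
    "n": ((), ("PCRE2_UCP",)),
    "s": ((), ("PCRE2_DOTALL",)),
    "x": ((), ("PCRE2_EXTENDED",)),
    "D": ((), ("PCRE2_DOLLAR_ENDONLY",)),
    "J": ((), ("PCRE2_DUPNAMES",)),
    "U": ((), ("PCRE2_UNGREEDY",)),
    "a": (("PCRE2_UTF",), ()),
    "N": (("PCRE2_UCP",), ()),
    "u": ((), ("PCRE2_UTF",)),
    "b": (("PCRE2_UTF",), ("PCRE2_NEVER_UTF",)),
    "c": (("PCRE2_NEVER_UTF",), ("PCRE2_UTF",)),
    "A": (("PCRE2_ANCHORED",), ()),
}


def effective_flags(modifiers, initial=("PCRE2_ANCHORED",)):
    """Apply each modifier in order: first unset its flags, then set its flags."""
    flags = set(initial)  # assumed starting point: anchored by default
    for modifier in modifiers:
        unset, to_set = MODIFIER_FLAGS[modifier]
        flags.difference_update(unset)
        flags.update(to_set)
    return flags
```

For instance, in this model "ua" ends without PCRE2_UTF while "au" ends with it, since the later modifier has the last word.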

Lexemes

Lexemes are meta-symbols that do not appear as an LHS symbol anywhere within the current grammar. They therefore behave like terminals, except that their definition is not in the current grammar. By default such a meta-symbol is looked up at the next level. For example:

rule      ::= something
something   ~ [\d]

says that the symbol something at grammar level 0 is a reference to something at grammar level 1.

Discard

Every time the expected terminals cannot be matched, MarpaX::ESLIF will try to match the special rule :discard. The :discard rule also has precedence if it matches longer than the longest acceptable lexeme. It cannot be ambiguous (otherwise discard silently fails).
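The interplay between expected terminals and :discard can be pictured with the following Python sketch. It is a simplified model under stated assumptions, not the MarpaX::ESLIF implementation: the function next_token, the use of Python regular expressions for terminals, and the return shape are all hypothetical, and the ambiguity check on discard is not modelled.

```python
import re


def next_token(text, pos, expected, discard):
    """Decide what to consume at pos.

    expected maps a terminal name to a regex source, discard is a regex source.
    Returns ('lexeme', name, length), ('discard', None, length), or None.
    """
    best_name, best_len = None, 0
    for name, pattern in expected.items():
        m = re.compile(pattern).match(text, pos)
        if m and m.end() - pos > best_len:
            best_name, best_len = name, m.end() - pos

    m = re.compile(discard).match(text, pos)
    discard_len = m.end() - pos if m else 0

    # Discard wins when nothing expected matches, or when it matches longer.
    if discard_len > best_len:
        return ("discard", None, discard_len)
    if best_len > 0:
        return ("lexeme", best_name, best_len)
    return None
```

With an expected terminal matching a single # and a discard pattern matching a whole Perl-like comment, the input "#foo" is discarded in this model because the discard match is longer.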

Grammar meta settings

Start rule

By default, the first symbol of a grammar of level n is its start symbol. This can be set once with e.g.:

:start ::= symbolname

Grammar description

By default, a grammar of level n has the description Grammar level n. This can be set once with e.g.:

:desc ::= 'A single-quoted string'

Defaults

By default, the symbol action is ::shift and the rule action is ::concat, i.e. the parse tree value of a grammar is the concatenation of every matched input, excluding anything that was discarded. Stack manipulation may require triggering a free function, and this has no default. Only expected terminals or lexemes are looked up: this is the Longest Acceptable Token Match (LATM) setting, which defaults to a true value. You should not change that. Defaults can be set once, for example like this:

:default ::= action        => defaultRuleAction
             latm          => 1
             symbol-action => defaultSymbolAction
             free-action   => defaultFreeAction

Predefined actions are available for rules and symbols. Please refer to the API documentation to know more about value types.

The free-action adverb is required when the end-user pushes an opaque value of type PTR or ARRAY that is not declared as shallow. The internal stack manipulation may decide it no longer needs this value: the free action is then called. This is used in low-level programming; in practice, interfaces to high-level languages like Perl, Java, etc. should hide it, making the presence of this keyword in the grammar meaningless.

The symbol-action adverb is rather dangerous, since it changes the meaning of a lexeme in the current grammar (but not in any sub-grammars). In high-level languages, where it is in principle not possible to push an ARRAY (i.e. a byte array) value type, the rule action ::concat, which concatenates the RHSs, will silently drop these lexemes. The exception is a rule with a single RHS: there the rule's ::concat is turned into ::shift, which is guaranteed not to change the nature of the value being transferred.

::undef

Put a value of type UNDEF.

::ascii

Converts the concatenation of all RHS values to the ASCII charset. Only values of type ARRAY are accepted. ARRAY values with a null pointer or a length <= 0 are ignored. The value type is PTR, guaranteed to never be NULL, and always NUL terminated. If there was nothing to concatenate this will be the empty string "".

::translit

Same as ::ascii, but transliterates whenever possible, ignoring untranslatable sequences of bytes (think of iconv's //ASCII//TRANSLIT//IGNORE option).

::concat

Concatenates all RHS values. Only values of type ARRAY are accepted. ARRAY values with a length greater than 0 are concatenated. Please note that for symbols, ::concat is allowed but in reality does not do any concatenation: it does a ::shift, meaning that the type of the result is unchanged. This is because, formally, symbol values come directly from the input, as a single entity. There is no notion of concatenation here. In addition, the user can change the value of a symbol using the symbol-action adverb. The notion of concatenation, even applied to a single object, is then not correct: the real notion is a copy.

The same logic applies to a rule that has a ::concat action: if the rule contains a single RHS, then ::concat is a synonym for ::shift, which is guaranteed not to change the nature of the RHS. But if there is more than one RHS, then ::concat will drop anything coming from userspace.

The result is always of ARRAY type. If something was found during concatenation, the area pointed to by the ARRAY is guaranteed to be NUL terminated, even if the exposed size does not include the extra byte for this NUL character. In the extreme case where nothing can be concatenated, the ::concat result can be equivalent to ::undef.

Take care: using symbol-action to change the meaning of a lexeme has an impact on the default ::concat action, which is likely to ignore user-defined lexemes unless you pushed an ARRAY ESLIF value type, which is in principle not possible with high-level languages.

::concat is the default action for rules.

::copy[x]

Copies RHS number x (the first RHS is at index 0), putting UNDEF if it does not exist.

::shift

Alias for ::copy[0].

This is also the default action for symbols.
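The predefined actions described above can be approximated in a few lines of Python. This is a rough model and not the MarpaX::ESLIF implementation: the function names are hypothetical, an ARRAY value is represented by a bytes object, UNDEF by None, and anything else stands for a user-pushed value. The conversions done by ::ascii and ::translit are not modelled.

```python
def action_undef(rhs):
    """::undef: produce a value of type UNDEF (modelled as None)."""
    return None


def action_copy(rhs, x):
    """::copy[x]: copy RHS number x, or UNDEF if it does not exist."""
    return rhs[x] if 0 <= x < len(rhs) else None


def action_shift(rhs):
    """::shift: alias for ::copy[0]."""
    return action_copy(rhs, 0)


def action_concat(rhs):
    """::concat on a rule, as described in the text above."""
    if len(rhs) == 1:
        # Single RHS: synonym for ::shift, the value is passed through unchanged.
        return action_shift(rhs)
    # Several RHSs: only non-empty ARRAY values are kept, the rest is dropped.
    parts = [v for v in rhs if isinstance(v, bytes) and len(v) > 0]
    # Assumption: "nothing to concatenate" is modelled as UNDEF.
    return b"".join(parts) if parts else None
```

In this model a user-pushed object survives ::concat only when it is the single RHS of the rule, which is the pitfall the symbol-action warning refers to.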

Discard

The :discard symbol, despite belonging to a given grammar, is not accessible directly, and can only be set as a meta setting. An event can be associated with discard completion, and there can be multiple :discard statements:

:discard ::= symbolname1 event => discard_symbolname1$
:discard ::= symbolname2 event => discard_symbolname2$

Note that when an event is set, it is triggered only on completion of the :discard's RHS; therefore, when there is an event setting, the RHS of the :discard must be an LHS in the same grammar.

Events

Event names

Event names are composed of a restricted set of the ASCII graph characters. The name ':symbol' is reserved and is transformed to the name of the symbol for which the event is triggered.

Event initializers

By default, events are on; this is equivalent to appending =on after the event name. Appending =off turns the event off at startup.

Lexemes differ from non-lexeme symbols in that they are treated as terminals in the grammar; other symbols are not.

Lexeme events

Meta symbols that are lexemes can have pause events: before means that the scanner recognized them, after means that they have been consumed, e.g.:

:lexeme ::= symbolname1 pause => before event => ^symbolname1
:lexeme ::= symbolname2 pause => after  event =>  symbolname2$2

It is not allowed to set a lexeme event on a symbol that is not a lexeme.

Non-lexeme events

Completion, prediction and nulled events are supported, each targeting a symbol name.

For example:

event a     = completed  symbolname
event b=off = nulled     symbolname
event c=on  = predicted ^symbolname

It is not allowed to set a non-lexeme event on a symbol that is a lexeme.

Autoranking

Rules can be autoranked, the highest of a set of alternatives having the highest rank; the default is off:

autorank is on by default
autorank is off by default

Inaccessible statements

Inaccessible statements can generate warnings, be ignored, or raise an error on demand; the default is to ignore them:

inaccessible is warn by default
inaccessible is ok by default
inaccessible is fatal by default

Statements

A statement has a symbol name on the left-hand side (LHS) and zero or more symbol names, or terminals, on the right-hand side (RHS):

LHS ::= RHS1 RHS2 etc...

There are two exceptions:

The exception statement

Its semantic is a single symbol name followed by another single symbol name, with - in the middle:

LHS ::= RHS1 - RHS2

The sequence statement

This is a single symbol name followed by the * or the + character:

LHS1 ::= RHS1*
LHS2 ::= RHS2+

An empty rule has no RHS:

EMPTYRULE ::=

Possible ambiguities in the grammar itself may be resolved by adding the ; character at the end of a rule, or by enclosing zero or more statements within { and } characters:

EMPTYRULE ::= ;
{
  LHS1 ::= RHS1
  LHS2 ::= RHS2 - RHS3
}

Alternatives

There are two types of alternatives: the standard |, meaning "or", and the loosen operator ||, which starts a new prioritized group of alternatives. For example, the calculator grammar is:

Expression  ::=  /[\d]+/
              | '(' Expression ')'              assoc => group
              ||    Expression '**' Expression  assoc => right
              ||    Expression  '*' Expression
              |     Expression  '/' Expression
              ||    Expression  '+' Expression
              |     Expression  '-' Expression

which is strictly equivalent, in traditional BNF syntax, to:

Expression  ::= Expression0
Expression0 ::= Expression1
Expression1 ::= Expression2
Expression2 ::= Expression3

Expression3 ::= /[\d]+/
              | '(' Expression0 ')'
Expression2 ::=  Expression3 '**' Expression2
Expression1 ::=  Expression1  '*' Expression2
              |  Expression1  '/' Expression2
Expression0 ::=  Expression0  '+' Expression1
Expression0 ::=  Expression0  '-' Expression1

As you can see, statements have been grouped at every occurrence of the || operator. The loosen operator || is therefore a convenience operator: it is always possible to write an equivalent grammar without it, though this can become quite tedious. The assoc adverb has a meaning only in the presence of prioritized alternatives; otherwise it has no effect.

The following is copied almost verbatim from the Marpa::R2 section on precedence:

In prioritized statements, every alternative has an arity. The arity is the number of times an operand appears on the RHS. An RHS symbol is an operand if and only if it is the same as the LHS symbol. Anything else is considered an operator. When the arity is 0, precedence and associativity are meaningless and ignored. When the arity is 1, precedence has effect, but neither left nor right associativity does.

If arity is 2 or more and the alternative is left associative, the leftmost operand associates and operands after the first will have the next-tightest priority level. If arity is 2 or more and the alternative is right associative, the last operand associates and operands before the last will have the next-tightest priority level. In group associativity, all operands associate at the lowest priority.
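To make the effect of the priority levels concrete, here is a small Python sketch that follows the expanded Expression0 to Expression3 grammar shown earlier with one method per level. It is an illustration only: the tokenizer, the class name Calculator and the function evaluate are hypothetical, and computing a numeric result is an addition for the example (the grammar as written carries no actions).

```python
import re

_TOKEN = re.compile(r"\s*(\d+|\*\*|[-+*/()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Calculator:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expression0(self):
        # Loosest level: '+' and '-', left associative (written as a loop).
        value = self.expression1()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.expression1()
            else:
                value -= self.expression1()
        return value

    def expression1(self):
        # '*' and '/', left associative.
        value = self.expression2()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.expression2()
            else:
                value /= self.expression2()
        return value

    def expression2(self):
        # '**', right associative: the last operand recurses at the same level.
        base = self.expression3()
        if self.peek() == "**":
            self.take()
            return base ** self.expression2()
        return base

    def expression3(self):
        # Tightest level: a number, or a group that restarts at the loosest level.
        token = self.take()
        if token == "(":
            value = self.expression0()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is None or not token.isdigit():
            raise SyntaxError(f"unexpected token {token!r}")
        return int(token)


def evaluate(text):
    parser = Calculator(text)
    value = parser.expression0()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

The right-associative level recurses on its last operand, the left-associative levels iterate from the first operand, and the parenthesized group restarts at the loosest level, which matches the three assoc cases described above.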

Adverbs

Any rule can be followed by zero or more of these adverbs. If an adverb appears more than once, the last one wins:

Action

During valuation, a specific action can be associated to a rule:

action => my_action

Left association

In a prioritized statement, associate with the left-most operand:

assoc => left

Right association

In a prioritized statement, associate with the right-most operand:

assoc => right

Group association

All operands associate at the lowest priority:

assoc => group

Separator

Sequence rules can have a separator, that can be a symbol name, a string, a character class or a regular expression.

separator => comma
separator => ','
separator => [,]
separator => /,/

Modifiers are allowed after a string, character class or regular expression.

Proper specification

Sequence rules can be proper, i.e. without trailing separator:

proper => 1
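The difference between a proper and a non-proper sequence can be sketched as a check over a list of already-recognized tokens. This Python helper is hypothetical and only a model: it assumes that a non-proper sequence tolerates one trailing separator (as in Marpa::R2), and it treats every token that is not the separator as an item. The minimum argument stands for the quantifier: 0 for * and 1 for +.

```python
def is_sequence(tokens, separator, proper, minimum=0):
    """Check tokens against: item (separator item)*, optionally ending with a separator."""
    items = 0
    expect_item = True
    for token in tokens:
        if expect_item:
            if token == separator:
                return False  # separator where an item was expected
            items += 1
        elif token != separator:
            return False  # two items in a row, separator missing
        expect_item = not expect_item

    # A non-empty input that ends while an item is expected ended on a separator.
    trailing_separator = bool(tokens) and expect_item
    if proper and trailing_separator:
        return False
    return items >= minimum
```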

Rank specification

During valuation, rules can have a rank to get prioritized. Rank is a signed integer and defaults to 0:

rank => -2

Any value other than 0 is not allowed if autoranking is set to a true value.

Null-ranking specification

Nulling symbols can rank high or low; the default is low.

null-ranking => 'low'
null-ranking => 'high'

Priority specification

Lexemes can be prioritized, using a signed integer:

priority => 15

Pause specification

The scanner can be paused before a lexeme is recognized, or just after it has been completed:

pause => before
pause => after

Event specification

Events can be specified, with an optional initializer, given that the default initialization is =on:

event => eventName
event => eventName=on
event => eventName=off
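As a small illustration, the three forms above can be split into a name and an initial state as follows. The helper parse_event_initialization is hypothetical and not part of MarpaX::ESLIF. It relies on the observation that the <restricted ascii graph name> lexeme in the BNF section does not include the = character, so the first = always starts the initializer. Whitespace around = is not handled in this sketch.

```python
def parse_event_initialization(text):
    """Split 'name', 'name=on' or 'name=off' into (name, initially_on)."""
    name, equals, state = text.partition("=")
    if not name:
        raise ValueError("missing event name")
    if not equals:
        return name, True  # no initializer: events are on by default
    if state not in ("on", "off"):
        raise ValueError(f"invalid event initializer: {state!r}")
    return name, state == "on"
```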

Naming

A name can be associated with the rule, in the form:

name => something
name => 'quoted name'  # No modifier is allowed after the string
name => "quoted name"  # No modifier is allowed after the string

NAME

BNF

MarpaX::ESLIF BNF can be expressed in itself:

/*
 * **********************
 * Meta-grammar settings:
 * **********************
 */
:discard                       ::= whitespace
:discard                       ::= <perl comment> 
:discard                       ::= <cplusplus comment> 

/*
 * ******
 * Rules:
 * ******
 */
<statements>                   ::= statement*

<statement>                    ::= <start rule>
                                 | <desc rule>
                                 | <empty rule>
                                 | <null statement>
                                 | <statement group>
                                 | <priority rule>
                                 | <quantified rule>
                                 | <discard rule>
                                 | <default rule>
                                 | <lexeme rule>
                                 | <completion event declaration>
                                 | <nulled event declaration>
                                 | <prediction event declaration>
                                 | <inaccessible statement>
                                 | <exception statement>
                                 | <autorank statement>

<start rule>                   ::= ':start' <op declare> symbol
<desc rule>                    ::= ':desc' <op declare> <quoted name>
<empty rule>                   ::= <lhs> <op declare> <adverb list>
<null statement>               ::= ';'
<statement group>              ::= '{' statements '}'
<priority rule>                ::= lhs <op declare> <priorities>
<quantified rule>              ::= lhs <op declare> <rhs primary> <quantifier> <adverb list>
<discard rule>                 ::= ':discard' <op declare> <rhs primary> <adverb list>
<default rule>                 ::= ':default' <op declare> <adverb list>
<lexeme rule>                  ::= ':lexeme' <op declare> symbol <adverb list>
<completion event declaration> ::= 'event' <event initialization> '=' 'completed' <symbol name>
                                 | 'event' <event initialization> <op declare> 'completed' <symbol name>
<nulled event declaration>     ::= 'event' <event initialization> '=' 'nulled' <symbol name>
                                 | 'event' <event initialization> <op declare> 'nulled' <symbol name>
<prediction event declaration> ::= 'event' <event initialization> '=' 'predicted' <symbol name>
                                 | 'event' <event initialization> <op declare> 'predicted' <symbol name>
<inaccessible statement>       ::= 'inaccessible' 'is' <inaccessible treatment> 'by' 'default'
<inaccessible treatment>       ::= 'warn'
<inaccessible treatment>       ::= 'ok'
<inaccessible treatment>       ::= 'fatal'
<exception statement>          ::= lhs <op declare> <rhs primary> '-' <rhs primary> <adverb list>
<autorank statement>           ::= 'autorank' 'is' <on or off> 'by' 'default'
<op declare>                   ::= <op declare top grammar>
                                 | <op declare lex grammar>
                                 | <op declare any grammar>
<priorities>                   ::= <alternatives>+ separator => <op loosen> proper => 1
<alternatives>                 ::= <alternative>+ separator => <op equal priority> proper => 1
<alternative>                  ::= rhs <adverb list>
<adverb list>                  ::= <adverb list items>
<adverb list items>            ::= <adverb item>*
<adverb item>                  ::= <action>
                                 | <left association>
                                 | <right association>
                                 | <group association>
                                 | <separator specification>
                                 | <proper specification>
                                 | <rank specification>
                                 | <null ranking specification>
                                 | <priority specification>
                                 | <pause specification>
                                 | <latm specification>
                                 | naming
                                 | <null adverb>
                                 | <symbol action>
                                 | <free action>
                                 | <event specification>
<action>                       ::= 'action' '=>' <action name>
<left association>             ::= 'assoc' '=>' 'left'
<right association>            ::= 'assoc' '=>' 'right'
<group association>            ::= 'assoc' '=>' 'group'
<separator specification>      ::= 'separator' '=>' <single symbol>
<proper specification>         ::= 'proper' '=>' false
                                 | 'proper' '=>' true
<rank specification>           ::= 'rank' '=>' <signed integer>
<null ranking specification>   ::= 'null-ranking' '=>' <null ranking constant>
                                 | 'null' 'rank' '=>' <null ranking constant>
<null ranking constant>        ::= 'low'
                                 | 'high'
<priority specification>       ::= 'priority' '=>' <signed integer>
<pause specification>          ::= 'pause' '=>' 'before'
                                 | 'pause' '=>' 'after'
<event specification>          ::= 'event' '=>' <event initialization>
<event initialization>         ::= <event name> <event initializer>
<event initializer>            ::= '=' <on or off>
<event initializer>            ::=
<on or off>                    ::= 'on'
                                 | 'off'
<latm specification>           ::= 'latm' '=>' <false>
                                 | 'latm' '=>' <true>
naming                         ::= 'name' '=>' <alternative name>
<null adverb>                  ::= ','
<symbol action>                ::= 'symbol-action' '=>' <action name>
<free action>                  ::= 'free-action' '=>' <free name>
<alternative name>             ::= <standard name>
<alternative name>             ::= <quoted name>
<event name>                   ::= <restricted ascii graph name>
                                 | ':symbol'
lhs                            ::= <symbol name>
rhs                            ::= <rhs primary>+
<rhs primary>                  ::= <single symbol>
                                 | <symbol name> '@' <grammar reference>
<single symbol>                ::= symbol
                                 | <character class>
                                 | <regular expression>
                                 | <quoted string>
symbol                         ::= <symbol name>
<symbol name>                  ::= <bare name>
                                 | <bracketed name>
<action name>                  ::= <restricted ascii graph name>
                                 | '::shift'
                                 | '::undef'
                                 | '::ascii'
                                 | '::translit'
                                 | '::concat'
                                 | /::copy\[\d+\]/
<free name>                    ::= <restricted ascii graph name>
<quantifier>                   ::= '*'
                                 | '+'
<signed integer>               ::= /[+-]?\d+/
<grammar reference>            ::= <quoted string>
                                 | <signed integer>

#
# ---------------------------------------
# Lexemes of the grammar given above are:
# ---------------------------------------
#
# <op declare any grammar> ::= <op declare any grammar>@+1
# <op declare top grammar> ::= <op declare top grammar>@+1
# <op declare lex grammar> ::= <op declare lex grammar>@+1
# <op loosen> ::= <op loosen>@+1
# <op equal priority> ::= <op equal priority>@+1
# <false> ::= <false>@+1
# <true> ::= <true>@+1
# <standard name> ::= <standard name>@+1
# <quoted name> ::= <quoted name>@+1
# <quoted string> ::= <quoted string>@+1
# <character class> ::= <character class>@+1
# <regular expression> ::= <regular expression>@+1
# <bare name> ::= <bare name>@+1
# <bracketed name> ::= <bracketed name>@+1
# <restricted ascii graph name> ::= <restricted ascii graph name>@+1
# <whitespace> ::= <whitespace>@+1
# <perl comment> ::= <perl comment>@+1
# <cplusplus comment> ::= <cplusplus comment>@+1
#

/*
 * *************
 * Lexeme rules:
 * *************
 */
whitespace                       ~ /[\s]+/
<perl comment>                   ~ /(?:(?:#)(?:[^\n]*)(?:\n|\z))/u
<cplusplus comment>              ~ /(?:(?:(?:\/\/)(?:[^\n]*)(?:\n|\z))|(?:(?:\/\*)(?:(?:[^\*]+|\*(?!\/))*)(?:\*\/)))/u
<op declare any grammar>         ~ /:\[(\d+)\]:=/
<op declare top grammar>         ~ '::='
<op declare lex grammar>         ~ '~'
<op loosen>                      ~ '||'
<op equal priority>              ~ '|'
<true>                           ~ '1'
<false>                          ~ '0'
<word character>                 ~ [\w]
<one or more word characters>    ~ <word character>+ proper => 1
<zero or more word characters>   ~ <word character>* proper => 1
<restricted ascii graph name>    ~ /[-!#$%&()*+.\/;<>?@\[\\\]^_`|~A-Za-z0-9][-!#$%&()*+.\/:;<>?@\[\\\]^_`|~A-Za-z0-9]*/
<bare name>                      ~ <word character>+ proper => 1
<standard name>                  ~ [a-zA-Z] <zero or more word characters>
<bracketed name>                 ~ '<' <bracketed name string> '>'
<bracketed name string>          ~ /[\s\w]+/
<quoted string>                  ~ /(?:(?|(?:')(?:[^\\']*(?:\\.[^\\']*)*)(?:')|(?:")(?:[^\\"]*(?:\\.[^\\"]*)*)(?:")))/su
                                 | /(?:(?|(?:')(?:[^\\']*(?:\\.[^\\']*)*)(?:')|(?:")(?:[^\\"]*(?:\\.[^\\"]*)*)(?:")))/su ':' /ic?/
<quoted name>                    ~ /(?:(?|(?:')(?:[^\\']*(?:\\.[^\\']*)*)(?:')|(?:")(?:[^\\"]*(?:\\.[^\\"]*)*)(?:")))/su
<character class>                ~ /((?:\[(?:(?>[^\[\]]+)|(?-1))*\]))/
                                 | /((?:\[(?:(?>[^\[\]]+)|(?-1))*\]))/ ':' /[eijmnsxDJUuaNbcA]+/
<regular expression>             ~ /(?:(?|(?:\/)(?:[^\\\/]*(?:\\.[^\\\/]*)*)(?:\/)))/su
                                 | /(?:(?|(?:\/)(?:[^\\\/]*(?:\\.[^\\\/]*)*)(?:\/)))/su /[eijmnsxDJUuaNbcA]+/

AUTHOR

Jean-Damien Durand <jeandamiendurand@free.fr>

COPYRIGHT AND LICENSE

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
