NAME
Marpa::R2::Progress - Progress reports on your parse
About this document
This document describes the progress reports for Marpa's SLIF interface. These allow an application to know exactly where it is in the parse at any point. For parse locations of the user's choosing, progress reports list all the rules in play, and indicate the location at which the rule started, and how far into the rule parsing has progressed.
Progress reports are extremely useful in debugging grammars and the detailed example in this document is a debugging situation. Readers specifically interested in debugging a grammar should read the document on tracing problems before reading this document.
Introduction to Earley items
To read the show_progress output, it is important to have a basic idea of what Earley items are, and of what the information in them means. Everything that the user needs to know is explained in this section.
Dotted rules
Marpa is based on Jay Earley's algorithm for parsing. The idea behind Earley's algorithm is that you can parse by building a table of rules and where you are in those rules. "Where" means two things: location in the rule relative to the rule's symbols, and location relative to the parse's input stream.
Let's look at an example of a rule in a context-free grammar. Here's the rule for assignment from the Perl distribution's perly.y:
termbinop -> term ASSIGNOP term
ASSIGNOP is perly.y's internal name for the assignment operator. In plain Perl terms, this is the "=" character.
In parsing this rule, we can be at any of four possible locations. One location is at the beginning, before all of the symbols. The other three locations are immediately after each of the rule's three symbols.
Within a rule, position relative to the symbols of the rule is traditionally indicated with a dot. In fact, the symbol-relative rule position is very often called the dot location. Taken as a pair, a rule and a dot location are called a dotted rule.
Here's our rule with a dot location indicated:
termbinop -> · term ASSIGNOP term
The dot location in this dotted rule is at the beginning. A dot location at the beginning of a dotted rule means that we have not recognized any symbols in the rule yet. All we are doing is predicting that the rule will occur. A dotted rule with the dot before all of its symbols is called a prediction or a predicted rule.
Here's another dotted rule:
termbinop -> term · ASSIGNOP term
In this dotted rule, we are saying we have seen a term, but have not yet recognized an ASSIGNOP.
There's another special kind of dotted rule, a completion. A completion (also called a completed rule) is a dotted rule with the dot after all of the symbols. Here is the completion for the rule that we have been using as an example:
termbinop -> term ASSIGNOP term ·
A completion indicates that a rule has been fully recognized.
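A dotted rule can be modelled as a rule plus an integer dot position that counts the RHS symbols already recognized. The sketch below is illustrative Perl, not Marpa's internal representation. The sub names dotted_rule and dot_kind are hypothetical. It writes the dot as "." (as Marpa's progress reports do) rather than "·". It prints all four dotted rules for the termbinop example and classifies each one.

```perl
use strict;
use warnings;

# Hypothetical helper: render a rule as a dotted rule. $dot is the number of
# RHS symbols already recognized (0 = prediction, all of them = completion).
sub dotted_rule {
    my ( $lhs, $rhs, $dot ) = @_;
    my @symbols = @{$rhs};            # copy, so the caller's array is untouched
    splice @symbols, $dot, 0, '.';    # insert the dot before symbol $dot
    return join q{ }, $lhs, '->', @symbols;
}

# Hypothetical helper: classify a dot position.
sub dot_kind {
    my ( $rhs, $dot ) = @_;
    return 'prediction' if $dot == 0;
    return 'completion' if $dot == scalar @{$rhs};
    return 'in progress';
}

my @rhs = qw(term ASSIGNOP term);
for my $dot ( 0 .. scalar @rhs ) {
    printf "%-11s %s\n", dot_kind( \@rhs, $dot ), dotted_rule( 'termbinop', \@rhs, $dot );
}
```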
Earley items
The dotted rules contain all but one piece of the information that Marpa needs to track. The missing piece is the second of the two "wheres": where in the input stream. To associate input stream location and dotted rules, Marpa uses what are now called Earley items.
A convenient way to think of an Earley item is as a triple, or 3-tuple, consisting of dotted rule, origin and current location. The origin is the location in the input stream where the dotted rule starts. The current location (also called the dot location) is the location in the input stream which corresponds to the dot position.
In Marpa terms, G1 location is location in terms of the G1 subgrammar's Earley sets. When the term "location" is used in this document, it means G1 location unless otherwise indicated.
A user often finds it much more convenient to think in terms of line and column position in the input stream, instead of G1 location. Every G1 location corresponds to a range of positions in the input stream. When the term "position" is used in this document, it means input stream position, unless otherwise indicated.
Two noteworthy consequences follow from the way in which origin and current G1 location are defined. First, if a dotted rule is a prediction, then origin and current location will always be the same. Second, the input stream location where a rule ends is not tracked unless the dotted rule is a completion. In other cases, an Earley item does not tell us if a rule will ever be completed, much less at which location.
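The following minimal sketch makes the two consequences concrete. It models an Earley item as a Perl hash that holds the dotted rule (lhs, rhs, dot), the origin and the current location. The layout and the sub names are assumptions for illustration only. The constructor rejects a prediction whose origin differs from its current location, and only a completion reports where its rule ended.

```perl
use strict;
use warnings;

# Hypothetical Earley item: a dotted rule (lhs, rhs, dot) plus an origin
# and a current location, both G1 locations.
sub make_item {
    my (%item) = @_;
    # A prediction has recognized nothing yet, so it starts where its dot is.
    die "prediction with origin != current location\n"
        if $item{dot} == 0 and $item{origin} != $item{current};
    return \%item;
}

sub is_completion {
    my ($item) = @_;
    return $item->{dot} == scalar @{ $item->{rhs} };
}

# Only a completion tells us where the rule ends; otherwise it is unknown.
sub end_location {
    my ($item) = @_;
    return is_completion($item) ? $item->{current} : undef;
}

my @rhs = qw(term ASSIGNOP term);
my $predicted = make_item( lhs => 'termbinop', rhs => \@rhs, dot => 0, origin => 4, current => 4 );
my $completed = make_item( lhs => 'termbinop', rhs => \@rhs, dot => 3, origin => 4, current => 7 );
printf "predicted item ends at: %s\n", end_location($predicted) // 'unknown';
printf "completed item ends at: %s\n", end_location($completed) // 'unknown';
```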
The problem
For this example of debugging, I have taken a very simple prototype of a string expression calculator and deliberately introduced a problem. I've commented out one of the correct rules:
# <numeric assignment> ::= variable '=' <numeric expression>
and replaced it with an altered one:
<numeric assignment> ::= variable '=' expression
For those readers who like to look ahead (and I encourage you to be one of those readers) all of the code and outputs for this example are collected in the "Appendix".
This altered rule contains a mistake of the kind that is easy to make in actual practice. (In this case, an unlucky choice of naming conventions may have contributed.) The altered version will cause problems. In what follows, we'll pretend we don't already know where the problem is, and that in desk-checking the grammar our eye does not spot the mistake, so that we need to use the Marpa diagnostics and tracing facilities to "discover" it.
The example
The example we will use is a prototype string calculator. It's extremely simple, to make the example easy to follow. But it can be seen as a realistic example, if it is thought of as a very early stage in the incremental development of something useful.
:default ::= action => ::array bless => ::lhs
:start ::= statements
statements ::= statement *
statement ::= assignment | <numeric assignment>
assignment ::= 'set' variable 'to' expression
# This is a deliberate error in the grammar
# The next line should be:
# <numeric assignment> ::= variable '=' <numeric expression>
# I have changed the <numeric expression> to <expression> which
# will cause problems.
<numeric assignment> ::= variable '=' expression
expression ::=
     variable | string
  || 'string' '(' <numeric expression> ')'
  || expression '+' expression
<numeric expression> ::=
     variable | number
  || <numeric expression> '+' <numeric expression>
  || <numeric expression> '*' <numeric expression>
variable ~ [\w]+
number ~ [\d]+
string ~ ['] <string contents> [']
<string contents> ~ [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}]+
:discard ~ whitespace
whitespace ~ [\s]+
At this stage of developing our string calculator, we have assignment, variables, constants, concatenation and conversion of numerics. For numerics, we have assignment, variables, constants, multiplication and addition.
We decide that, since string expressions and variables are the "default", in the grammar we'll make the symbol names for numeric assignment and expressions explicit: <numeric expression> and <numeric assignment>. But since strings are the default, we decide to call our string expressions simply <expression>, and to call our string assignments simply <assignment>. This seems like a good idea, but it is also likely to cause confusion. For the sake of our example we will pretend that it did.
The error message
If we try the following input,
my $test_input = 'a = 8675309 + 42 * 711';
we will get this error message,
Error in SLIF parse: No lexemes accepted at line 1, column 18
Rejected lexeme #0: '*'; value="*"; length = 1
* String before error: a = 8675309 + 42\s
* The error was at line 1, column 18, and at character 0x002a '*', ...
* here: * 711
The error message indicates that Marpa rejected the "*" operator.
The value of the parse
In debugging this issue, we'll look at the value of the parse first. The parse value differs from the other debugging aids we'll discuss. Every other debugging tool we will describe is always available, no matter how badly the parse failed. But if you have a problem parsing, you often won't get a parse value.
Our luck holds. Here's a dump of the parse value at the point of failure. It's a nice way to see what Marpa thinks the parse was so far.
\bless( [
    bless( [
        bless( [
            'a',
            '=',
            bless( [
                bless( [
                    '8675309'
                ], 'My_Nodes::expression' ),
                '+',
                bless( [
                    '42'
                ], 'My_Nodes::expression' )
            ], 'My_Nodes::expression' )
        ], 'My_Nodes::numeric_assignment' )
    ], 'My_Nodes::statement' )
], 'My_Nodes::statements' );
If we were perceptive, we might spot the error here. Our parse is not quite right, and that shows up in the outer My_Nodes::expression: it should be My_Nodes::numeric_expression. We'll assume that we don't notice this.
In fact, in the following, we'll pretend we haven't seen the dump of the parse value. We can't always get a parse value, so we don't want to rely on it.
Output from trace_terminals()
You can rely on getting the output from trace_terminals, and it is a good next place to check. Typically, you will be interested in the last tokens to be accepted. Sometimes that information alone is enough to make it clear where the problem is.
The full trace_terminals output for this example is in the Appendix. We see that the recognizer accepts the input as far as the multiplication sign ("*"), which it rejects. In Marpa, a lexeme is "acceptable" if it fits the grammar and the input so far. A lexeme is rejected if it is not acceptable.
The last two lines of the trace_terminals output are:
Discarded lexeme L1c17: whitespace
Rejected lexeme L1c18: '*'; value="*"
A note in passing: Marpa shows the input string position of the tokens it accepts, discards and rejects. <whitespace> is supposed to be discarded, and that was what happened at line 1, column 17. But the '*' that was next in the input was rejected, and that was not supposed to happen.
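Reading the trace by eye works for short inputs. For longer ones, the same check can be scripted. A rejected lexeme is only a sign of trouble when nothing was accepted at the same position. The Appendix trace shows this for the number lexemes at L1c5-11 and L1c15-16: they were rejected, but variable lexemes were accepted at the same positions. The sketch below is hedged. It assumes the line format shown in this example's trace and single-line positions such as L1c15-16. The variable names are hypothetical.

```perl
use strict;
use warnings;

# trace_terminals output for this example, copied from the Appendix.
my @trace = (
    q{Accepted lexeme L1c1 e1: variable; value="a"},
    q{Discarded lexeme L1c2: whitespace},
    q{Accepted lexeme L1c3 e2: '='; value="="},
    q{Discarded lexeme L1c4: whitespace},
    q{Rejected lexeme L1c5-11: number; value="8675309"},
    q{Accepted lexeme L1c5-11 e3: variable; value="8675309"},
    q{Discarded lexeme L1c12: whitespace},
    q{Rejected lexeme L1c13: '+'; value="+"},
    q{Accepted lexeme L1c13 e4: '+'; value="+"},
    q{Discarded lexeme L1c14: whitespace},
    q{Rejected lexeme L1c15-16: number; value="42"},
    q{Accepted lexeme L1c15-16 e5: variable; value="42"},
    q{Discarded lexeme L1c17: whitespace},
    q{Rejected lexeme L1c18: '*'; value="*"},
);

# Record where lexemes were accepted, and every rejection with its position.
my %accepted_at;
my @rejections;
for my $line (@trace) {
    next if $line !~ /\A(Accepted|Rejected) lexeme (L\d+c\d+(?:-\d+)?)/;
    my ( $verdict, $where ) = ( $1, $2 );
    if   ( $verdict eq 'Accepted' ) { $accepted_at{$where} = 1 }
    else                            { push @rejections, [ $where, $line ] }
}

# Rejections with no accepted alternative at the same position.
my @unmatched = grep { not $accepted_at{ $_->[0] } } @rejections;
print "$_->[1]\n" for @unmatched;
```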
Output from show_progress()
Marpa's most powerful tool for debugging grammars is its progress report, which shows the Earley items being worked on. In the Appendix, progress reports for the entire parse are shown. Our example in this document is a very small one, so that producing progress reports for the entire parse is a reasonable thing to do in this case. If a parse is at all large, you will usually need to be selective.
The progress report that is usually of most interest is the one for the Earley set that you were working on when the error occurred. This is called the current location. In our example the current location is G1 location 5. By default, show_progress prints out only the progress reports for the current location.
Here are the progress reports for the current location, location 5, from our example.
F0 @0-5 L1c1-16 statements -> statement * .
P1 @5-5 L1c15-16 statement -> . assignment
P2 @5-5 L1c15-16 statement -> . <numeric assignment>
F2 @0-5 L1c1-16 statement -> <numeric assignment> .
P3 @5-5 L1c15-16 assignment -> . 'set' variable 'to' expression
P4 @5-5 L1c15-16 <numeric assignment> -> . variable '=' expression
F4 @0-5 L1c1-16 <numeric assignment> -> variable '=' expression .
F5 @2-5 L1c3-16 expression -> expression .
F7 @4-5 L1c13-16 expression -> expression .
F8 @4-5 L1c13-16 expression -> variable .
R11:1 @2-5 L1c3-16 expression -> expression . '+' expression
F11 @2-5 L1c3-16 expression -> expression '+' expression .
F19 @0-5 L1c1-16 :start -> statements .
Progress report lines
F19 @0-5 L1c1-16 :start -> statements .
The last field of each progress report line shows, in fully expanded form, the dotted rule we were working on. Prefixed to the dotted rule are three fields. In the example just above they are "F19 @0-5 L1c1-16". The "F19" says that this is a completed or final rule, and that it is rule number 19. The rule number is a convenient way to refer to a rule and is used when displaying the whole rule would take too much space.
The "@0-5" describes the G1 locations of the dotted rule in the parse. In its simplest form, the location field is two G1 location numbers, separated by a hyphen. The first G1 location number is the origin, the place where Marpa first started recognizing the rule. The last G1 location number is the dot location, the G1 location of the dot in a dotted rule. "@0-5" says that this rule began at G1 location 0, and that the dot is at G1 location 5.
Following the G1 location is the range of positions in the input string: "L1c1-16". This indicates that the origin of the dotted rule is at line 1, column 1, and that its dot position is after line 1, column 16.
The current location is also just after line 1, column 16, and at G1 location 5, and this is no coincidence. Whenever we are displaying the progress report for a G1 location, all the progress report lines will have their dot location at that G1 location.
As an aside, notice that the left hand side symbol is :start. That is the start pseudo-symbol. The presence of a completed start rule in our progress report indicates that if our input had ended at location 5, it would be a valid sentence in the language of our grammar. (And it is because the input up to G1 location 5 was a valid sentence of the grammar that we were able to look at the value of the parse at location 5 for debugging purposes.)
Let's look at another progress report line:
R11:2 @2-4 L1c3-13 expression -> expression '+' . expression
Here the "R11:2" indicates that this is rule number 11 (the "R" stands for rule number) and that its dot position is after the second symbol on the right hand side. Symbol positions are numbered using the ordinal of the symbol just before the position. Symbols are numbered starting with 1, and symbol position 2 is the position immediately after symbol 2.
Predicted rules also appear in progress reports:
P2 @3-3 L1c5-11 statement -> . <numeric assignment>
Here the "P" in the summary field means "predicted". Notice that in the predicted rule, the origin is the same as the dot location. This will always be the case with predicted rules.
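The fields of a progress report line always come in the same order, so a line can be taken apart mechanically. The sketch below assumes only the layout described above. It also handles the optional instance count field (such as "x12"), which appears when a dotted rule has several instances. The sub name parse_progress_line and its hash keys are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical parser for one line of show_progress output:
# summary, optional instance count, location, input range, dotted rule.
sub parse_progress_line {
    my ($line) = @_;
    my ( $kind, $rule, $position, $count, $location, $range, $dotted ) =
        $line =~ m{\A ([FPR]) (\d+) (?: : (\d+) )? \s+ (?: x(\d+) \s+ )?
                   \@(\S+) \s+ (L\S+) \s+ (.+) \z}x
        or return;
    # Location is "origin(s)-dot location", e.g. 0-5, 0...38-41 or 8,18-41.
    my ( $origins, $dot_location ) = $location =~ m{\A (.+) - (\d+) \z}x
        or return;
    return {
        kind         => $kind,          # F = completed, P = predicted, R = neither
        rule         => $rule,
        position     => $position,      # symbol position, R lines only
        count        => $count // 1,    # instances of this dotted rule
        origins      => $origins,
        dot_location => $dot_location,
        range        => $range,
        dotted_rule  => $dotted,
    };
}

my $fields = parse_progress_line(q{R11:2 @2-4 L1c3-13 expression -> expression '+' . expression});
printf "%s%d at symbol position %d, origin %s, dot at G1 location %d, input %s\n",
    @{$fields}{qw(kind rule position origins dot_location range)};
```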
OK! Now to find the bug
If we look again at the progress reports at location 5, the location where things went wrong, we see that we have completed rules for <expression>, <numeric assignment>, <statement>, and <statements>, as expected. We also see, in R11:1 @2-5, that we are in the process of building another <expression>, and that it is expecting a '+' symbol.
What we want to know is, why is the recognizer not expecting an '*' symbol? Looking back at the grammar, we see that only one rule uses the '*' symbol. Here it is as part of a prioritized rule in the DSL:
<numeric expression> ::=
     variable | number
  || <numeric expression> '+' <numeric expression>
  || <numeric expression> '*' <numeric expression>
Here it is from the show_rules() listing:
G1 R18 <numeric expression> ::= <numeric expression> '*' <numeric expression>
It's rule 18 in subgrammar G1, and for convenience we will call it R18. The next step is to look at the Earley items for this rule. But there is a problem. We don't find any.
Next, we ask ourselves, what is the earliest place R18 should be appearing? The answer is that there should be a prediction of R18 at location 0. So we look at the predictions at location 0.
P0 @0-0 L0c0 statements -> . statement *
P1 @0-0 L0c0 statement -> . assignment
P2 @0-0 L0c0 statement -> . <numeric assignment>
P3 @0-0 L0c0 assignment -> . 'set' variable 'to' expression
P4 @0-0 L0c0 <numeric assignment> -> . variable '=' expression
P19 @0-0 L0c0 :start -> . statements
No R18 predicted at G1 location 0. Next we look through the entire progress report, at all G1 locations, to see if R18 is predicted anywhere. No R18. Not anywhere.
The LHS of R18 is <numeric expression>. We look in the progress report for dotted rules where <numeric expression> is expected, that is, dotted rules where <numeric expression> is the post-dot symbol. There are none.
Next we look for places in the progress reports where <numeric expression> occurs at all, whether post-dot or not. In the progress reports, <numeric expression> occurs in only two dotted rule instances. Here they are:
P10 @2-2 L1c3 expression -> . 'string' '(' <numeric expression> ')'
P10 @4-4 L1c13 expression -> . 'string' '(' <numeric expression> ')'
In both cases these are predictions of a string operator, the operator we plan to use for converting numerics to strings. They are just predictions, predictions which go no further because there is no 'string' operator in our input. That's fine, but why no other, more relevant, occurrences of <numeric expression>?
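The two searches just described can be scripted over show_progress output. One looks for <numeric expression> as the post-dot symbol, and the other looks for any mention of it. The sketch below applies hypothetical helper code to the G1 location 2 and location 4 lines from the Appendix. Its tokenizer assumes that symbols are bare words, <bracketed names> or 'quoted' literals, as they are in this grammar.

```perl
use strict;
use warnings;

# Hypothetical helper: the symbol just after the dot in a dotted rule,
# or undef if the dot is at the end.
sub post_dot_symbol {
    my ($dotted) = @_;
    my ( undef, $rhs ) = split /\s+->\s+/, $dotted, 2;
    return if not defined $rhs;
    my @tokens = $rhs =~ /(<[^>]+>|'[^']*'|\S+)/g;
    for my $i ( 0 .. $#tokens ) {
        return $tokens[ $i + 1 ] if $tokens[$i] eq '.';
    }
    return;
}

# Lines for G1 locations 2 and 4, copied from the Appendix.
my @report = (
    q{P5 @2-2 L1c3 expression -> . expression},
    q{P6 @2-2 L1c3 expression -> . expression},
    q{P7 @2-2 L1c3 expression -> . expression},
    q{P8 @2-2 L1c3 expression -> . variable},
    q{P9 @2-2 L1c3 expression -> . string},
    q{P10 @2-2 L1c3 expression -> . 'string' '(' <numeric expression> ')'},
    q{P11 @2-2 L1c3 expression -> . expression '+' expression},
    q{P7 @4-4 L1c13 expression -> . expression},
    q{P8 @4-4 L1c13 expression -> . variable},
    q{P9 @4-4 L1c13 expression -> . string},
    q{P10 @4-4 L1c13 expression -> . 'string' '(' <numeric expression> ')'},
    q{R11:2 @2-4 L1c3-13 expression -> expression '+' . expression},
);

my $wanted = '<numeric expression>';
my ( @expecting, @mentions );
for my $line (@report) {
    # Skip the summary, optional count, location and input range fields.
    my ($dotted) = $line =~ /\A\S+\s+(?:x\d+\s+)?\@\S+\s+L\S+\s+(.+)\z/ or next;
    my $post_dot = post_dot_symbol($dotted);
    push @expecting, $line if defined $post_dot and $post_dot eq $wanted;
    push @mentions,  $line if index( $dotted, $wanted ) >= 0;
}
printf "%d lines expect %s; %d mention it:\n", scalar @expecting, $wanted, scalar @mentions;
print "  $_\n" for @mentions;
```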
We look back at the grammar. Aside from the rule for the 'string' operator, <numeric expression> occurs on a RHS in only one other place, the prioritized rule which defines <numeric expression>:
<numeric expression> ::=
     variable | number
  || <numeric expression> '+' <numeric expression>
  || <numeric expression> '*' <numeric expression>
This rule will never put <numeric expression> into the Earley items unless there is a <numeric expression> already there. But that is not its job. This rule is just fine and does not need fixing.
That leaves one rule to look at.
<numeric assignment> ::= variable '=' expression
This rule is one that should lead to the prediction of a new <numeric expression> in our example. And now we see our problem. This rule is never leading to the prediction of a new <numeric expression>, because there is no <numeric expression> on its RHS, or for that matter anywhere else in it. On the RHS, where we wrote <expression>, we should have written <numeric expression>. Change that and the problem is fixed.
Complications
We have finished our main example. This section discusses some aspects of debugging which did not arise in the example, and which might be unexpected.
Empty rules
When a symbol is nulled in your parse, show_progress shows only the nulled symbol. It does not show the symbol's expansion into rules, or any of its nulled child symbols. This reduces clutter, and usually one does not notice the missing nulled rules and symbols. Not showing these seems to be the intuitive way to treat them.
Input string ranges
G1 locations run in a monotonic sequence, starting with 0. G1 locations never run backwards, they are never visited twice, and they leave no gaps.
Input string positions, on the other hand, can do all of these things. An application is allowed to jump around in the input. An input string position may be encountered more than once. It is quite possible to write your application so that it encounters, for example, line 42 before line 7. And your application does not have to visit line 42 on its way from line 41 to line 43. For that matter, an application does not ever have to visit any position in its input.
How does Marpa deal with this when reporting input string ranges? Marpa always reports the minimum range that includes all the input string positions visited in the dotted rule. The range is always reported in increasing numeric order, even when the position at the end of the range was visited before the input string position at the beginning of the range. And, if necessary to include all visited input string positions, the range may include input string positions which were not visited.
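The short sketch below applies the rule just stated. It computes the smallest covering range from positions that were visited out of order. The [line, column] pairs and the input_range helper are hypothetical. The output format is inferred from the progress report examples in this document ("L1c3", "L1c1-16", "L1c1-L2c40").

```perl
use strict;
use warnings;

# Compare two [line, column] positions.
sub position_cmp {
    my ( $p, $q ) = @_;
    return $p->[0] <=> $q->[0] || $p->[1] <=> $q->[1];
}

# Hypothetical helper: the smallest range covering every visited position,
# always reported in increasing order, whatever order they were visited in.
sub input_range {
    my @sorted = sort { position_cmp( $a, $b ) } @_;
    my ( $first, $last ) = @sorted[ 0, -1 ];
    my $start = "L$first->[0]c$first->[1]";
    return $start if position_cmp( $first, $last ) == 0;
    my $end = $first->[0] == $last->[0] ? $last->[1] : "L$last->[0]c$last->[1]";
    return "$start-$end";
}

# Positions visited out of order: line 2 first, then back to line 1.
# The range also covers positions in between that were never visited.
my $range = input_range( [ 2, 40 ], [ 1, 1 ], [ 1, 17 ] );
print "$range\n";
```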
Most applications move forward continuously in the input string, and if yours is one of them, you don't have to worry about these issues. But if you do unusual things when reading the input, it helps to be aware of how input string ranges are reported by Marpa when tracing and debugging.
Multiple instances of dotted rules
It does not happen in our main example for this document, but a dotted rule can appear in the same Earley set more than once. In fact, this happens frequently. When it does happen, the lines in the progress report will look like these:
F11 x12 @0...38-41 L1c1-L2c40 <plain assignment> -> 'x' '=' expression .
F1 x20 @0...38-41 L1c1-L2c40 expression -> assignment .
F6 x12 @0...38-41 L1c1-L2c40 assignment -> <plain assignment> .
These are some of the progress report lines for an indirect right recursion, one that recurses from a <plain assignment> symbol to an <expression> symbol, and then to an <assignment> symbol, before completing the recursion by returning to a <plain assignment>.
In each of the three lines, notice that a new field appears second. This second field is variously "x12" or "x20". These are counts, indicating the number of instances of that dotted rule at the dotted rule's G1 dot location. Every instance of a dotted rule will have the same G1 dot location, but the instances may have many different origins, hundreds or even more. In each of the three report lines above, the G1 dot location is 41.
Note that when parsing, Marpa handles the long series of Earley items generated by right recursions very efficiently. It uses a technique invented by Joop Leo to memoize and eliminate them. When a progress report is requested at a G1 location, the Leo-memoization is unfolded, and the full list of Earley items is reported.
Each instance may have its own span in the input string, and the input string range will include them all. When there are many instances of a dotted rule at a single location, the origins in the location field are shown as a range, with the earliest separated from the most recent by a "...". For example, above, where the first four fields were "F11 x12 @0...38-41 L1c1-L2c40", that tells us that the dotted rule is rule 11, which has 12 instances. All 12 instances have their dot location at G1 location 41, but their origins are in the range from G1 location 0 to G1 location 38.
The last field in "F11 x12 @0...38-41 L1c1-L2c40" is an input string range. "L1c1-L2c40" says that input string positions visited by the 12 instances start at line 1, column 1, and end at line 2, column 40. The reported input string range will be the shortest range that includes all of the input string positions visited by any of the dotted rule instances.
If there are only a few origins, Marpa may explicitly list them all. In the following example, there are only 2 instances of this rule, both with a dot location of 41. Their origins are at G1 locations 8 and 18. The range of input string positions is from line 1, column 17 to line 2, column 40.
F2 x2 @8,18-41 L1c17-L2c40 assignment -> <divide assignment> .
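The origin notation can be sketched as follows. This document does not say how many origins count as "a few", so $max_listed is an assumed cutoff, and instance_fields is a hypothetical name. The output illustrates the notation only. It does not reproduce Marpa's exact rule.

```perl
use strict;
use warnings;

# Hypothetical: build the instance-count and location fields for one dotted
# rule, given its G1 dot location, an ASSUMED cutoff for listing origins
# one by one, and the origins of all its instances.
sub instance_fields {
    my ( $dot_location, $max_listed, @origins ) = @_;
    my @sorted  = sort { $a <=> $b } @origins;
    my $origins = @sorted <= $max_listed
        ? join( q{,}, @sorted )             # a few origins: list them all
        : "$sorted[0]...$sorted[-1]";       # many: earliest...most recent
    my $count = @sorted > 1 ? 'x' . scalar(@sorted) . q{ } : q{};
    return "$count\@$origins-$dot_location";
}

my $few  = instance_fields( 41, 3, 18, 8 );
my $many = instance_fields( 41, 3, 0, 5, 12, 38 );
print "$few\n$many\n";
```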
Access to the "raw" progress report information
This section deals with the progress() recognizer method, which allows access to the raw progress report information. This method is not needed for typical debugging and tracing situations. It is intended for applications which want to leverage Marpa's "situational awareness" in innovative ways.
progress()
my $report0 = $recce->progress(0);
my $latest_report = $recce->progress();
Given the G1 location (Earley set ID) as its argument, the progress() recognizer method returns a reference to an array of "report items". The G1 location may be given as a negative number. An argument of -X will be interpreted as G1 location N-(X-1), where N is the latest Earley set. This means that an argument of -1 indicates the latest Earley set, an argument of -2 indicates the Earley set just before the latest one, etc.
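The negative-argument convention counts back from the latest Earley set. The minimal sketch below uses $latest for the ID of the latest Earley set, and resolve_g1_location is a hypothetical name. Treating a missing argument as "latest" is an assumption based on the $latest_report example.

```perl
use strict;
use warnings;

# Hypothetical helper: map a progress()-style argument to a G1 location,
# given $latest, the ID of the latest Earley set.
sub resolve_g1_location {
    my ( $arg, $latest ) = @_;
    return $latest if not defined $arg;    # assumed: no argument means latest
    return $arg if $arg >= 0;
    return $latest + $arg + 1;             # -1 => $latest, -2 => $latest - 1
}

print join( q{ }, map { resolve_g1_location( $_, 5 ) } ( -1, -2, -6, 0, 3 ) ), "\n";
```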
Each report item is a triple: an array of three elements. The three elements are, in order, rule ID, dot position, and origin. The data returned by the two displays above, as well as the data for the other G1 locations in our example, are shown below.
The rule ID is the same number that Marpa uses to identify rules in tracing and debugging output. Given a rule ID, an application can expand it into its LHS and RHS symbols using the SLIF grammar's rule_expand() method. Given a symbol ID, its name and other information can be found using other SLIF grammar methods.
Dot position is -1 for completions, and 0 for predictions. Where the report item is not for a completion or a prediction, dot position is N, where N is the number of RHS symbols successfully recognized at the G1 location of the progress report.
Origin is the G1 location (Earley set ID) at which the rule application reported by the report item began. For a prediction, origin will always be the same as the G1 location of the progress report.
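The hypothetical helper below relates report items to show_progress lines. It turns a [rule ID, dot position, origin] triple into the summary and location fields, following the dot position rules just given. The three sample triples are what this description implies for three lines of the location 5 report. They were not captured from an actual progress() call.

```perl
use strict;
use warnings;

# Hypothetical helper: format a progress() report item, [ rule ID,
# dot position, origin ], as show_progress-style summary and location
# fields. $location is the G1 location the report was made for.
sub item_label {
    my ( $item, $location ) = @_;
    my ( $rule_id, $dot, $origin ) = @{$item};
    my $summary =
          $dot == -1 ? "F$rule_id"              # completion
        : $dot == 0  ? "P$rule_id"              # prediction
        :              "R${rule_id}:$dot";      # dot after RHS symbol $dot
    return "$summary \@$origin-$location";
}

# Triples implied for "F19 @0-5", "R11:1 @2-5" and "P1 @5-5".
for my $item ( [ 19, -1, 0 ], [ 11, 1, 2 ], [ 1, 0, 5 ] ) {
    my $label = item_label( $item, 5 );
    print "$label\n";
}
```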
Progress reports and efficiency
When progress reports are used for production parsing, instead of just for debugging and tracing, efficiency considerations become significant. Progress reports themselves are implemented in optimized C, and that logic is very fast. However, the use of progress reports usually implies considerable post-processing in Perl. It is almost always possible to use Marpa's named events instead of progress reports, and solutions using named events are usually better targeted, simpler and faster.
If you do decide to use progress reports in an application, you should be aware of the efficiency considerations when there are right recursions in the grammar. For most purposes, Marpa optimizes right recursions, so that they run in linear time. However, to create a progress report every potential right recursion must be fully unfolded, and at each G1 location the number of these grows linearly with the length of the recursion. If you are creating progress reports for more than a limited number of G1 locations, this means processing that can be quadratic in the length of the recursion. When a right recursion is lengthy, the impact on speed can be very serious.
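Simple arithmetic illustrates the quadratic growth. This is a simplified model, not Marpa's exact accounting. It assumes that after k steps of a right recursion, unfolding produces k report items. Under that assumption, reporting at every one of n locations costs 1 + 2 + ... + n = n(n+1)/2 items. The sub names are hypothetical.

```perl
use strict;
use warnings;

# Simplified model (an assumption): k report items at the k-th step.
sub unfolded_items_at { my ($k) = @_; return $k }

# Total report items if a progress report is made at each of n locations.
sub total_unfolded_items {
    my ($n) = @_;
    my $total = 0;
    $total += unfolded_items_at($_) for 1 .. $n;
    return $total;
}

for my $n ( 10, 100, 1000 ) {
    printf "%5d locations: %7d report items (n(n+1)/2 = %d)\n",
        $n, total_unfolded_items($n), $n * ( $n + 1 ) / 2;
}
```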
If lengthy right recursions are being expanded, this will be evident from the progress report itself, which will contain one report item for every completion in the right-recursive chain of completions. Note that the efficiency consideration just mentioned for following right recursions is never an issue for left recursions. Left recursions produce at most two report items per G1 location and are extremely fast to process. It is also not an issue for Marpa's sequence rules, because sequence rules are implemented internally as left recursions.
Appendix
Below are the code, the trace outputs and the progress report for the example used in this document.
Code
use strict;
use warnings;
use English qw( -no_match_vars );    # for $EVAL_ERROR
use Marpa::R2;

my $slif_debug_source = <<'END_OF_SOURCE';
:default ::= action => ::array bless => ::lhs
:start ::= statements
statements ::= statement *
statement ::= assignment | <numeric assignment>
assignment ::= 'set' variable 'to' expression
# This is a deliberate error in the grammar
# The next line should be:
# <numeric assignment> ::= variable '=' <numeric expression>
# I have changed the <numeric expression> to <expression> which
# will cause problems.
<numeric assignment> ::= variable '=' expression
expression ::=
     variable | string
  || 'string' '(' <numeric expression> ')'
  || expression '+' expression
<numeric expression> ::=
     variable | number
  || <numeric expression> '+' <numeric expression>
  || <numeric expression> '*' <numeric expression>
variable ~ [\w]+
number ~ [\d]+
string ~ ['] <string contents> [']
<string contents> ~ [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}]+
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_SOURCE
my $grammar = Marpa::R2::Scanless::G->new(
    {   bless_package => 'My_Nodes',
        source        => \$slif_debug_source,
    }
);
my $recce = Marpa::R2::Scanless::R->new(
    {   grammar         => $grammar,
        trace_terminals => 1,
        trace_values    => 1,
    }
);
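my $test_input = 'a = 8675309 + 42 * 711';

# read() dies at the rejected '*'; keep the message instead of aborting.
my $eval_error;
$eval_error = $EVAL_ERROR if not eval { $recce->read( \$test_input ); 1 };

# Progress reports for every G1 location, from 0 to the latest.
my $progress_report = $recce->show_progress( 0, -1 );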
Error message
Error in SLIF parse: No lexemes accepted at line 1, column 18
Rejected lexeme #0: '*'; value="*"; length = 1
* String before error: a = 8675309 + 42\s
* The error was at line 1, column 18, and at character 0x002a '*', ...
* here: * 711
Parse value at error location
Note that when there is a parse error, there will not always be a parse value. But sometimes the parse is "successful" enough, in a technical sense, to produce a value, and in those cases examining the value can be helpful in determining what the parser thinks it has seen so far.
my $value_ref = $recce->value();
my $expected_output = \bless( [
    bless( [
        bless( [
            'a',
            '=',
            bless( [
                bless( [
                    '8675309'
                ], 'My_Nodes::expression' ),
                '+',
                bless( [
                    '42'
                ], 'My_Nodes::expression' )
            ], 'My_Nodes::expression' )
        ], 'My_Nodes::numeric_assignment' )
    ], 'My_Nodes::statement' )
], 'My_Nodes::statements' );
Trace output
Setting trace_terminals option
Setting trace_values option
Accepted lexeme L1c1 e1: variable; value="a"
Discarded lexeme L1c2: whitespace
Accepted lexeme L1c3 e2: '='; value="="
Discarded lexeme L1c4: whitespace
Rejected lexeme L1c5-11: number; value="8675309"
Accepted lexeme L1c5-11 e3: variable; value="8675309"
Discarded lexeme L1c12: whitespace
Rejected lexeme L1c13: '+'; value="+"
Accepted lexeme L1c13 e4: '+'; value="+"
Discarded lexeme L1c14: whitespace
Rejected lexeme L1c15-16: number; value="42"
Accepted lexeme L1c15-16 e5: variable; value="42"
Discarded lexeme L1c17: whitespace
Rejected lexeme L1c18: '*'; value="*"
show_progress() output
P0 @0-0 L0c0 statements -> . statement *
P1 @0-0 L0c0 statement -> . assignment
P2 @0-0 L0c0 statement -> . <numeric assignment>
P3 @0-0 L0c0 assignment -> . 'set' variable 'to' expression
P4 @0-0 L0c0 <numeric assignment> -> . variable '=' expression
P19 @0-0 L0c0 :start -> . statements
R4:1 @0-1 L1c1 <numeric assignment> -> variable . '=' expression
R4:2 @0-2 L1c1-3 <numeric assignment> -> variable '=' . expression
P5 @2-2 L1c3 expression -> . expression
P6 @2-2 L1c3 expression -> . expression
P7 @2-2 L1c3 expression -> . expression
P8 @2-2 L1c3 expression -> . variable
P9 @2-2 L1c3 expression -> . string
P10 @2-2 L1c3 expression -> . 'string' '(' <numeric expression> ')'
P11 @2-2 L1c3 expression -> . expression '+' expression
F0 @0-3 L1c1-11 statements -> statement * .
P1 @3-3 L1c5-11 statement -> . assignment
P2 @3-3 L1c5-11 statement -> . <numeric assignment>
F2 @0-3 L1c1-11 statement -> <numeric assignment> .
P3 @3-3 L1c5-11 assignment -> . 'set' variable 'to' expression
P4 @3-3 L1c5-11 <numeric assignment> -> . variable '=' expression
F4 @0-3 L1c1-11 <numeric assignment> -> variable '=' expression .
F5 @2-3 L1c3-11 expression -> expression .
F6 @2-3 L1c3-11 expression -> expression .
F7 @2-3 L1c3-11 expression -> expression .
F8 @2-3 L1c3-11 expression -> variable .
R11:1 @2-3 L1c3-11 expression -> expression . '+' expression
F19 @0-3 L1c1-11 :start -> statements .
P7 @4-4 L1c13 expression -> . expression
P8 @4-4 L1c13 expression -> . variable
P9 @4-4 L1c13 expression -> . string
P10 @4-4 L1c13 expression -> . 'string' '(' <numeric expression> ')'
R11:2 @2-4 L1c3-13 expression -> expression '+' . expression
F0 @0-5 L1c1-16 statements -> statement * .
P1 @5-5 L1c15-16 statement -> . assignment
P2 @5-5 L1c15-16 statement -> . <numeric assignment>
F2 @0-5 L1c1-16 statement -> <numeric assignment> .
P3 @5-5 L1c15-16 assignment -> . 'set' variable 'to' expression
P4 @5-5 L1c15-16 <numeric assignment> -> . variable '=' expression
F4 @0-5 L1c1-16 <numeric assignment> -> variable '=' expression .
F5 @2-5 L1c3-16 expression -> expression .
F7 @4-5 L1c13-16 expression -> expression .
F8 @4-5 L1c13-16 expression -> variable .
R11:1 @2-5 L1c3-16 expression -> expression . '+' expression
F11 @2-5 L1c3-16 expression -> expression '+' expression .
F19 @0-5 L1c1-16 :start -> statements .
show_rules() output
This is the G1 portion of the show_rules() output at verbosity level 3. In ordinary work, you'd use verbosity level 1 (the default), but the more verbose output is included here to illustrate the example.
G1 Rules:
G1 R0 statements ::= statement *
Symbol IDs: <16> ::= <17>
Internal symbols: <statements> ::= <statement>
G1 R1 statement ::= assignment
Symbol IDs: <17> ::= <18>
Internal symbols: <statement> ::= <assignment>
G1 R2 statement ::= <numeric assignment>
Symbol IDs: <17> ::= <19>
Internal symbols: <statement> ::= <numeric assignment>
G1 R3 assignment ::= 'set' variable 'to' expression
Symbol IDs: <18> ::= <1> <20> <2> <21>
Internal symbols: <assignment> ::= <[Lex-0]> <variable> <[Lex-1]> <expression>
G1 R4 <numeric assignment> ::= variable '=' <numeric expression>
Symbol IDs: <19> ::= <20> <3> <22>
Internal symbols: <numeric assignment> ::= <variable> <[Lex-2]> <numeric expression>
G1 R5 expression ::= expression
Internal rule top priority rule for <expression>
Symbol IDs: <21> ::= <10>
Internal symbols: <expression> ::= <expression[0]>
G1 R6 expression ::= expression
Internal rule for symbol <expression> priority transition from 0 to 1
Symbol IDs: <10> ::= <11>
Internal symbols: <expression[0]> ::= <expression[1]>
G1 R7 expression ::= expression
Internal rule for symbol <expression> priority transition from 1 to 2
Symbol IDs: <11> ::= <12>
Internal symbols: <expression[1]> ::= <expression[2]>
G1 R8 expression ::= variable
Symbol IDs: <12> ::= <20>
Internal symbols: <expression[2]> ::= <variable>
G1 R9 expression ::= string
Symbol IDs: <12> ::= <23>
Internal symbols: <expression[2]> ::= <string>
G1 R10 expression ::= 'string' '(' <numeric expression> ')'
Symbol IDs: <11> ::= <4> <5> <22> <6>
Internal symbols: <expression[1]> ::= <[Lex-3]> <[Lex-4]> <numeric expression> <[Lex-5]>
G1 R11 expression ::= expression '+' expression
Symbol IDs: <10> ::= <10> <7> <11>
Internal symbols: <expression[0]> ::= <expression[0]> <[Lex-6]> <expression[1]>
G1 R12 <numeric expression> ::= <numeric expression>
Internal rule top priority rule for <numeric expression>
Symbol IDs: <22> ::= <13>
Internal symbols: <numeric expression> ::= <numeric expression[0]>
G1 R13 <numeric expression> ::= <numeric expression>
Internal rule for symbol <numeric expression> priority transition from 0 to 1
Symbol IDs: <13> ::= <14>
Internal symbols: <numeric expression[0]> ::= <numeric expression[1]>
G1 R14 <numeric expression> ::= <numeric expression>
Internal rule for symbol <numeric expression> priority transition from 1 to 2
Symbol IDs: <14> ::= <15>
Internal symbols: <numeric expression[1]> ::= <numeric expression[2]>
G1 R15 <numeric expression> ::= variable
Symbol IDs: <15> ::= <20>
Internal symbols: <numeric expression[2]> ::= <variable>
G1 R16 <numeric expression> ::= number
Symbol IDs: <15> ::= <24>
Internal symbols: <numeric expression[2]> ::= <number>
G1 R17 <numeric expression> ::= <numeric expression> '+' <numeric expression>
Symbol IDs: <14> ::= <14> <8> <15>
Internal symbols: <numeric expression[1]> ::= <numeric expression[1]> <[Lex-7]> <numeric expression[2]>
G1 R18 <numeric expression> ::= <numeric expression> '*' <numeric expression>
Symbol IDs: <13> ::= <13> <9> <14>
Internal symbols: <numeric expression[0]> ::= <numeric expression[0]> <[Lex-8]> <numeric expression[1]>
G1 R19 :start ::= statements
Symbol IDs: <0> ::= <16>
Internal symbols: <[:start]> ::= <statements>
Lex (L0) Rules:
L0 R0 'set' ::= [s] [e] [t]
Internal rule for single-quoted string 'set'
Symbol IDs: <2> ::= <27> <21> <28>
Internal symbols: <[Lex-0]> ::= <[[s]]> <[[e]]> <[[t]]>
L0 R1 'to' ::= [t] [o]
Internal rule for single-quoted string 'to'
Symbol IDs: <3> ::= <28> <25>
Internal symbols: <[Lex-1]> ::= <[[t]]> <[[o]]>
L0 R2 '=' ::= [\=]
Internal rule for single-quoted string '='
Symbol IDs: <4> ::= <16>
Internal symbols: <[Lex-2]> ::= <[[\=]]>
L0 R3 'string' ::= [s] [t] [r] [i] [n] [g]
Internal rule for single-quoted string 'string'
Symbol IDs: <5> ::= <27> <28> <26> <23> <24> <22>
Internal symbols: <[Lex-3]> ::= <[[s]]> <[[t]]> <[[r]]> <[[i]]> <[[n]]> <[[g]]>
L0 R4 '(' ::= [\(]
Internal rule for single-quoted string '('
Symbol IDs: <6> ::= <12>
Internal symbols: <[Lex-4]> ::= <[[\(]]>
L0 R5 ')' ::= [\)]
Internal rule for single-quoted string ')'
Symbol IDs: <7> ::= <13>
Internal symbols: <[Lex-5]> ::= <[[\)]]>
L0 R6 '+' ::= [\+]
Internal rule for single-quoted string '+'
Symbol IDs: <8> ::= <15>
Internal symbols: <[Lex-6]> ::= <[[\+]]>
L0 R7 '+' ::= [\+]
Internal rule for single-quoted string '+'
Symbol IDs: <9> ::= <15>
Internal symbols: <[Lex-7]> ::= <[[\+]]>
L0 R8 '*' ::= [\*]
Internal rule for single-quoted string '*'
Symbol IDs: <10> ::= <14>
Internal symbols: <[Lex-8]> ::= <[[\*]]>
L0 R9 variable ::= [\w] +
Symbol IDs: <29> ::= <19>
Internal symbols: <variable> ::= <[[\w]]>
L0 R10 number ::= [\d] +
Symbol IDs: <30> ::= <17>
Internal symbols: <number> ::= <[[\d]]>
L0 R11 string ::= ['] <string contents> [']
Symbol IDs: <31> ::= <11> <32> <11>
Internal symbols: <string> ::= <[[']]> <string contents> <[[']]>
L0 R12 <string contents> ::= [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] +
Symbol IDs: <32> ::= <20>
Internal symbols: <string contents> ::= <[[^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}]]>
L0 R13 :discard ::= whitespace
Discard rule for <whitespace>
Symbol IDs: <0> ::= <33>
Internal symbols: <[:discard]> ::= <whitespace>
L0 R14 whitespace ::= [\s] +
Symbol IDs: <33> ::= <18>
Internal symbols: <whitespace> ::= <[[\s]]>
L0 R15 :start_lex ::= :discard
Internal lexical start rule for <[:discard]>
Symbol IDs: <1> ::= <0>
Internal symbols: <[:start_lex]> ::= <[:discard]>
L0 R16 :start_lex ::= 'set'
Internal lexical start rule for <[Lex-0]>
Symbol IDs: <1> ::= <2>
Internal symbols: <[:start_lex]> ::= <[Lex-0]>
L0 R17 :start_lex ::= 'to'
Internal lexical start rule for <[Lex-1]>
Symbol IDs: <1> ::= <3>
Internal symbols: <[:start_lex]> ::= <[Lex-1]>
L0 R18 :start_lex ::= '='
Internal lexical start rule for <[Lex-2]>
Symbol IDs: <1> ::= <4>
Internal symbols: <[:start_lex]> ::= <[Lex-2]>
L0 R19 :start_lex ::= 'string'
Internal lexical start rule for <[Lex-3]>
Symbol IDs: <1> ::= <5>
Internal symbols: <[:start_lex]> ::= <[Lex-3]>
L0 R20 :start_lex ::= '('
Internal lexical start rule for <[Lex-4]>
Symbol IDs: <1> ::= <6>
Internal symbols: <[:start_lex]> ::= <[Lex-4]>
L0 R21 :start_lex ::= ')'
Internal lexical start rule for <[Lex-5]>
Symbol IDs: <1> ::= <7>
Internal symbols: <[:start_lex]> ::= <[Lex-5]>
L0 R22 :start_lex ::= '+'
Internal lexical start rule for <[Lex-6]>
Symbol IDs: <1> ::= <8>
Internal symbols: <[:start_lex]> ::= <[Lex-6]>
L0 R23 :start_lex ::= '+'
Internal lexical start rule for <[Lex-7]>
Symbol IDs: <1> ::= <9>
Internal symbols: <[:start_lex]> ::= <[Lex-7]>
L0 R24 :start_lex ::= '*'
Internal lexical start rule for <[Lex-8]>
Symbol IDs: <1> ::= <10>
Internal symbols: <[:start_lex]> ::= <[Lex-8]>
L0 R25 :start_lex ::= number
Internal lexical start rule for <number>
Symbol IDs: <1> ::= <30>
Internal symbols: <[:start_lex]> ::= <number>
L0 R26 :start_lex ::= string
Internal lexical start rule for <string>
Symbol IDs: <1> ::= <31>
Internal symbols: <[:start_lex]> ::= <string>
L0 R27 :start_lex ::= variable
Internal lexical start rule for <variable>
Symbol IDs: <1> ::= <29>
Internal symbols: <[:start_lex]> ::= <variable>
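In the L0 listing, each single-quoted string from the grammar becomes a rule. Its right-hand side is a sequence of one-character classes, with non-word characters backslash-escaped. The sketch below reproduces that printed form in core Perl, to make the pattern easy to see. It illustrates the listing only and is not Marpa's own implementation. quoted_string_rhs is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: rebuild the right-hand side
# that show_rules() prints for an L0 "Internal rule for single-quoted string".
# This reproduces the printed form; it is not taken from Marpa's source.
sub quoted_string_rhs {
    my ($literal) = @_;
    # One bracketed character class per character. quotemeta() backslashes
    # every ASCII non-word character, which yields [\=], [\(] and [\+].
    return join ' ', map { '[' . quotemeta($_) . ']' } split //, $literal;
}

for my $literal ( 'set', 'to', '=', 'string', '(', ')', '+', '*' ) {
    printf "'%s' ::= %s\n", $literal, quoted_string_rhs($literal);
}
```

The printed lines match the right-hand sides shown for the single-quoted-string rules in the L0 listing, for example 'set' ::= [s] [e] [t] and '=' ::= [\=].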
show_symbols() output
G1 Symbols:
G1 S0 :start -- Internal G1 start symbol
Internal name: <[:start]>
G1 S1 'set' -- Internal lexical symbol for "'set'"
/* terminal */
Internal name: <[Lex-0]>
SLIF name: 'set'
G1 S2 'to' -- Internal lexical symbol for "'to'"
/* terminal */
Internal name: <[Lex-1]>
SLIF name: 'to'
G1 S3 '=' -- Internal lexical symbol for "'='"
/* terminal */
Internal name: <[Lex-2]>
SLIF name: '='
G1 S4 'string' -- Internal lexical symbol for "'string'"
/* terminal */
Internal name: <[Lex-3]>
SLIF name: 'string'
G1 S5 '(' -- Internal lexical symbol for "'('"
/* terminal */
Internal name: <[Lex-4]>
SLIF name: '('
G1 S6 ')' -- Internal lexical symbol for "')'"
/* terminal */
Internal name: <[Lex-5]>
SLIF name: ')'
G1 S7 '+' -- Internal lexical symbol for "'+'"
/* terminal */
Internal name: <[Lex-6]>
SLIF name: '+'
G1 S8 '+' -- Internal lexical symbol for "'+'"
/* terminal */
Internal name: <[Lex-7]>
SLIF name: '+'
G1 S9 '*' -- Internal lexical symbol for "'*'"
/* terminal */
Internal name: <[Lex-8]>
SLIF name: '*'
G1 S10 expression -- <expression> at priority 0
Internal name: <expression[0]>
SLIF name: expression
G1 S11 expression -- <expression> at priority 1
Internal name: <expression[1]>
SLIF name: expression
G1 S12 expression -- <expression> at priority 2
Internal name: <expression[2]>
SLIF name: expression
G1 S13 <numeric expression> -- <numeric expression> at priority 0
Internal name: <numeric expression[0]>
SLIF name: numeric expression
G1 S14 <numeric expression> -- <numeric expression> at priority 1
Internal name: <numeric expression[1]>
SLIF name: numeric expression
G1 S15 <numeric expression> -- <numeric expression> at priority 2
Internal name: <numeric expression[2]>
SLIF name: numeric expression
G1 S16 statements
Internal name: <statements>
G1 S17 statement
Internal name: <statement>
G1 S18 assignment
Internal name: <assignment>
G1 S19 <numeric assignment>
Internal name: <numeric assignment>
G1 S20 variable
/* terminal */
Internal name: <variable>
G1 S21 expression
Internal name: <expression>
G1 S22 <numeric expression>
Internal name: <numeric expression>
G1 S23 string
/* terminal */
Internal name: <string>
G1 S24 number
/* terminal */
Internal name: <number>
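The "Symbol IDs" lines in the G1 part of the show_rules() listing refer to the G1 symbol IDs listed here. Prioritization splits expression into three internal symbols, expression[0] through expression[2]. That is why several rules display as expression ::= expression. The L0 rules use their own numbering: <2> there is [Lex-0], while G1 S2 is 'to'. The core-Perl sketch below uses a hand-copied subset of this table to turn a G1 "Symbol IDs" line back into its "Internal symbols" line. The table subset and the helper name ids_to_names are assumptions made for the example.

```perl
use strict;
use warnings;

# A few G1 symbols copied from the show_symbols() listing above:
# symbol ID => internal name. (Not the full table.)
my %internal_name = (
    7  => '[Lex-6]',
    10 => 'expression[0]',
    11 => 'expression[1]',
    12 => 'expression[2]',
    20 => 'variable',
    21 => 'expression',
);

# Hypothetical helper: turn a G1 "Symbol IDs" line from show_rules()
# into the matching "Internal symbols" line. Unknown IDs print as S<n>?.
sub ids_to_names {
    my ($ids_line) = @_;
    my ( $lhs, @rhs ) = $ids_line =~ /<(\d+)>/g;
    my $name = sub { '<' . ( $internal_name{ $_[0] } // "S$_[0]?" ) . '>' };
    return join ' ', $name->($lhs), '::=', map { $name->($_) } @rhs;
}

print ids_to_names('<21> ::= <10>'), "\n";            # G1 R5
print ids_to_names('<10> ::= <10> <7> <11>'), "\n";   # G1 R11
```

The two printed lines reproduce the "Internal symbols" lines for G1 R5 and G1 R11.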
progress() outputs
This section contains samples of the output of the progress() method, that is, the progress reports in their "raw" format. The output is shown in Data::Dumper format, with Data::Dumper::Indent set to 0 and Data::Dumper::Terse set to 1.
The Data::Dumper output from progress() at G1 location 0:
[[0,0,0],[1,0,0],[2,0,0],[3,0,0],[4,0,0],[19,0,0]]
The Data::Dumper output from progress() at G1 location 1:
[[4,1,0]]
The Data::Dumper output from progress() at G1 location 2:
[[5,0,2],[6,0,2],[7,0,2],[8,0,2],[9,0,2],[10,0,2],[11,0,2],[4,2,0]]
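By default, progress() reports on the latest Earley set, which in this example is at G1 location 5. Here is that default output. In it, a dot position of -1 marks a completed rule: compare [5,-1,2] with the F5 @2-5 line of the show_progress() output.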
[[0,-1,0],[2,-1,0],[4,-1,0],[5,-1,2],[7,-1,4],[8,-1,4],[11,-1,2],[19,-1,0],[1,0,5],[2,0,5],[3,0,5],[4,0,5],[11,1,2]]
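Each item in the raw progress() output is a triple: rule ID, dot position, and origin (the G1 location where the rule started). For example, [11,1,2] corresponds to the R11:1 @2-5 line of the show_progress() output. The core-Perl sketch below turns such triples back into dotted rules, treating a dot position of -1 as a completed rule. It rests on the following assumptions:

- The report is for G1 location 5.
- The rule text is copied from the show_rules() listing, so the sketch's lines will not match show_progress() character for character.
- The names %rule and dotted_rule are hypothetical and are not part of Marpa::R2.

With a real recognizer, the triples would come from progress() rather than from a literal array.

```perl
use strict;
use warnings;

# G1 rules copied from the show_rules() listing, as [LHS, RHS symbols...].
# Only the rules that appear in the report below are included; the
# sequence rule R0 (statements ::= statement *) is shown without its "*".
my %rule = (
    0  => [ 'statements', 'statement' ],
    1  => [ 'statement', 'assignment' ],
    2  => [ 'statement', '<numeric assignment>' ],
    3  => [ 'assignment', q{'set'}, 'variable', q{'to'}, 'expression' ],
    4  => [ '<numeric assignment>', 'variable', q{'='}, '<numeric expression>' ],
    5  => [ 'expression', 'expression' ],
    7  => [ 'expression', 'expression' ],
    8  => [ 'expression', 'variable' ],
    11 => [ 'expression', 'expression', q{'+'}, 'expression' ],
    19 => [ ':start', 'statements' ],
);

# Hypothetical helper: render a rule ID and dot position as a dotted rule.
sub dotted_rule {
    my ( $rule_id, $dot ) = @_;
    my $r = $rule{$rule_id} or die "unknown rule $rule_id";
    my ( $lhs, @rhs ) = @$r;
    $dot = @rhs if $dot < 0;    # -1 means the dot is after the last symbol
    splice @rhs, $dot, 0, '.';
    return "$lhs -> @rhs";
}

# The default (latest Earley set) progress() output from above.
my $current = 5;
my @report = (
    [0,-1,0], [2,-1,0], [4,-1,0], [5,-1,2], [7,-1,4], [8,-1,4], [11,-1,2],
    [19,-1,0], [1,0,5], [2,0,5], [3,0,5], [4,0,5], [11,1,2],
);

for my $item (@report) {
    my ( $rule_id, $dot, $origin ) = @$item;
    # Completed, predicted, or medial, using the same letters as show_progress()
    my $tag = $dot < 0  ? "F$rule_id"
            : $dot == 0 ? "P$rule_id"
            :             "R$rule_id:$dot";
    printf "%-6s \@%d-%d %s\n", $tag, $origin, $current, dotted_rule( $rule_id, $dot );
}
```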
Copyright and License
Copyright 2018 Jeffrey Kegler
This file is part of Marpa::R2. Marpa::R2 is free software: you can
redistribute it and/or modify it under the terms of the GNU Lesser
General Public License as published by the Free Software Foundation,
either version 3 of the License, or (at your option) any later version.
Marpa::R2 is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser
General Public License along with Marpa::R2. If not, see
http://www.gnu.org/licenses/.