NAME
Marpa::R3::Progress - Progress reports on your parse
About this document
This document describes the progress reports for Marpa::R3. These allow an application to know exactly where it is in the parse at any point. For parse locations of the user's choosing, progress reports list all the productions in play and, for each, indicate the location at which the production started and how far into the production parsing has progressed.
Progress reports are extremely useful in debugging grammars, and the detailed example in this document is a debugging session. Readers specifically interested in debugging a grammar should read the document on tracing problems before reading this document.
Introduction to Earley items
To read the progress_show output, it is important to have a basic idea of what Earley items are, and of what the information in them means. Everything that the user needs to know is explained in this section.
Dotted productions
Marpa is based on Jay Earley's algorithm for parsing. The idea behind Earley's algorithm is that you can parse by building a table of productions and where you are in those productions. "Where" means two things: location in the production relative to the production's symbols, and location relative to the parse's input stream.
Let's look at an example of a production in a context-free grammar. Here's the production for assignment from the Perl distribution's perly.y:
termbinop -> term ASSIGNOP term
ASSIGNOP is perly.y's internal name for the assignment operator. In plain Perl terms, this is the "=" character.
In parsing this production, we can be at any of four possible locations. One location is at the beginning, before all of the symbols. The other three locations are immediately after each of the production's three symbols.
Within a production, position relative to the symbols of the production is traditionally indicated with a dot. In fact, the symbol-relative production position is very often called the dot location. Taken as a pair, a production and a dot location are called a dotted production.
Here's our rule with a dot location indicated:
termbinop -> · term ASSIGNOP term
The dot location in this dotted production is at the beginning. A dot location at the beginning of a dotted production means that we have not recognized any symbols in the production yet. All we are doing is predicting that the production will occur. A dotted production with the dot before all of its symbols is called a prediction, a predicted production, or a predicted rule.
Here's another dotted production:
termbinop -> term · ASSIGNOP term
In this dotted production, we are saying we have seen a term, but have not yet recognized an ASSIGNOP.
There's another special kind of dotted production, a completion. A completion (also called a completed rule or a completed production) is a dotted production with the dot after all of the symbols. Here is the completion for the production that we have been using as an example:
termbinop -> term ASSIGNOP term ·
A completion indicates that a production has been fully recognized.
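To make the idea concrete, here is a minimal Perl sketch that prints every dotted production for the termbinop production above. The hash layout, the dotted() and kind() helpers, and the label "medial" for dotted productions that are neither predictions nor completions are illustrative assumptions of ours, not part of Marpa's API.

```perl
use strict;
use warnings;

# Illustrative layout (not Marpa's internals): a production is an LHS
# name plus a list of RHS symbol names.
my %termbinop = ( lhs => 'termbinop', rhs => [qw(term ASSIGNOP term)] );

# Render a dotted production. $dot counts the RHS symbols already
# recognized, so it ranges from 0 (before all symbols) to the RHS length.
sub dotted {
    my ( $production, $dot ) = @_;
    my @rhs = @{ $production->{rhs} };
    die "dot position $dot out of range\n" if $dot < 0 or $dot > @rhs;
    my @symbols = ( @rhs[ 0 .. $dot - 1 ], "\x{b7}", @rhs[ $dot .. $#rhs ] );
    return "$production->{lhs} -> @symbols";
}

# Name the kind of dotted production, using the terms from the text.
sub kind {
    my ( $production, $dot ) = @_;
    return 'prediction' if $dot == 0;
    return 'completion' if $dot == @{ $production->{rhs} };
    return 'medial';    # our own label for everything in between
}

binmode STDOUT, ':encoding(UTF-8)';
for my $dot ( 0 .. scalar @{ $termbinop{rhs} } ) {
    printf "%-10s %s\n", kind( \%termbinop, $dot ), dotted( \%termbinop, $dot );
}
```

A production with three RHS symbols yields four dotted productions, matching the four possible locations described above.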
Earley items
The dotted productions contain all but one piece of the information that Marpa needs to track. The missing piece is the second of the two "wheres": where in the input stream. To associate input stream location and dotted productions, Marpa uses what are now called Earley items.
A convenient way to think of an Earley item is as a triple, or 3-tuple, consisting of dotted production, origin and current location. The origin is the location in the input stream where the dotted production starts. The current location (also called the dot location) is the location in the input stream which corresponds to the dot position.
Marpa actually has two different ideas of input stream location: G1 location and L0 location. G1 location is location in terms of the G1 subgrammar's Earley sets. L0 location is location in terms of physical input, which is also location in terms of the L0 subgrammar's Earley sets. When the term "location" is used in this document, it means L0 location unless otherwise indicated.
L0 location is often reported in terms of input block, line and column. Marpa allows applications to have multiple input strings (called in this context "blocks"), but most applications will use only one input block, called B1.
Two noteworthy consequences follow from the way in which origin and current location are defined.
If a dotted production is a prediction, then origin and current location will always be the same.
The input stream location where a production ends is not tracked unless the dotted production is a completion. In other cases, an Earley item does not tell us if a production will ever be completed, much less at which location.
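The triple, and the two consequences just listed, can be sketched in a few lines of Perl. This is an illustration under assumed data layouts (plain arrays, with integers standing in for Earley set IDs); earley_item() and end_location() are hypothetical helpers, not Marpa methods.

```perl
use strict;
use warnings;

# Illustrative only (not Marpa's internal layout): a dotted production is
# [ production name, dot position, number of RHS symbols ], and an Earley
# item is the triple [ dotted production, origin, current location ].
sub earley_item {
    my ( $dotted, $origin, $current ) = @_;
    my ( undef, $dot ) = @{$dotted};
    # Consequence 1: a prediction has recognized nothing yet, so it
    # must start where it is found.
    die "a prediction must have origin == current location\n"
        if $dot == 0 and $origin != $current;
    return [ $dotted, $origin, $current ];
}

# Consequence 2: the end of a production is known only for a completion,
# in which case it is the item's current location.
sub end_location {
    my ($item) = @_;
    my ( $dotted, undef, $current ) = @{$item};
    my ( undef, $dot, $rhs_length ) = @{$dotted};
    return $dot == $rhs_length ? $current : undef;
}

my $predicted = earley_item( [ 'termbinop', 0, 3 ], 4, 4 );
my $medial    = earley_item( [ 'termbinop', 1, 3 ], 4, 6 );
my $completed = earley_item( [ 'termbinop', 3, 3 ], 4, 9 );

for my $item ( $predicted, $medial, $completed ) {
    my $end = end_location($item);
    printf "dot=%d origin=%d current=%d end=%s\n",
        $item->[0][1], $item->[1], $item->[2],
        defined $end ? $end : 'unknown';
}
```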
The problem
For this example of debugging, I have taken a very simple prototype of a string expression calculator and deliberately introduced a problem. I've commented out one of the correct rules:
# <numeric assignment> ::= variable '=' <numeric expression>
and replaced it with an altered one:
<numeric assignment> ::= variable '=' expression
For those readers who like to look ahead (and I encourage you to be one of those readers), all of the code and outputs for this example are collected in the "Appendix".
This altered rule contains a mistake of the kind that is easy to make in actual practice. (In this case, an unlucky choice of naming conventions may have contributed.) The altered version will cause problems. In what follows, we'll pretend we don't already know where the problem is, and that in desk-checking the grammar our eye does not spot the mistake, so that we need to use the Marpa diagnostics and tracing facilities to "discover" it.
The example
The example we will use is a prototype string calculator. It's extremely simple, to make the example easy to follow. But it can be seen as a realistic example, if it is thought of as a very early stage in the incremental development of something useful.
:default ::= action => ::array bless => ::lhs
:start ::= statements
statements ::= statement *
statement ::= assignment | <numeric assignment>
assignment ::= 'set' variable 'to' expression
# This is a deliberate error in the grammar
# The next line should be:
# <numeric assignment> ::= variable '=' <numeric expression>
# I have changed the <numeric expression> to <expression> which
# will cause problems.
<numeric assignment> ::= variable '=' expression
expression ::=
variable | string
|| 'string' '(' <numeric expression> ')'
|| expression '+' expression
<numeric expression> ::=
variable | number
|| <numeric expression> '*' <numeric expression>
|| <numeric expression> '+' <numeric expression>
variable ~ [\w]+
number ~ [\d]+
string ~ ['] <string contents> [']
<string contents> ~ [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}]+
:discard ~ whitespace
whitespace ~ [\s]+
At this stage of developing our string calculator, for strings we have assignment, variables, constants, concatenation and conversion from numerics. For numerics, we have assignment, variables, constants, multiplication and addition.
We decide that, since string expressions and variables are the "default", we'll make the symbol names for numeric assignments and expressions explicit in the grammar: <numeric expression> and <numeric assignment>. Since strings are the default, we call our string expressions simply <expression>, and our string assignments simply <assignment>. This seems like a good idea, but it is also likely to cause confusion. For the sake of our example, we will pretend that it did.
The error message
If we try the following input,
my $test_input = 'a = 8675309 + 42 * 711';
we will get this error message,
Error in parse: No lexeme found at B1L1c18
* String before error: a = 8675309 + 42\s
* The error was at B1L1c18, and at character U+002a "*", ...
* here: * 711
The error message indicates that Marpa rejected the "*" operator.
The value of the parse
In debugging this issue, we'll look at the value of the parse first. The parse value differs from the other debugging aids we'll discuss. Every other debugging tool we will describe is always available, no matter how badly the parse failed. But if you have a problem parsing, you often won't get a parse value.
Our luck holds. Here's a dump of the parse value at the point of failure. It's a nice way to see what Marpa thinks the parse was so far.
\bless( [
bless( [
bless( [
'a',
'=',
bless( [
bless( [
'8675309'
], 'My_Nodes::expression' ),
'+',
bless( [
'42'
], 'My_Nodes::expression' )
], 'My_Nodes::expression' )
], 'My_Nodes::numeric_assignment' )
], 'My_Nodes::statement' )
], 'My_Nodes::statements' );
If we were perceptive, we might spot the error here. Our parse is not quite right, and that shows up in the outer My_Nodes::expression: it should be My_Nodes::numeric_expression. We'll assume that we don't notice this.
In fact, in the following, we'll pretend we haven't seen the dump of the parse value. We can't always get a parse value, so we don't want to rely on it.
Output from trace_terminals()
You can rely on getting the output from trace_terminals, and it is a good next place to check. Typically, you will be interested in the last tokens to be accepted. Sometimes that information alone is enough to make it clear where the problem is.
The full trace_terminals output for this example is in the Appendix. We see that the recognizer accepts the input as far as the multiplication sign ("*"), which it rejects. In Marpa, a lexeme is "acceptable" if it fits the grammar and the input so far. A lexeme is rejected if it is not acceptable.
The last few lines of the trace_terminals output are:
Discarded lexeme B1L1c17: whitespace
Restarted recognizer at B1L1c18
Reading codepoint "*" 0x002a at B1L1c18
Codepoint "*" 0x002a rejected as [\*] at B1L1c18
Codepoint "*" 0x002a rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c18
A note in passing: Marpa shows the input string position of the tokens it accepts, discards and rejects. <whitespace> is supposed to be discarded, and that was what happened at line 1, column 17. But the '*' that was next in the input was rejected, and that was not supposed to happen.
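When the trace is long, it can help to pull out just the last accepted lexeme and the rejections at the final restart position. The following Perl sketch does that for trace text like the lines above. summarize_trace() is a hypothetical helper, and its regular expressions assume the line formats in this document's trace rather than any documented output contract.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the last accepted lexeme, the location of
# the last recognizer restart, and the codepoints rejected at that location.
sub summarize_trace {
    my ($trace) = @_;
    my ( $last_accepted, $restart_at, @rejected );
    for my $line ( split /\n/, $trace ) {
        if ( $line =~ /^Accepted lexeme (\S+) \S+: (.+)$/ ) {
            $last_accepted = "$2 at $1";
        }
        elsif ( $line =~ /^Restarted recognizer at (\S+)$/ ) {
            $restart_at = $1;
            @rejected   = ();    # only rejections after the last restart matter
        }
        elsif ( $line =~ /^Codepoint "(.*)" \S+ rejected as (.+) at (\S+)$/
            and defined $restart_at
            and $3 eq $restart_at )
        {
            push @rejected, "\"$1\" as $2";
        }
    }
    return ( $last_accepted, $restart_at, \@rejected );
}

# The tail of the trace_terminals output from this document's example.
my $trace = <<'END_TRACE';
Accepted lexeme B1L1c15-16 e5: variable; value="42"
Restarted recognizer at B1L1c17
Reading codepoint " " 0x0020 at B1L1c17
Codepoint " " 0x0020 accepted as [\s] at B1L1c17
Codepoint " " 0x0020 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c17
Reading codepoint "*" 0x002a at B1L1c18
Codepoint "*" 0x002a rejected as [\*] at B1L1c18
Codepoint "*" 0x002a rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c18
Discarded lexeme B1L1c17: whitespace
Restarted recognizer at B1L1c18
Reading codepoint "*" 0x002a at B1L1c18
Codepoint "*" 0x002a rejected as [\*] at B1L1c18
Codepoint "*" 0x002a rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c18
END_TRACE

my ( $accepted, $position, $rejected ) = summarize_trace($trace);
print "Last accepted: $accepted\n";
print "Stuck at $position; rejected:\n";
print "  $_\n" for @{$rejected};
```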
Output from progress_show()
Marpa's most powerful tool for debugging grammars is its progress report, which shows the Earley items being worked on. In the Appendix, progress reports for the entire parse are shown. Our example in this document is a very small one, so that producing progress reports for the entire parse is a reasonable thing to do in this case. If a parse is at all large, you will usually need to be selective.
The progress report that is usually of most interest is the one for the Earley set that you were working on when the error occurred. The location of that Earley set is called the current location. In our example the current location is B1L1c17 (L0 location block 1, line 1, column 17), which is also G1 location 5. By default, progress_show prints out only the progress reports for the current location.
Here are the progress reports for Earley set 5 (B1L1c17), from our example.
=== Earley set 5 at B1L1c17 ===
P2 B1L1c17 statement ::= . <numeric assignment>
P3 B1L1c17 assignment ::= . 'set' variable 'to' expression
P4 B1L1c17 <numeric assignment> ::= . variable '=' expression
P20 B1L1c17 statement ::= . assignment
R11:1 B1L1c5 expression ::= expression . '+' expression; prec=0
F1 B1L1c1 [:start:] ::= statements .
F2 B1L1c1 statement ::= <numeric assignment> .
F4 B1L1c1 <numeric assignment> ::= variable '=' expression .
F5 B1L1c5 expression ::= expression .; prec=-1
F7 B1L1c15 expression ::= expression .; prec=1
F8 B1L1c15 expression ::= variable .; prec=2
F11 B1L1c5 expression ::= expression '+' expression .; prec=0
F18 B1L1c1 statements ::= statement . *
Progress report lines
F1 B1L1c1 [:start:] ::= statements .
The first field (in this case "F1") is a rule tag: a "type code" followed by the rule number. In this case the "F1" indicates that this is a completed or final rule, and that it is rule number 1. The rule number is a convenient and very brief way to refer to a rule.
After the rule tag come one or two fields of location information. If there are multiple origins for the dotted production, there will be two fields of location information. In the example just above there is only one field of location information: "B1L1c1". This indicates that the dotted production has only one origin, and that that origin is at line 1, column 1 of block 1. Examples of report lines with multiple origins will be given below.
The last field of each progress report line shows, in fully expanded form, the dotted production we were working on. In the line just above, the fully expanded dotted production is "[:start:] ::= statements .".
Notice that the left hand side symbol is [:start:]. That is the start pseudo-symbol. The presence of a completed start rule in our progress report indicates that if our input had ended at Earley set 5, it would be a valid sentence in the language of our grammar. (And lucky for us: because the input at G1 location 5 was a valid sentence of the grammar, we are able to look at the value of the parse at location 5 for debugging purposes.)
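Because each report line has a regular layout, a script can pick its fields apart. The sketch below is a hypothetical parse_progress_line(), written only against the lines shown in this document. It also accepts the "R11:1" style tags and the "; prec=" suffix seen in the Earley set 5 report, and the optional "xN" instance count that this document describes under multiple instances. Treat the line format as an assumption; for programmatic use, the raw progress() method is the sturdier route.

```perl
use strict;
use warnings;

# Hypothetical helper, written only against the line layout shown in this
# document: TAG [xCOUNT] LOCATION DOTTED-PRODUCTION [; prec=N]
my %kind_of = ( P => 'predicted', R => 'medial', F => 'completed' );

sub parse_progress_line {
    my ($line) = @_;
    my ( $type, $rule, $dot, $count, $where, $dotted, $prec ) = $line =~ m{
        ^ ([PRF]) (\d+) (?: : (\d+) )?   # rule tag: F1, P2, R11:2
        \s+ (?: x (\d+) \s+ )?           # optional instance count: x12
        (\S+) \s+                        # origin location or range
        (.*?)                            # the dotted production
        (?: ; \s* prec= (-?\d+) )? $     # optional precedence
    }x or return;
    return {
        kind     => $kind_of{$type},
        rule     => $rule,
        dot      => $dot,          # only R tags carry an explicit dot
        count    => $count // 1,   # no count field means one instance
        location => $where,
        dotted   => $dotted,
        prec     => $prec,
    };
}

for my $line (
    q{R11:1 B1L1c5 expression ::= expression . '+' expression; prec=0},
    q{F1 B1L1c1 [:start:] ::= statements .},
    q{F6 x12 B1L1c1-L2c36 assignment ::= <plain assignment> .},
    )
{
    my $item = parse_progress_line($line) or next;
    printf "%-9s rule %-2d x%-2d at %-13s %s\n",
        $item->{kind}, $item->{rule}, $item->{count}, $item->{location},
        $item->{dotted};
}
```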
Let's look at another progress report line:
R11:2 B1L1c5 expression ::= expression '+' . expression; prec=0
Here the "R11:2" indicates that this is rule number 11 (the "R" stands for rule number) and that its dot position is after the second symbol on the right hand side. Symbol positions are numbered using the ordinal of the symbol just before the position. Symbols are numbered starting with 1, and symbol position 2 is the position immediately after symbol 2.
Note the "prec=0" at the end of the above line. The dotted production expression ::= expression '+' . expression comes from one of the productions in a precedenced rule, and "prec=0" indicates that its precedence is 0.
Predicted rules also appear in progress reports:
P2 B1L1c13 statement ::= . <numeric assignment>
Here the "P" in the rule tag means "predicted".
OK! Now to find the bug
If we look again at the progress reports for Earley set 5, the G1 location where things went wrong, we see that we have completed rules for <expression>, for <numeric assignment>, for <statement>, and for <statements>, as expected. We also see two Earley items that show that we are in the process of building another <expression>, and that it is expecting a '+' symbol.
Why is "*" not expected?
Why is the recognizer not expecting an '*' symbol? Looking back at the grammar, we see that only one production uses the '*' symbol. That production is part of a precedenced rule in the DSL. Here it is:
<numeric expression> ::=
variable | number
|| <numeric expression> '*' <numeric expression>
|| <numeric expression> '+' <numeric expression>
And here is that production as shown in the productions_show() listing:
R17 <numeric expression> ::= <numeric expression> '*' <numeric expression>
What is happening with rule 17?
The next step is to look at the Earley items for rule 17. But there is a problem. We don't find any.
What is the earliest place R17 should be appearing? The answer is that there should be a prediction of R17 at location 0. So we look at the predictions at location 0.
=== Earley set 0 at B1L1c1 ===
P1 B1L1c1 [:start:] ::= . statements
P2 B1L1c1 statement ::= . <numeric assignment>
P3 B1L1c1 assignment ::= . 'set' variable 'to' expression
P4 B1L1c1 <numeric assignment> ::= . variable '=' expression
P18 B1L1c1 statements ::= . statement *
P20 B1L1c1 statement ::= . assignment
No R17 predicted at G1 location 0. Next we look through the entire progress report, at all G1 locations, to see if R17 is predicted anywhere. No R17. Not anywhere.
Where is <numeric expression> expected?
The LHS of R17 is <numeric expression>. We look in the progress report for dotted productions where <numeric expression> is expected, that is, dotted productions where <numeric expression> is the post-dot symbol. There are none.
Next we look for places in the progress reports where <numeric expression> occurs anywhere on the RHS, whether post-dot or not. In the progress reports, <numeric expression> occurs in only two dotted production instances. Here they are:
P10 B1L1c5 expression ::= . 'string' '(' <numeric expression> ')'; prec=1
P10 B1L1c15 expression ::= . 'string' '(' <numeric expression> ')'; prec=1
In both cases these are predictions of a string operator, the operator we plan to use for converting numerics to strings. They are just predictions, which go no further because there is no 'string' operator in our input. That's fine, but why are there no other, more relevant, occurrences of <numeric expression>?
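The two searches just described, for <numeric expression> as a post-dot symbol and for <numeric expression> anywhere on the RHS, can also be done by script. This Perl sketch runs them over the Earley set 5 report lines and the two P10 lines quoted above. The helper names and the tokenizing regular expression are our own assumptions about the line format, not a Marpa facility.

```perl
use strict;
use warnings;

# Return the RHS tokens of a progress report line, dropping any "; prec=N".
sub rhs_tokens {
    my ($line) = @_;
    my ($rhs) = $line =~ /::=\s*(.*?)(?:;\s*prec=-?\d+)?$/ or return;
    # <bracketed names>, 'quoted literals', or other non-space runs.
    return $rhs =~ /<[^>]+>|'[^']*'|\S+/g;
}

# The symbol right after the dot, or undef for a completion.
sub post_dot_symbol {
    my @tokens = rhs_tokens(shift);
    for my $i ( 0 .. $#tokens - 1 ) {
        return $tokens[ $i + 1 ] if $tokens[$i] eq '.';
    }
    return;
}

# How many times $symbol occurs on the RHS, before or after the dot.
sub rhs_mentions {
    my ( $line, $symbol ) = @_;
    return scalar grep { $_ eq $symbol } rhs_tokens($line);
}

my @report = (
    q{P2 B1L1c17 statement ::= . <numeric assignment>},
    q{P3 B1L1c17 assignment ::= . 'set' variable 'to' expression},
    q{P4 B1L1c17 <numeric assignment> ::= . variable '=' expression},
    q{P20 B1L1c17 statement ::= . assignment},
    q{R11:1 B1L1c5 expression ::= expression . '+' expression; prec=0},
    q{F1 B1L1c1 [:start:] ::= statements .},
    q{F2 B1L1c1 statement ::= <numeric assignment> .},
    q{F4 B1L1c1 <numeric assignment> ::= variable '=' expression .},
    q{F5 B1L1c5 expression ::= expression .; prec=-1},
    q{F7 B1L1c15 expression ::= expression .; prec=1},
    q{F8 B1L1c15 expression ::= variable .; prec=2},
    q{F11 B1L1c5 expression ::= expression '+' expression .; prec=0},
    q{F18 B1L1c1 statements ::= statement . *},
    q{P10 B1L1c5 expression ::= . 'string' '(' <numeric expression> ')'; prec=1},
    q{P10 B1L1c15 expression ::= . 'string' '(' <numeric expression> ')'; prec=1},
);

my $symbol     = '<numeric expression>';
my @expecting  = grep { ( post_dot_symbol($_) // '' ) eq $symbol } @report;
my @mentioning = grep { rhs_mentions( $_, $symbol ) } @report;
printf "%d line(s) expect %s\n", scalar @expecting, $symbol;
print "Mentioned on the RHS in:\n";
print "  $_\n" for @mentioning;
```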
Rules with <numeric expression> on the RHS
We look back at the grammar. Aside from the rule for the 'string' operator, <numeric expression> occurs on a RHS in two places. One is in the precedenced rule which defines <numeric expression>.
<numeric expression> ::=
variable | number
|| <numeric expression> '*' <numeric expression>
|| <numeric expression> '+' <numeric expression>
This rule will never put <numeric expression> into the Earley items unless there is a <numeric expression> already there. Nonetheless, this rule is OK: it has a job and it's doing it. This rule does not need fixing.
That leaves one rule to look at.
<numeric assignment> ::= variable '=' expression
This rule is one that should lead to the prediction of a new <numeric expression> in our example.
Problem solved
And now we see our problem. This rule is never leading to the prediction of a new <numeric expression>, because there is no <numeric expression> on its RHS, or for that matter anywhere else in it. On the RHS, where we wrote <expression>, we should have written <numeric expression>. Change that and the problem is fixed.
Complications
We have finished our main example. This section discusses some aspects of debugging which did not arise in the example, and which might be unexpected.
Empty rules
When a symbol is nulled in your parse, progress_show shows only the nulled symbol. It does not show the symbol's expansion into rules, or any of its nulled child symbols.
This reduces clutter, and seems to be what programmers expect intuitively, so much so that the absence of the nulled rules and non-root nulled symbols is rarely noticed. Nonetheless, programmers working out the full details of parses with nulled subtrees should keep this in mind.
Input string ranges
By default, Marpa moves forward continuously in the input string. But Marpa applications have the option to move around arbitrarily in the input. Those using the default behavior can ignore the considerations in the rest of this section.
For the others: Marpa allows apps to organize the input into blocks, which they can visit in any order or not at all. Within blocks, apps are free to jump around arbitrarily in the input.
In Marpa, it is always true that:
The first location in an input string range is the first character in L0 location order. (L0 location order is ordered by block number first, and then by location within the block.)
The last location in an input string range is the last character in L0 location order.
The following statements are always true for apps which use the default behavior. For apps which do not use the default behavior, they are not necessarily true.
The first character in an input string range is the first one traversed.
The last character in an input string range is the last one traversed.
Every character in an input string range is traversed.
Every character traversed is traversed in L0 location order.
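The range rules can be illustrated with a small Perl sketch: whatever order the locations were traversed in, the range runs from the first to the last location in L0 location order. Here a location is assumed to be a [block, line, column] triple, and ordering within a block is assumed to follow line, then column. The abbreviated range forms ("B1L1c5-11", "B1L1c1-L2c36") mimic those in this document's output; the cross-block form is our own guess, and both functions are hypothetical.

```perl
use strict;
use warnings;

# Locations are [ block, line, column ]. L0 location order compares
# block first, then position within the block (line, then column).
sub l0_cmp {
    my ( $x, $y ) = @_;
    return $x->[0] <=> $y->[0] || $x->[1] <=> $y->[1] || $x->[2] <=> $y->[2];
}

# Format the range covering a set of locations, in L0 location order.
sub format_range {
    my @locations = sort { l0_cmp( $a, $b ) } @_;
    my ( $first, $last ) = @locations[ 0, -1 ];
    my $text = "B$first->[0]L$first->[1]c$first->[2]";
    return $text if l0_cmp( $first, $last ) == 0;
    return "$text-$last->[2]"
        if $first->[0] == $last->[0] and $first->[1] == $last->[1];
    return "$text-L$last->[1]c$last->[2]" if $first->[0] == $last->[0];
    return "$text-B$last->[0]L$last->[1]c$last->[2]";    # assumed cross-block form
}

# Traversal order does not matter; the range is taken in L0 location order.
print format_range( [ 1, 2, 36 ], [ 1, 1, 1 ], [ 1, 1, 20 ] ), "\n";
print format_range( [ 1, 1, 11 ], [ 1, 1, 5 ] ), "\n";
```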
Multiple instances of dotted productions
It does not happen in our main example for this document, but a dotted production can appear in the same Earley set more than once. In fact, this happens frequently. When it does happen, the lines in the progress report will look like these:
F2 x2 B1L1c20-44 assignment ::= <divide assignment> .
F6 x12 B1L1c1-L2c36 assignment ::= <plain assignment> .
F11 x12 B1L1c1-L2c36 <plain assignment> ::= 'x' '=' expression .
F13 x20 B1L1c1-L2c36 expression ::= assignment .
All of these report lines are for a single Earley set. They report dotted productions from an indirect right recursion, one that recurses from a <plain assignment> symbol to an <expression> symbol, and then to an <assignment> symbol, before completing the recursion by returning to a <plain assignment>.
In each of the four lines, notice that a new field appears second. In these examples, the second field is variously "x2", "x12" or "x20". These fields are counts, indicating the number of instances of that dotted production in its Earley set: in this case, there are, respectively, 2, 12, 12 and 20 instances of the reported dotted productions.
All the instances counted on a line are in the same Earley set, that is, they have the same G1 location. Within an Earley set, one instance of a dotted production differs from another instance of the same dotted production only in its origin.
The origin of a dotted production instance is the location where it begins. Within an Earley set the instances of a single dotted production may have many different origins -- hundreds or even more.
Each instance of a dotted production will have its own origin in the input string. The input string range shown in the report line (for example, "B1L1c1-L2c36") includes them all.
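How a count and a range come about can be sketched by grouping raw report items (as returned by the progress() method, described in the next section) by dotted production. In this hypothetical Perl sketch, origins are G1 locations, so the ranges it prints are G1 ranges rather than the L0 ranges that progress_show displays. The sample items are made up for illustration.

```perl
use strict;
use warnings;

# Group report items from ONE Earley set by dotted production, that is, by
# (production ID, dot position). Each item is [ production ID, dot, origin ].
sub group_instances {
    my @items = @_;
    my %group;
    for my $item (@items) {
        my ( $production_id, $dot, $origin ) = @{$item};
        my $g = $group{"$production_id:$dot"} //= { count => 0, origins => [] };
        $g->{count}++;
        push @{ $g->{origins} }, $origin;
    }
    # Record the earliest and latest origin of each group.
    for my $g ( values %group ) {
        my @sorted = sort { $a <=> $b } @{ $g->{origins} };
        @{$g}{qw(first last)} = @sorted[ 0, -1 ];
    }
    return \%group;
}

# Made-up report items, for illustration only.
my @items  = ( [ 11, 3, 0 ], [ 11, 3, 4 ], [ 11, 3, 2 ], [ 2, 1, 0 ] );
my $groups = group_instances(@items);
for my $key ( sort keys %{$groups} ) {
    my $g = $groups->{$key};
    my $origins =
        $g->{first} == $g->{last} ? $g->{first} : "$g->{first}-$g->{last}";
    printf "production:dot %s x%d, origins %s\n", $key, $g->{count}, $origins;
}
```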
Access to the "raw" progress report information
This section deals with the progress() recognizer method, which allows access to the raw progress report information. This method is not needed for typical debugging and tracing situations. It is intended for applications which want to leverage Marpa's "situational awareness" in innovative ways.
progress()
my $report0 = $recce->progress(0);
my $latest_report = $recce->progress();
Given the G1 location (Earley set ID) as its argument, the progress() recognizer method returns a reference to an array of "report items". The G1 location may be given as a negative number. An argument of -X will be interpreted as G1 location N-(X-1), where N is the latest Earley set. This means that an argument of -1 indicates the latest Earley set, an argument of -2 indicates the Earley set just before the latest one, etc.
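Here is a minimal Perl sketch of how a negative argument maps to an Earley set, assuming the ID of the latest Earley set is known. resolve_g1_location() is a hypothetical helper, not part of Marpa, and treating a missing argument as -1 is only inferred from the $latest_report example above.

```perl
use strict;
use warnings;

# $arg: the argument as passed to progress(); $latest: ID of the latest
# Earley set. Returns the G1 location (Earley set ID) it designates.
sub resolve_g1_location {
    my ( $arg, $latest ) = @_;
    # Assumption, based on the $latest_report example: no argument
    # means the latest Earley set.
    $arg = -1 if not defined $arg;
    # Negative arguments count back from the end: -1 is $latest,
    # -2 is the set before it, and so on.
    my $location = $arg < 0 ? $latest + 1 + $arg : $arg;
    die "G1 location $arg is out of range\n"
        if $location < 0 or $location > $latest;
    return $location;
}

# In this document's example, the latest Earley set is G1 location 5.
printf "%4s => %d\n", $_, resolve_g1_location( $_, 5 ) for 0, -1, -2, -6;
```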
Each report item is a triple: an array of three elements. The three elements are, in order, production ID, dot position, and origin. The data returned by the two displays above, as well as the data for the other G1 locations in our example, are shown below.
The production ID is the same number that Marpa uses to identify productions in tracing and debugging output. Given a production ID, an application can expand it into its LHS and RHS symbols using the grammar's production_expand() method. Given a symbol ID, its name and other information can be found using other grammar methods.
Dot position is N, where N is the number of RHS symbols successfully recognized at the G1 location of the progress report. Dot position is 0 for predictions.
Origin is the G1 location (Earley set ID) at which the dotted production instance reported by the report item began. For a prediction, origin will always be the same as the G1 location of the parse report.
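The sketch below shows how the three fields of a report item can be combined into something resembling a progress_show line. It uses a hand-written table in place of production_expand() and the symbol lookups, so %production and describe_item() are hypothetical; the table entries are copied from report lines shown earlier in this document. The P, F and R tags follow the conventions explained above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for production_expand() plus symbol-name lookups:
# production ID => [ LHS, RHS... ], with symbol names already filled in.
my %production = (
    1  => [ '[:start:]', 'statements' ],
    4  => [ '<numeric assignment>', 'variable', q{'='}, 'expression' ],
    11 => [ 'expression', 'expression', q{'+'}, 'expression' ],
);

# Turn a report item [ production ID, dot position, origin ] into a
# readable line with a P/F/R style tag.
sub describe_item {
    my ($item) = @_;
    my ( $id, $dot, $origin ) = @{$item};
    my ( $lhs, @rhs ) = @{ $production{$id} or die "unknown production $id\n" };
    my $tag =
          $dot == 0    ? "P$id"          # prediction: nothing recognized yet
        : $dot == @rhs ? "F$id"          # completion: whole RHS recognized
        :                "R$id:$dot";    # dot somewhere in the middle
    my @dotted = ( @rhs[ 0 .. $dot - 1 ], '.', @rhs[ $dot .. $#rhs ] );
    return "$tag origin=$origin $lhs ::= @dotted";
}

print describe_item($_), "\n" for [ 4, 0, 0 ], [ 11, 1, 2 ], [ 1, 1, 0 ];
```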
Progress reports and efficiency
When progress reports are used for production parsing, instead of just for debugging and tracing, efficiency considerations become significant. Progress reports themselves are implemented in optimized C, and that logic is very fast. However, the use of progress reports usually implies considerable post-processing in Perl. It is almost always possible to use Marpa's named events instead of progress reports, and solutions using named events are usually better targeted, simpler and faster.
If you do decide to use progress reports in an application, you should be aware of the efficiency considerations when there are right recursions in the grammar. For most purposes, Marpa optimizes right recursions, so that they run in linear time. However, to create a progress report every potential right recursion must be fully unfolded, and at each G1 location the number of these grows linearly with the length of the recursion. If you are creating progress reports for more than a limited number of G1 locations, this means processing that can become quadratic in the length of the recursion. When a right recursion is lengthy, the cost in processing time can be serious.
If lengthy right recursions are being expanded, this will be evident from the progress report itself, which will contain one report item for every completion in the right-recursive chain of completions. Note that the efficiency consideration just mentioned for expanding right recursions is never an issue for left recursions. Left recursions only produce at most two report items per G1 location and are extremely fast to process. It is also not an issue for Marpa's sequence rules, because sequence rules are implemented internally as left recursions.
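The arithmetic behind the quadratic warning can be sketched with a rough model. Assuming that the progress report at the i-th G1 location of a right recursion unfolds about i completions, reports for n locations total 1 + 2 + ... + n = n(n+1)/2 items, while a left recursion stays within 2n. The functions below are illustrative only and say nothing about Marpa's actual constant factors.

```perl
use strict;
use warnings;

# Rough cost model, an illustrative assumption rather than a measurement:
# at the i-th G1 location of a right recursion, a progress report unfolds
# about i completions, so reports at n locations total 1 + 2 + ... + n.
sub right_recursion_items {
    my ($n) = @_;
    my $total = 0;
    $total += $_ for 1 .. $n;
    return $total;    # equals n(n+1)/2
}

# Left recursions produce at most two report items per G1 location.
sub left_recursion_items_bound {
    my ($n) = @_;
    return 2 * $n;
}

for my $n ( 10, 100, 1000 ) {
    printf "%5d locations: right recursion %7d items, left recursion <= %5d\n",
        $n, right_recursion_items($n), left_recursion_items_bound($n);
}
```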
Appendix
Below are the code, the trace outputs and the progress report for the example used in this document.
Code
use strict;
use warnings;
use English qw( -no_match_vars );    # provides $EVAL_ERROR
use Marpa::R3;
my $slif_debug_source = <<'END_OF_SOURCE';
:default ::= action => ::array bless => ::lhs
:start ::= statements
statements ::= statement *
statement ::= assignment | <numeric assignment>
assignment ::= 'set' variable 'to' expression
# This is a deliberate error in the grammar
# The next line should be:
# <numeric assignment> ::= variable '=' <numeric expression>
# I have changed the <numeric expression> to <expression> which
# will cause problems.
<numeric assignment> ::= variable '=' expression
expression ::=
variable | string
|| 'string' '(' <numeric expression> ')'
|| expression '+' expression
<numeric expression> ::=
variable | number
|| <numeric expression> '*' <numeric expression>
|| <numeric expression> '+' <numeric expression>
variable ~ [\w]+
number ~ [\d]+
string ~ ['] <string contents> [']
<string contents> ~ [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}]+
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_SOURCE
my $grammar = Marpa::R3::Grammar->new(
{
bless_package => 'My_Nodes',
source => \$slif_debug_source,
});
my $recce = Marpa::R3::Recognizer->new(
{ grammar => $grammar,
trace_terminals => 1,
trace_values => 1,
} );
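my $test_input = 'a = 8675309 + 42 * 711';
# The read is expected to fail; trap the error so that the
# progress report can still be produced afterwards.
my $eval_error;
if ( not eval { $recce->read( \$test_input ); 1 } ) {
    $eval_error = $EVAL_ERROR;
}
my $progress_report = $recce->progress_show( 0, -1 );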
Error message
Error in parse: No lexeme found at B1L1c18
* String before error: a = 8675309 + 42\s
* The error was at B1L1c18, and at character U+002a "*", ...
* here: * 711
Parse value at error location
Note that when there is a parse error, there will not always be a parse value. But sometimes the parse is "successful" enough, in a technical sense, to produce a value, and in those cases examining the value can be helpful in determining what the parser thinks it has seen so far.
my $value_ref = $recce->value();
my $expected_output = \bless( [
bless( [
bless( [
'a',
'=',
bless( [
bless( [
'8675309'
], 'My_Nodes::expression' ),
'+',
bless( [
'42'
], 'My_Nodes::expression' )
], 'My_Nodes::expression' )
], 'My_Nodes::numeric_assignment' )
], 'My_Nodes::statement' )
], 'My_Nodes::statements' );
Trace output
Setting trace_terminals option
Setting trace_values option to 1
Restarted recognizer at B1L1c1
Reading codepoint "a" 0x0061 at B1L1c1
Codepoint "a" 0x0061 accepted as [\w] at B1L1c1
Codepoint "a" 0x0061 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c1
Reading codepoint " " 0x0020 at B1L1c2
Codepoint " " 0x0020 rejected as [\s] at B1L1c2
Codepoint " " 0x0020 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c2
Accepted lexeme B1L1c1 e1: variable; value="a"
Restarted recognizer at B1L1c2
Reading codepoint " " 0x0020 at B1L1c2
Codepoint " " 0x0020 accepted as [\s] at B1L1c2
Codepoint " " 0x0020 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c2
Reading codepoint "=" 0x003d at B1L1c3
Codepoint "=" 0x003d rejected as [\=] at B1L1c3
Codepoint "=" 0x003d rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c3
Discarded lexeme B1L1c2: whitespace
Restarted recognizer at B1L1c3
Reading codepoint "=" 0x003d at B1L1c3
Codepoint "=" 0x003d accepted as [\=] at B1L1c3
Codepoint "=" 0x003d rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c3
Accepted lexeme B1L1c3 e2: '='; value="="
Restarted recognizer at B1L1c4
Reading codepoint " " 0x0020 at B1L1c4
Codepoint " " 0x0020 accepted as [\s] at B1L1c4
Codepoint " " 0x0020 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c4
Reading codepoint "8" 0x0038 at B1L1c5
Codepoint "8" 0x0038 rejected as [\d] at B1L1c5
Codepoint "8" 0x0038 rejected as [\w] at B1L1c5
Codepoint "8" 0x0038 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c5
Discarded lexeme B1L1c4: whitespace
Restarted recognizer at B1L1c5
Reading codepoint "8" 0x0038 at B1L1c5
Codepoint "8" 0x0038 rejected as [\d] at B1L1c5
Codepoint "8" 0x0038 accepted as [\w] at B1L1c5
Codepoint "8" 0x0038 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c5
Reading codepoint "6" 0x0036 at B1L1c6
Codepoint "6" 0x0036 rejected as [\d] at B1L1c6
Codepoint "6" 0x0036 accepted as [\w] at B1L1c6
Codepoint "6" 0x0036 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c6
Reading codepoint "7" 0x0037 at B1L1c7
Codepoint "7" 0x0037 rejected as [\d] at B1L1c7
Codepoint "7" 0x0037 accepted as [\w] at B1L1c7
Codepoint "7" 0x0037 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c7
Reading codepoint "5" 0x0035 at B1L1c8
Codepoint "5" 0x0035 rejected as [\d] at B1L1c8
Codepoint "5" 0x0035 accepted as [\w] at B1L1c8
Codepoint "5" 0x0035 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c8
Reading codepoint "3" 0x0033 at B1L1c9
Codepoint "3" 0x0033 rejected as [\d] at B1L1c9
Codepoint "3" 0x0033 accepted as [\w] at B1L1c9
Codepoint "3" 0x0033 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c9
Reading codepoint "0" 0x0030 at B1L1c10
Codepoint "0" 0x0030 rejected as [\d] at B1L1c10
Codepoint "0" 0x0030 accepted as [\w] at B1L1c10
Codepoint "0" 0x0030 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c10
Reading codepoint "9" 0x0039 at B1L1c11
Codepoint "9" 0x0039 rejected as [\d] at B1L1c11
Codepoint "9" 0x0039 accepted as [\w] at B1L1c11
Codepoint "9" 0x0039 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c11
Reading codepoint " " 0x0020 at B1L1c12
Codepoint " " 0x0020 rejected as [\s] at B1L1c12
Codepoint " " 0x0020 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c12
Accepted lexeme B1L1c5-11 e3: variable; value="8675309"
Restarted recognizer at B1L1c12
Reading codepoint " " 0x0020 at B1L1c12
Codepoint " " 0x0020 accepted as [\s] at B1L1c12
Codepoint " " 0x0020 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c12
Reading codepoint "+" 0x002b at B1L1c13
Codepoint "+" 0x002b rejected as [\+] at B1L1c13
Codepoint "+" 0x002b rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c13
Discarded lexeme B1L1c12: whitespace
Restarted recognizer at B1L1c13
Reading codepoint "+" 0x002b at B1L1c13
Codepoint "+" 0x002b accepted as [\+] at B1L1c13
Codepoint "+" 0x002b rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c13
Accepted lexeme B1L1c13 e4: '+'; value="+"
Restarted recognizer at B1L1c14
Reading codepoint " " 0x0020 at B1L1c14
Codepoint " " 0x0020 accepted as [\s] at B1L1c14
Codepoint " " 0x0020 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c14
Reading codepoint "4" 0x0034 at B1L1c15
Codepoint "4" 0x0034 rejected as [\d] at B1L1c15
Codepoint "4" 0x0034 rejected as [\w] at B1L1c15
Codepoint "4" 0x0034 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c15
Discarded lexeme B1L1c14: whitespace
Restarted recognizer at B1L1c15
Reading codepoint "4" 0x0034 at B1L1c15
Codepoint "4" 0x0034 rejected as [\d] at B1L1c15
Codepoint "4" 0x0034 accepted as [\w] at B1L1c15
Codepoint "4" 0x0034 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c15
Reading codepoint "2" 0x0032 at B1L1c16
Codepoint "2" 0x0032 rejected as [\d] at B1L1c16
Codepoint "2" 0x0032 accepted as [\w] at B1L1c16
Codepoint "2" 0x0032 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c16
Reading codepoint " " 0x0020 at B1L1c17
Codepoint " " 0x0020 rejected as [\s] at B1L1c17
Codepoint " " 0x0020 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c17
Accepted lexeme B1L1c15-16 e5: variable; value="42"
Restarted recognizer at B1L1c17
Reading codepoint " " 0x0020 at B1L1c17
Codepoint " " 0x0020 accepted as [\s] at B1L1c17
Codepoint " " 0x0020 rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c17
Reading codepoint "*" 0x002a at B1L1c18
Codepoint "*" 0x002a rejected as [\*] at B1L1c18
Codepoint "*" 0x002a rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c18
Discarded lexeme B1L1c17: whitespace
Restarted recognizer at B1L1c18
Reading codepoint "*" 0x002a at B1L1c18
Codepoint "*" 0x002a rejected as [\*] at B1L1c18
Codepoint "*" 0x002a rejected as [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] at B1L1c18
valuator trace level: 1
progress_show() output
=== Earley set 0 at B1L1c1 ===
P1 B1L1c1 [:start:] ::= . statements
P2 B1L1c1 statement ::= . <numeric assignment>
P3 B1L1c1 assignment ::= . 'set' variable 'to' expression
P4 B1L1c1 <numeric assignment> ::= . variable '=' expression
P18 B1L1c1 statements ::= . statement *
P20 B1L1c1 statement ::= . assignment
=== Earley set 1 at B1L1c3 ===
R4:1 B1L1c1 <numeric assignment> ::= variable . '=' expression
=== Earley set 2 at B1L1c5 ===
P5 B1L1c5 expression ::= . expression; prec=-1
P6 B1L1c5 expression ::= . expression; prec=0
P7 B1L1c5 expression ::= . expression; prec=1
P8 B1L1c5 expression ::= . variable; prec=2
P9 B1L1c5 expression ::= . string; prec=2
P10 B1L1c5 expression ::= . 'string' '(' <numeric expression> ')'; prec=1
P11 B1L1c5 expression ::= . expression '+' expression; prec=0
R4:2 B1L1c1 <numeric assignment> ::= variable '=' . expression
=== Earley set 3 at B1L1c13 ===
P2 B1L1c13 statement ::= . <numeric assignment>
P3 B1L1c13 assignment ::= . 'set' variable 'to' expression
P4 B1L1c13 <numeric assignment> ::= . variable '=' expression
P20 B1L1c13 statement ::= . assignment
R11:1 B1L1c5 expression ::= expression . '+' expression; prec=0
F1 B1L1c1 [:start:] ::= statements .
F2 B1L1c1 statement ::= <numeric assignment> .
F4 B1L1c1 <numeric assignment> ::= variable '=' expression .
F5 B1L1c5 expression ::= expression .; prec=-1
F6 B1L1c5 expression ::= expression .; prec=0
F7 B1L1c5 expression ::= expression .; prec=1
F8 B1L1c5 expression ::= variable .; prec=2
F18 B1L1c1 statements ::= statement . *
=== Earley set 4 at B1L1c15 ===
P7 B1L1c15 expression ::= . expression; prec=1
P8 B1L1c15 expression ::= . variable; prec=2
P9 B1L1c15 expression ::= . string; prec=2
P10 B1L1c15 expression ::= . 'string' '(' <numeric expression> ')'; prec=1
R11:2 B1L1c5 expression ::= expression '+' . expression; prec=0
=== Earley set 5 at B1L1c17 ===
P2 B1L1c17 statement ::= . <numeric assignment>
P3 B1L1c17 assignment ::= . 'set' variable 'to' expression
P4 B1L1c17 <numeric assignment> ::= . variable '=' expression
P20 B1L1c17 statement ::= . assignment
R11:1 B1L1c5 expression ::= expression . '+' expression; prec=0
F1 B1L1c1 [:start:] ::= statements .
F2 B1L1c1 statement ::= <numeric assignment> .
F4 B1L1c1 <numeric assignment> ::= variable '=' expression .
F5 B1L1c5 expression ::= expression .; prec=-1
F7 B1L1c15 expression ::= expression .; prec=1
F8 B1L1c15 expression ::= variable .; prec=2
F11 B1L1c5 expression ::= expression '+' expression .; prec=0
F18 B1L1c1 statements ::= statement . *
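Every Earley item line in the progress_show() listing has the same shape. It starts with an item type and a production number, and R items also carry a dot position. Next comes the origin location, which we read as block, line and column (B1L1c5). The dotted production comes last. The Perl sketch below splits one such line into those parts. It is only an illustration and not part of Marpa::R3. The sub name parse_progress_line is hypothetical, and the line format is inferred from the sample output above, not from a specification.

```perl
use strict;
use warnings;

# Split one line of progress_show() output into its parts.
# Assumed line shape (inferred from the sample output, not a spec):
#   <type><production>[:<dot>] B<block>L<line>c<column> <dotted production>
# where <type> is P (predicted), R (in progress) or F (finished),
# and only R lines carry an explicit ":<dot>".
sub parse_progress_line {
    my ($line) = @_;
    my ( $type, $prod, $dot, $block, $lineno, $col, $dotted ) = $line =~ m{
        \A ([PRF]) (\d+) (?: : (\d+) )?    # type, production, optional dot
        \s+ B(\d+) L(\d+) c(\d+)           # origin location
        \s+ (.*?) \s* \z                   # the dotted production
    }x or return;    # not an Earley item line, e.g. a "===" header
    return {
        type       => $type,
        production => $prod,
        # P items have the dot at the start; F items get no dot number
        # here, because it depends on the length of the production.
        dot        => $type eq 'P' ? 0 : $dot,
        origin     => { block => $block, line => $lineno, column => $col },
        dotted     => $dotted,
    };
}

my $item = parse_progress_line(
    q{R11:1 B1L1c5 expression ::= expression . '+' expression; prec=0});
printf "%s item for production %d, dot %d, origin column %d\n",
    $item->{type}, $item->{production}, $item->{dot}, $item->{origin}{column};
# prints "R item for production 11, dot 1, origin column 5"
```

Lines that are not Earley items, such as the "=== Earley set ... ===" headers, make the sub return undef. You can therefore feed it the whole listing and skip the failures.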
productions_show() output
This is the productions_show() output at verbosity level 3. You would usually use verbosity level 1 (the default), particularly at first; the more verbose output is included here for illustration.
R1 [:start:] ::= statements
Symbol IDs: <3> ::= <37>
Canonical names: [:start:] ::= statements
R2 statement ::= <numeric assignment>
Symbol IDs: <36> ::= <34>
Canonical names: statement ::= <numeric assignment>
R3 assignment ::= 'set' variable 'to' expression
Symbol IDs: <31> ::= <5> <40> <6> <32>
Canonical names: assignment ::= [Lex-0] variable [Lex-1] expression
R4 <numeric assignment> ::= variable '=' <numeric expression>
Symbol IDs: <34> ::= <40> <7> <35>
Canonical names: <numeric assignment> ::= variable [Lex-2] <numeric expression>
R5 expression ::= expression; prec=-1
Symbol IDs: <32> ::= <32>
Canonical names: expression ::= expression
R6 expression ::= expression; prec=0
Symbol IDs: <32> ::= <32>
Canonical names: expression ::= expression
R7 expression ::= expression; prec=1
Symbol IDs: <32> ::= <32>
Canonical names: expression ::= expression
R8 expression ::= variable; prec=2
Symbol IDs: <32> ::= <40>
Canonical names: expression ::= variable
R9 expression ::= string; prec=2
Symbol IDs: <32> ::= <38>
Canonical names: expression ::= string
R10 expression ::= 'string' '(' <numeric expression> ')'; prec=1
Symbol IDs: <32> ::= <8> <9> <35> <10>
Canonical names: expression ::= [Lex-3] [Lex-4] <numeric expression> [Lex-5]
R11 expression ::= expression '+' expression; prec=0
Symbol IDs: <32> ::= <32> <11> <32>
Canonical names: expression ::= expression [Lex-6] expression
R12 <numeric expression> ::= <numeric expression>; prec=-1
Symbol IDs: <35> ::= <35>
Canonical names: <numeric expression> ::= <numeric expression>
R13 <numeric expression> ::= <numeric expression>; prec=0
Symbol IDs: <35> ::= <35>
Canonical names: <numeric expression> ::= <numeric expression>
R14 <numeric expression> ::= <numeric expression>; prec=1
Symbol IDs: <35> ::= <35>
Canonical names: <numeric expression> ::= <numeric expression>
R15 <numeric expression> ::= variable; prec=2
Symbol IDs: <35> ::= <40>
Canonical names: <numeric expression> ::= variable
R16 <numeric expression> ::= number; prec=2
Symbol IDs: <35> ::= <33>
Canonical names: <numeric expression> ::= number
R17 <numeric expression> ::= <numeric expression> '*' <numeric expression>; prec=1
Symbol IDs: <35> ::= <35> <12> <35>
Canonical names: <numeric expression> ::= <numeric expression> [Lex-7] <numeric expression>
R18 statements ::= statement *
Symbol IDs: <37> ::= <36>
Canonical names: statements ::= statement
R19 <numeric expression> ::= <numeric expression> '+' <numeric expression>; prec=0
Symbol IDs: <35> ::= <35> <11> <35>
Canonical names: <numeric expression> ::= <numeric expression> [Lex-6] <numeric expression>
R20 statement ::= assignment
Symbol IDs: <36> ::= <31>
Canonical names: statement ::= assignment
R21 [:lex_start:] ~ [:target:]
Symbol IDs: <2> ::= <4>
Canonical names: [:lex_start:] ::= [:target:]
R22 [:target:] ~ [:discard:]
Symbol IDs: <4> ::= <1>
Canonical names: [:target:] ::= [:discard:]
R23 [:target:] ~ 'set'
Symbol IDs: <4> ::= <5>
Canonical names: [:target:] ::= [Lex-0]
R24 [:target:] ~ 'to'
Symbol IDs: <4> ::= <6>
Canonical names: [:target:] ::= [Lex-1]
R25 [:target:] ~ '='
Symbol IDs: <4> ::= <7>
Canonical names: [:target:] ::= [Lex-2]
R26 [:target:] ~ 'string'
Symbol IDs: <4> ::= <8>
Canonical names: [:target:] ::= [Lex-3]
R27 [:target:] ~ '('
Symbol IDs: <4> ::= <9>
Canonical names: [:target:] ::= [Lex-4]
R28 [:target:] ~ ')'
Symbol IDs: <4> ::= <10>
Canonical names: [:target:] ::= [Lex-5]
R29 [:target:] ~ '+'
Symbol IDs: <4> ::= <11>
Canonical names: [:target:] ::= [Lex-6]
R30 [:target:] ~ '*'
Symbol IDs: <4> ::= <12>
Canonical names: [:target:] ::= [Lex-7]
R31 [:target:] ~ number
Symbol IDs: <4> ::= <33>
Canonical names: [:target:] ::= number
R32 [:target:] ~ string
Symbol IDs: <4> ::= <38>
Canonical names: [:target:] ::= string
R33 [:target:] ~ variable
Symbol IDs: <4> ::= <40>
Canonical names: [:target:] ::= variable
R34 'set' ~ [s] [e] [t]
Symbol IDs: <5> ::= <29> <23> <30>
Canonical names: [Lex-0] ::= [[s]] [[e]] [[t]]
R35 'to' ~ [t] [o]
Symbol IDs: <6> ::= <30> <27>
Canonical names: [Lex-1] ::= [[t]] [[o]]
R36 '=' ~ [\=]
Symbol IDs: <7> ::= <18>
Canonical names: [Lex-2] ::= [[\=]]
R37 'string' ~ [s] [t] [r] [i] [n] [g]
Symbol IDs: <8> ::= <29> <30> <28> <25> <26> <24>
Canonical names: [Lex-3] ::= [[s]] [[t]] [[r]] [[i]] [[n]] [[g]]
R38 '(' ~ [\(]
Symbol IDs: <9> ::= <14>
Canonical names: [Lex-4] ::= [[\(]]
R39 ')' ~ [\)]
Symbol IDs: <10> ::= <15>
Canonical names: [Lex-5] ::= [[\)]]
R40 '+' ~ [\+]
Symbol IDs: <11> ::= <17>
Canonical names: [Lex-6] ::= [[\+]]
R41 '*' ~ [\*]
Symbol IDs: <12> ::= <16>
Canonical names: [Lex-7] ::= [[\*]]
R42 variable ~ [\w] +
Symbol IDs: <40> ::= <21>
Canonical names: variable ::= [[\w]]
R43 number ~ [\d] +
Symbol IDs: <33> ::= <19>
Canonical names: number ::= [[\d]]
R44 string ~ ['] <string contents> [']
Symbol IDs: <38> ::= <13> <39> <13>
Canonical names: string ::= [[']] <string contents> [[']]
R45 <string contents> ~ [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}] +
Symbol IDs: <39> ::= <22>
Canonical names: <string contents> ::= [[^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}]]
R46 [:discard:] ~ whitespace
Symbol IDs: <1> ::= <41>
Canonical names: [:discard:] ::= whitespace
R47 whitespace ~ [\s] +
Symbol IDs: <41> ::= <20>
Canonical names: whitespace ::= [[\s]]
symbols_show() output
S1 [:discard:]
Canonical name: [:discard:]
DSL name: [:discard:]
S2 [:lex_start:]
Canonical name: [:lex_start:]
DSL name: [:lex_start:]
S3 [:start:]
Canonical name: [:start:]
DSL name: [:start:]
S4 [:target:]
Canonical name: [:target:]
DSL name: [:target:]
S5 'set'
Canonical name: [Lex-0]
DSL name: 'set'
S6 'to'
Canonical name: [Lex-1]
DSL name: 'to'
S7 '='
Canonical name: [Lex-2]
DSL name: '='
S8 'string'
Canonical name: [Lex-3]
DSL name: 'string'
S9 '('
Canonical name: [Lex-4]
DSL name: '('
S10 ')'
Canonical name: [Lex-5]
DSL name: ')'
S11 '+'
Canonical name: [Lex-6]
DSL name: '+'
S12 '*'
Canonical name: [Lex-7]
DSL name: '*'
S13 [']
Canonical name: [[']]
DSL name: [']
S14 [\(]
Canonical name: [[\(]]
DSL name: [\(]
S15 [\)]
Canonical name: [[\)]]
DSL name: [\)]
S16 [\*]
Canonical name: [[\*]]
DSL name: [\*]
S17 [\+]
Canonical name: [[\+]]
DSL name: [\+]
S18 [\=]
Canonical name: [[\=]]
DSL name: [\=]
S19 [\d]
Canonical name: [[\d]]
DSL name: [\d]
S20 [\s]
Canonical name: [[\s]]
DSL name: [\s]
S21 [\w]
Canonical name: [[\w]]
DSL name: [\w]
S22 [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}]
Canonical name: [[^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}]]
DSL name: [^'\x{0A}\x{0B}\x{0C}\x{0D}\x{0085}\x{2028}\x{2029}]
S23 [e]
Canonical name: [[e]]
DSL name: [e]
S24 [g]
Canonical name: [[g]]
DSL name: [g]
S25 [i]
Canonical name: [[i]]
DSL name: [i]
S26 [n]
Canonical name: [[n]]
DSL name: [n]
S27 [o]
Canonical name: [[o]]
DSL name: [o]
S28 [r]
Canonical name: [[r]]
DSL name: [r]
S29 [s]
Canonical name: [[s]]
DSL name: [s]
S30 [t]
Canonical name: [[t]]
DSL name: [t]
S31 assignment
Canonical name: assignment
DSL name: assignment
S32 expression
Canonical name: expression
DSL name: expression
S33 number
Canonical name: number
DSL name: number
S34 <numeric assignment>
Canonical name: <numeric assignment>
DSL name: numeric assignment
S35 <numeric expression>
Canonical name: <numeric expression>
DSL name: numeric expression
S36 statement
Canonical name: statement
DSL name: statement
S37 statements
Canonical name: statements
DSL name: statements
S38 string
Canonical name: string
DSL name: string
S39 <string contents>
Canonical name: <string contents>
DSL name: string contents
S40 variable
Canonical name: variable
DSL name: variable
S41 whitespace
Canonical name: whitespace
DSL name: whitespace
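The Symbol IDs lines in the productions_show() output refer to the symbol numbers listed by symbols_show(). The Perl sketch below builds an ID-to-canonical-name map from symbols_show() text and uses it to translate a Symbol IDs line back into canonical names. The sample input covers R3, assignment ::= 'set' variable 'to' expression. The sub names symbol_names and translate_ids are hypothetical helpers, not Marpa::R3 methods, and the parsing assumes the exact layout shown above.

```perl
use strict;
use warnings;

# Build a map from symbol ID to canonical name out of symbols_show() text.
# Assumes each entry starts with an "S<id> ..." line and has a
# "Canonical name: ..." line before the next entry, as shown above.
sub symbol_names {
    my ($text) = @_;
    my ( %name, $id );
    for my $line ( split /\n/, $text ) {
        if ( $line =~ /\A\s*S(\d+)\s/ ) {
            $id = $1;
        }
        elsif ( defined $id and $line =~ /\A\s*Canonical name:\s*(.*?)\s*\z/ ) {
            $name{$id} = $1;
        }
    }
    return \%name;
}

# Replace every "<id>" in a Symbol IDs line with its canonical name,
# leaving IDs that are not in the map untouched.
sub translate_ids {
    my ( $ids, $name ) = @_;
    $ids =~ s{<(\d+)>}{ $name->{$1} // "<$1>" }ge;
    return $ids;
}

# A few entries copied from the symbols_show() output above.
my $symbols = <<'END';
S5 'set'
Canonical name: [Lex-0]
DSL name: 'set'
S6 'to'
Canonical name: [Lex-1]
DSL name: 'to'
S31 assignment
Canonical name: assignment
DSL name: assignment
S32 expression
Canonical name: expression
DSL name: expression
S40 variable
Canonical name: variable
DSL name: variable
END

my $names = symbol_names($symbols);
print translate_ids( '<31> ::= <5> <40> <6> <32>', $names ), "\n";
# prints "assignment ::= [Lex-0] variable [Lex-1] expression"
```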
progress() outputs
This section contains samples of the output of the progress() method: the progress reports in their "raw" format. The output is shown in Data::Dumper format, with Data::Dumper::Indent set to 0 and Data::Dumper::Terse set to 1.
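The compact one-line format used in this section can be reproduced with Data::Dumper's package variables. Here is a minimal sketch, assuming you already hold a progress report as an array reference. dump_progress is a hypothetical helper, and the sample report is the one shown for G1 location 1.

```perl
use strict;
use warnings;
use Data::Dumper;

# Render a progress report the way this section shows it:
# Indent = 0 puts everything on one line, Terse = 1 drops "$VAR1 = ".
sub dump_progress {
    my ($report) = @_;
    local $Data::Dumper::Indent = 0;
    local $Data::Dumper::Terse  = 1;
    return Dumper($report);
}

# Sample report, copied from the G1 location 1 output.
print dump_progress( [ [ 4, 1, 0 ] ] ), "\n";    # prints [[4,1,0]]
```

Using local keeps the settings confined to the helper, so other Data::Dumper output in the same program keeps its usual indentation.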
The Data::Dumper output from progress() at G1 location 0:
[[1,0,0],[2,0,0],[3,0,0],[4,0,0],[18,0,0],[20,0,0]]
The Data::Dumper output from progress() at G1 location 1:
[[4,1,0]]
The Data::Dumper output from progress() at G1 location 2:
[[4,2,0],[5,0,2],[6,0,2],[7,0,2],[8,0,2],[9,0,2],[10,0,2],[11,0,2]]
By default, progress() reports on the latest Earley set. Here is the progress() output for the latest Earley set, which in this example is at G1 location 5:
[[1,1,0],[2,0,5],[2,1,0],[3,0,5],[4,0,5],[4,3,0],[5,1,2],[7,1,4],[8,1,4],[11,1,2],[11,3,2],[18,1,0],[20,0,5]]
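Comparing these triples with the "Earley set 5" listing in the progress_show() output suggests that each triple is [production ID, dot position, origin Earley set]. Under that reading, the Perl sketch below rebuilds the P/R/F labels that progress_show() prints. The RHS lengths are copied by hand from the productions that appear in this report, and the origin-to-location table comes from the progress_show() headers. The helper label and both tables are hypothetical, not part of the Marpa::R3 API.

```perl
use strict;
use warnings;

# Number of RHS symbols for each production in the latest report,
# read by hand off the production listings above (hypothetical table).
my %rhs_length = (
    1 => 1, 2 => 1, 3 => 4, 4 => 3, 5 => 1, 7 => 1, 8 => 1,
    11 => 3, 18 => 1, 20 => 1,
);

# Earley set ID -> input location, from the progress_show() headers.
my %set_location = (
    0 => 'B1L1c1',  1 => 'B1L1c3',  2 => 'B1L1c5',
    3 => 'B1L1c13', 4 => 'B1L1c15', 5 => 'B1L1c17',
);

# Turn one [production, dot, origin] triple into a progress_show()-style
# label: P = dot at the start, F = dot at the end, R<n>:<dot> otherwise.
sub label {
    my ( $triple, $rhs_length ) = @_;
    my ( $prod, $dot, $origin ) = @{$triple};
    my $len = $rhs_length->{$prod};
    die "no RHS length for production $prod\n" unless defined $len;
    return "P$prod" if $dot == 0;
    return "F$prod" if $dot == $len;
    return "R$prod:$dot";
}

# The latest report, as shown above.
my @report = (
    [ 1, 1, 0 ], [ 2, 0, 5 ],  [ 2, 1, 0 ],  [ 3, 0, 5 ],  [ 4, 0, 5 ],
    [ 4, 3, 0 ], [ 5, 1, 2 ],  [ 7, 1, 4 ],  [ 8, 1, 4 ],  [ 11, 1, 2 ],
    [ 11, 3, 2 ], [ 18, 1, 0 ], [ 20, 0, 5 ],
);
for my $t (@report) {
    printf "%s %s\n", label( $t, \%rhs_length ), $set_location{ $t->[2] };
}
```

Each printed line, such as "R11:1 B1L1c5" or "F7 B1L1c15", should match the start of a line in the Earley set 5 listing, although the order differs. A production with an empty RHS would have its dot at both the start and the end. This sketch labels such an item P, and none occurs in this grammar.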
COPYRIGHT AND LICENSE
Marpa::R3 is Copyright (C) 2018, Jeffrey Kegler.
This module is free software; you can redistribute it and/or modify it
under the same terms as Perl 5.10.1. For more details, see the full text
of the licenses in the directory LICENSES.
This program is distributed in the hope that it will be
useful, but without any warranty; without even the implied
warranty of merchantability or fitness for a particular purpose.