NAME
Parse::Marpa::Doc::Diagnostics - Diagnostics
DESCRIPTION
This document describes techniques for debugging Marpa parses and grammars. It also lists those Marpa methods and Marpa options whose main use is in tracing and debugging.
Basic Debugging Techniques
If parsing failed before or at the end of input, first look at the place in the input where parsing was exhausted. That, along with inspection of the input and the grammar, is often enough to spot the problem. But, typically, you'll have already tried that before consulting this document. Next, you should make sure that Marpa's warnings option is turned on. It's on by default, so you probably have already done that, too.
You should also turn off Marpa's strip option, which is on by default. When the strip option is on, Marpa "strips" its objects of data that is not needed for subsequent processing. This saves time and memory, but data that is not needed for processing can be extremely valuable for debugging. When the strip option is left on, many of Marpa's diagnostics methods will return partial information or no information at all.
When a problem is not obvious, the first thing I do is turn on the trace_lex option. This tells you which tokens the lexer is looking for and which ones it thinks it found. If the problem is in lexing, trace_lex tells you the whole story. Even if the problem is in the grammar, which tokens the lexer is looking for is a clue to what the recognizer is doing. That is because Marpa uses predictive lexing and only looks for tokens that could result in a successful parse.
It sometimes helps to look carefully at the output of show_rules and show_symbols, to check whether anything there is clearly wrong or not what you expected.
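The option settings described above can be collected in one place. The following is a minimal sketch, assuming that the set method takes its named arguments as a hash reference, in the same way as the Recognizer constructor call shown under show_location. The name enable_basic_debugging is a hypothetical helper, not part of Marpa.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the basic debugging options to any Marpa
# object whose set method accepts Marpa options (a grammar, for example).
# Assumption: set takes its named arguments as a hash reference.
sub enable_basic_debugging {
    my ($marpa_object) = @_;
    $marpa_object->set(
        {   warnings  => 1,    # on by default; set explicitly to be sure
            strip     => 0,    # keep data that the diagnostics methods need
            trace_lex => 1,    # show tokens looked for and tokens found
        }
    );
    return $marpa_object;
}
```

Because a grammar is stripped when it is precomputed, a helper like this would need to be called before precomputation for the strip setting to matter.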
Advanced Techniques
Next, depending on where in the process you're having problems, you might want to turn on some of the more helpful traces. trace_actions will show you the actions as they are being finalized. trace_iterations traces the initialization of, and the iteration of, the parse tree. trace_values traces the values of the nodes as they are pushed on, and popped off, the evaluation stack. After an evaluation, show_tree will show the entire tree.
For a complete investigation of a parse, do the following:
- Turn off the strip Marpa option. By default, it is on.
- Make sure the warnings option is turned on. It is on by default.
- Run show_symbols on the precomputed grammar.
- Run show_rules on the precomputed grammar.
- Run show_QDFA on the precomputed grammar.
- Turn on trace_lex before input.
- Run show_earley_sets on the recognizer.
- Run show_bocage in verbose mode on the evaluator after it is created.
- Run show_tree in verbose mode on the evaluator after each call of the value method.
- Turn on trace_values so you can see the values as they are pushed onto the evaluation stack, popped off it, calculated and pushed back on.
Note that when the input text to the grammar is of any length, the outputs from show_earley_sets, show_bocage, show_tree, trace_lex, and trace_values can be lengthy. You'll want to work with short inputs if at all possible. The internals document has example outputs from the show_QDFA, show_earley_sets, show_bocage, and show_tree methods, and explains how to read them.
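The complete investigation listed above could be driven from a single routine. This is only a sketch under several assumptions that this document does not confirm: that options are passed as hash references, that the Evaluator constructor takes a recognizer named argument, and that the value method returns an undefined value once no parse remains. The name investigate_parse and its output file handle parameter are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical driver for the checklist above.  Assumptions: options are
# passed as hash references, Evaluator->new takes a "recognizer" named
# argument, and value() returns undef when no parse remains.
sub investigate_parse {
    my ( $grammar, $text_ref, $out ) = @_;
    $out //= \*STDERR;
    require Parse::Marpa;

    # Keep debugging data and warnings; must happen before precomputation.
    $grammar->set( { strip => 0, warnings => 1 } );
    $grammar->precompute();

    # Dump the precomputed grammar.
    print {$out} $grammar->show_symbols(), $grammar->show_rules(),
        $grammar->show_QDFA();

    # Trace lexing during input, then dump the Earley sets.
    my $recce = Parse::Marpa::Recognizer->new(
        { grammar => $grammar, trace_lex => 1 } );
    $recce->text($text_ref);
    print {$out} $recce->show_earley_sets();

    # Verbose bocage, traced values, and a verbose tree for each parse.
    my $evaler = Parse::Marpa::Evaluator->new(
        { recognizer => $recce, trace_values => 1 } );
    print {$out} $evaler->show_bocage(2);
    while ( defined( my $value = $evaler->value() ) ) {
        print {$out} $evaler->show_tree(1);
    }
    return;
}
```

The show_ output goes to the supplied file handle, while trace output still goes to Marpa's trace file handle. As noted above, a short input keeps both manageable.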
OPTIONS
These are Marpa options. Unless otherwise stated, the Marpa options are valid for all methods which accept Marpa options as named arguments (Parse::Marpa::mdl, Parse::Marpa::Grammar::new, Parse::Marpa::Grammar::set, Parse::Marpa::Recognizer::new, Parse::Marpa::Evaluator::new, and Parse::Marpa::Evaluator::set). All options are useful at any point in the parse, unless otherwise stated. Trace output goes to the trace file handle.
- academic
-
The academic option turns off all grammar rewriting. The resulting grammar is useless for recognition and parsing. The purpose of the academic option is to allow the testing of Marpa's precomputations against examples from textbooks. This is handy for testing the internals. An exception is thrown if the user attempts to create a recognizer from a grammar marked academic. The academic option cannot be set in the recognizer or after the grammar is precomputed.
- strip
-
The value is a Boolean. If true, Marpa "strips" its objects when they contain data that is not needed for further processing. This saves space and time. This is the default behavior.
If strip is set to false, all data in Marpa's objects, even data no longer needed for processing, is left in place for the entire life of the object. This leftover data can be very important if you're debugging.
A grammar is stripped when it is precomputed. A recognizer is stripped when the end of input is recognized. Turning strip off after the end of input has been recognized will have no effect.
- trace_actions
-
Traces actions as they are compiled. Little or no knowledge of Marpa internals required. This option is useless once the recognizer has been created. Setting it after that point will result in a warning.
- trace_iterations
-
Traces creation of, and iteration of, the parse tree. Knowledge of Marpa internals very useful. May usefully be set at any point in the parse.
- trace_lex
-
A shorthand for setting both trace_lex_matches and trace_lex_tries. Very useful, and can be interpreted with limited knowledge of Marpa internals. Because Marpa uses predictive lexing, this can give you an idea not only of how lexing is working, but also of what the recognizer is looking for. May be set at any point in the parse, but will be useless if set after input is complete.
- trace_lex_matches
-
Traces every successful match in lexing. Can be interpreted with little knowledge of Marpa internals. May be set at any point in the parse, but will be useless if set after input is complete.
- trace_lex_tries
-
Traces every attempted match in lexing. Can be interpreted with little knowledge of Marpa internals. Usually not useful without trace_lex_matches. The trace_lex option turns on both. May be set at any point in the parse, but will be useless if set after input is complete.
- trace_priorities
-
Traces the priority setting of each QDFA state. Requires knowledge of Marpa internals. The priorities are set during precomputation. A trace message warns the user if trace_priorities is set after that point.
- trace_rules
-
Traces rules as they are added to the grammar. Useful, but you may prefer the show_rules() method. Doesn't require knowledge of Marpa internals.
A trace message warns the user if this option is set when rules have already been added. If a user adds rules using the source named argument, and uses the trace_rules named argument in the same call, trace_rules will take effect after the processing of the source option, which is probably not what was intended. To be effective, trace_rules must be set in a method call prior to the one with the source option.
- trace_values
-
Takes as its value an integer zero or greater, which sets the debugging level. A debugging level of zero means no tracing of values. A level of one or more turns on tracing of values as they are pushed onto the evaluation stack, popped off it, and calculated. If the debugging level is 3 or more, the entire evaluation stack is dumped at every step in the evaluation.
Very helpful. Knowledge of Marpa internals is helpful, but not required. May usefully be set at any point.
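The ordering requirement described under trace_rules can be met by splitting the setup into two calls. The following is a sketch only. It assumes that Parse::Marpa::Grammar::new and set take hash references and that the source option takes a reference to the grammar text, neither of which is confirmed in this document. The name new_grammar_with_rule_trace is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper.  Assumptions: Grammar->new and set take hash
# references, and the source option takes a reference to the grammar text.
sub new_grammar_with_rule_trace {
    my ($source_ref) = @_;
    require Parse::Marpa;

    # First call: turn tracing on while the grammar has no rules yet.
    my $grammar = Parse::Marpa::Grammar->new( { trace_rules => 1 } );

    # Second call: add the rules, so that each is traced as it is added.
    $grammar->set( { source => $source_ref } );
    return $grammar;
}
```

Passing both options in a single call would, per the description above, leave the rules from the source untraced.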
METHODS
Static Method
show_location
my $recce = Parse::Marpa::Recognizer->new( { grammar => $grammar } );
my $fail_location = $recce->text( \$text_to_parse );
if ( $fail_location >= 0 ) {
    print {*STDERR} Parse::Marpa::show_location(
        'Parsing failed',
        \$text_to_parse,
        $fail_location,
        )
        or Carp::croak "print to STDERR failed: $OS_ERROR";
    exit 1;
}
A utility routine helpful for creating messages about problems parsing text. Takes three arguments, all required. The first argument must be a string containing a message. The second argument must be a reference to a string containing the text that was being parsed. The third argument must be an integer, and will be interpreted as a character offset within that string.
show_location returns a multi-line string. The first, header, line contains the message. The second line is the line from the text being parsed which contains the character offset. The third line contains an ASCII "caret" symbol pointing to the position of the offset in the second line.
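To make the three-line layout concrete, here is a small stand-alone sketch that produces output of the shape described above. It is not Marpa's implementation, and the exact text Marpa prints may differ. The name sketch_show_location is hypothetical.

```perl
use strict;
use warnings;

# Illustration only: builds a message line, the line of text containing
# the offset, and a caret line.  Not Marpa's own code.
sub sketch_show_location {
    my ( $message, $text_ref, $offset ) = @_;
    my $text = ${$text_ref};

    # The line starts one character past the previous newline, if any.
    my $line_start =
        $offset > 0 ? rindex( $text, "\n", $offset - 1 ) + 1 : 0;

    # The line ends at the next newline, or at the end of the text.
    my $line_end = index $text, "\n", $offset;
    $line_end = length $text if $line_end < 0;

    my $line   = substr $text, $line_start, $line_end - $line_start;
    my $column = $offset - $line_start;
    return join "\n", $message, $line, ( q{ } x $column ) . q{^}, q{};
}

my $text = "a = 1\nb = = 2\nc = 3\n";
print sketch_show_location( 'Parsing failed', \$text, 10 );
```

With this input, offset 10 falls on the second equals sign of the second line, so the caret is printed under that character.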
Grammar Methods
inaccessible_symbols
$grammar->precompute();
for my $symbol ( @{ $grammar->inaccessible_symbols() } ) {
    say 'Inaccessible symbol: ', $symbol;
}
Returns the plumbing names of the inaccessible symbols. Not useful before the grammar is precomputed. Used for test scripts. For debugging and tracing, the warnings option is usually the most convenient way to obtain the same information.
show_NFA
$grammar->precompute();
print $grammar->show_NFA()
    or Carp::croak "print failed: $OS_ERROR";
Returns a multi-line string listing the states of the NFA with the LR(0) items and transitions for each. Not useful before the grammar is precomputed. Not really helpful for debugging grammars and requires very deep knowledge of Marpa internals.
show_QDFA
$grammar->precompute();
print $grammar->show_QDFA()
    or Carp::croak "print failed: $OS_ERROR";
Returns a multi-line string listing the states of the QDFA with the LR(0) items, NFA states, and transitions for each. Not useful before the grammar is precomputed. Very useful in debugging, but requires knowledge of Marpa internals.
show_accessible_symbols
$grammar->precompute();
say 'Accessible symbols: ',
    $grammar->show_accessible_symbols();
Returns a one-line string with the plumbing names of the accessible symbols of the grammar, space-separated. Useful in test scripts. Not useful before the grammar is precomputed. Not very useful for debugging.
show_nullable_symbols
$grammar->precompute();
say 'Nullable symbols: ',
    $grammar->show_nullable_symbols();
Returns a one-line string with the plumbing names of the nullable symbols of the grammar, space-separated. Useful in test scripts. Not useful before the grammar is precomputed. Not very useful for debugging.
show_nulling_symbols
$grammar->precompute();
say 'Nulling symbols: ',
    $grammar->show_nulling_symbols();
Returns a one-line string with the plumbing names of the nulling symbols of the grammar, space-separated. Useful in test scripts. Not useful before the grammar is precomputed. Not very useful for debugging.
show_problems
$grammar->precompute();
print $grammar->show_problems()
    or Carp::croak "print failed: $OS_ERROR";
Returns a string describing the problems a grammar had in the precomputation phase. For many precomputation problems, Marpa does not immediately throw an exception. This is because there are often several problems with a grammar. Throwing an exception on the first problem would force the user to fix them one at a time -- very tedious. If there were no problems, returns a string saying so.
This method is not useful before precomputation. An exception is thrown if the user attempts to stringify, or to create a parse from, a grammar with problems. The string returned by show_problems will be part of the exception's error message.
show_productive_symbols
$grammar->precompute();
say 'Productive symbols: ',
    $grammar->show_productive_symbols();
Returns a one-line string with the plumbing names of the productive symbols of the grammar, space-separated. Useful in test scripts. Not useful before the grammar is precomputed. Not very useful for debugging.
show_rules
$grammar->precompute();
print $grammar->show_rules()
    or Carp::croak "print failed: $OS_ERROR";
Returns a string listing the rules, each commented as to whether it was nullable, nulling, unproductive, inaccessible, empty or not useful. If a rule had a non-zero priority, that is also shown. Often useful and much of the information requires no knowledge of the Marpa internals to interpret.
show_rules shows a rule as not useful ("!useful") if Marpa decides not to use it for any reason. Rules marked "!useful" include not just the ones called useless in standard parsing terminology (inaccessible and unproductive rules) but also any rule which is replaced by one of Marpa's grammar rewrites.
show_symbols
$grammar->precompute();
print $grammar->show_symbols()
    or Carp::croak "print failed: $OS_ERROR";
Returns a string listing the symbols, along with whether they were nulling, nullable, unproductive or inaccessible. Also shown is a list of rules with that symbol on the left hand side, and a list of rules which have that symbol anywhere on the right hand side. Often useful and much of the information requires no knowledge of the Marpa internals to interpret.
unproductive_symbols
$grammar->precompute();
for my $symbol ( @{ $grammar->unproductive_symbols() } ) {
    say 'Unproductive symbol: ', $symbol;
}
Given a precomputed grammar, returns the plumbing names of the unproductive symbols. Not useful before the grammar is precomputed. Used in test scripts. For debugging and tracing, the warnings option is usually a more convenient way to obtain the same information.
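In a test script, the results of inaccessible_symbols and unproductive_symbols might be gathered into one report. The following sketch uses only those two methods as documented here, each returning a reference to an array of names. The helper name useless_symbol_report is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper for a test script.  It relies only on the two
# methods documented above, both of which return an array reference.
sub useless_symbol_report {
    my ($grammar) = @_;    # must already be precomputed
    my @report;
    for my $name ( sort @{ $grammar->inaccessible_symbols() } ) {
        push @report, "inaccessible: $name";
    }
    for my $name ( sort @{ $grammar->unproductive_symbols() } ) {
        push @report, "unproductive: $name";
    }
    return \@report;
}
```

A test could then compare the returned list against the symbols it expects to be flagged, with an empty list meaning nothing was flagged.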
Recognizer Method
show_earley_sets
my $recce = Parse::Marpa::Recognizer->new( { grammar => $grammar } );
my $fail_location = $recce->text( \$text_to_parse );
print $recce->show_earley_sets
    or Carp::croak "print failed: $OS_ERROR";
Returns a multi-line string listing every Earley item in every Earley set.
Evaluator Method
show_bocage
print $evaler->show_bocage($show_bocage_verbosity)
    or Carp::croak "print failed: $OS_ERROR";
Returns a multi-line string describing the bocage for an evaluator. The first line gives the name of the Perl package in which Marpa runs the actions for that evaluator, and a count of the parses derived so far from the bocage. The bocage follows.
The bocage is given in pre-order, in the form of a grammar. Parse bocage grammars are similar to parse forest grammars. In the internals document, parse bocage grammars are described at length, using an example output from show_bocage.
The optional verbosity argument must be an integer greater than or equal to zero. For each parse bocage and-production, if verbosity is set greater than zero, the LR(0) item corresponding to the and-production's and-node is shown, along with the and-node's argument count (or rule length), and an indication of whether or not there is a Perl closure for the and-node.
In addition to and-productions, parse bocage grammars contain or-productions. The information in or-productions is redundant -- all of it is evident from the and-productions. Or-productions are shown only if verbosity is set at 2 or more.
show_tree
print $evaler->show_tree($show_tree_flag)
    or Carp::croak "print failed: $OS_ERROR";
When called after a successful call to the value method, show_tree returns a multi-line string describing the parse tree produced in the value call. The tree is listed in pre-order. For each parse tree node, its depth and and-node are given. Also, for the current choice of and-node at that parse tree node, show_tree gives the cause and predecessor or-nodes, if any; the rule; and the argument count (that is, the rule length).
The optional flag argument, if a true value, causes verbose output. In verbose output, the value of the node is given, when the tree node has a value. Also in verbose output, if the tree node has a Perl closure, that is indicated.
If the value method was never called, or if the last call to value returned failure, the result returned by show_tree is unpredictable.
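Because show_tree is only meaningful after a successful value call, it can help to tie the two together. This is a sketch under one assumption that this document does not state: that the value method signals failure by returning an undefined value. The name show_tree_if_valued is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical guard: call value, and dump the tree only on success.
# Assumption: an undefined return from value means the call failed.
sub show_tree_if_valued {
    my ( $evaler, $verbose ) = @_;
    my $value = $evaler->value();
    return if not defined $value;
    return $evaler->show_tree($verbose);
}
```

A caller would print the returned string when it is defined, and otherwise skip the tree dump rather than rely on unpredictable output.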
SUPPORT
See the support section in the main module.
AUTHOR
Jeffrey Kegler
LICENSE AND COPYRIGHT
Copyright 2007 - 2008 Jeffrey Kegler
This program is free software; you can redistribute it and/or modify it under the same terms as Perl 5.10.0.