NAME

Parse::Marpa::DIAGNOSTICS - Marpa's Diagnostics

OVERVIEW

This document covers Marpa methods and method options primarily useful for debugging grammars and parses. The description of each trace and diagnostic method usually indicates how much knowledge of Marpa's internal workings is needed to use its output. There's a guide to Marpa internals in the internals document.

HINTS

In debugging a grammar, knowing where the parse was exhausted, together with an inspection of the input and the grammar, is often enough to spot the problem.

If you have turned off Marpa's warnings option (it is on by default), you should probably turn it back on.

If that doesn't help, the next thing I turn on is usually trace_lex. This tells you which tokens the lexer is looking for and which ones it thinks it found. If the problem is in lexing, trace_lex tells you the whole story. Even if the problem is in the grammar, because Marpa uses predictive lexing (in other words, because Marpa uses the grammar to predict what terminals to look for next), which tokens the lexer is looking for is a clue to what the recognizer is doing.

If you run the show() method on the parse object after the parse is done (that is, after initial() or next()), it shows the parse derivation in a more or less conventional format. Next to each rule is the SDFA state involved, and this is a first glimpse into the internals.

Before going deep into the internals, you should look at the output of show_rules() and show_symbols() to see if anything is clearly not right or not what you expected. Depending on where in the process you're having problems, you might want to turn on some of the more helpful traces. trace_actions will show you the actions as they are being finalized. In an ambiguous parse, trace_evaluation_choices shows the choices Marpa is making. trace_iteration_changes and trace_values trace the initialization of, and changes in, node values.

For the real "into the Earley sets with gun and camera" stuff, run show_SDFA() on the grammar and show_status() on the parse object. The internals document has example outputs from these methods and outlines how to read them.

DIAGNOSTIC OPTIONS

These are options to the Parse::Marpa::new(), Parse::Marpa::set(), and Parse::Marpa::Parse::new() methods. All trace output goes to the trace file handle.

academic

The academic option turns off all grammar rewriting. This makes the resulting grammar useless for an actual parse. The purpose is to see if Marpa can accurately duplicate examples from textbooks. This is handy for testing Marpa's deepest internals.

trace_actions

Traces the actions as they are finalized. Little or no knowledge of Marpa internals required.

trace_completions

Traces each Earley set as it is completed. I find it better to wait until parse evaluation time, when there's more information, and then use Parse::Marpa::Parse::show_status(). Requires knowledge of Marpa internals.

trace_evaluation_choices

This option traces the choices Marpa makes when it has a choice among more than one rule, link or token. Choices only occur if the grammar is ambiguous. Knowledge of Marpa internals probably needed.

trace_iteration_changes

Traces the setting of, and changes in, node values. Knowledge of Marpa internals very useful.

trace_iteration_searches

Traces Marpa's exploration of the Earley sets as it is evaluating nodes. Requires knowledge of Marpa internals. Probably not useful except in combination with trace_iteration_changes. trace_iterations turns on both.

trace_iterations

A shorthand for setting both trace_iteration_changes and trace_iteration_searches.

trace_lex

A shorthand for setting both trace_lex_matches and trace_lex_tries. Very useful, and can be interpreted with limited knowledge of Marpa internals. Because Marpa uses predictive lexing, this can give you an idea not only of how lexing is working, but also of what the parse engine is looking for. Often the first thing I turn on when I'm debugging a grammar.

trace_lex_matches

Traces every successful match in lexing. Can be interpreted with little knowledge of Marpa internals.

trace_lex_tries

Traces every attempted match in lexing. Can be interpreted with little knowledge of Marpa internals. Usually not useful without trace_lex_matches. trace_lex turns on both.

trace_priorities

Traces the priority setting of each SDFA state. Requires knowledge of Marpa internals.

trace_rules

Traces rules as they are added to the grammar. Useful, but you may prefer the show_rules() method. Doesn't require knowledge of Marpa internals.

Remember, if you are adding rules via the source method option, the other method options take effect after the processing of the source option. As a practical matter, that means that unless you set trace_rules in a method call prior to the one with the source option, you will miss the addition of all but a few internally added rules.

trace_values

As each node value is set, prints a trace of the rule and the value. Very helpful and does not require knowledge of Marpa internals.

DIAGNOSTIC METHODS FOR GRAMMAR OBJECTS

Parse::Marpa::inaccessible_symbols(grammar)

Given a precomputed grammar, returns the raw interface names of the inaccessible symbols. The same information is more easily obtained by turning on the warnings option.

Parse::Marpa::show_NFA(grammar)

Given a grammar object, returns a multi-line string listing the states of the NFA with the LR(0) items and transitions for each. Not really helpful for debugging grammars and requires very deep knowledge of Marpa internals.

Parse::Marpa::show_SDFA(grammar)

Given a grammar object, returns a multi-line string listing the states of the SDFA with the LR(0) items, NFA states, and transitions for each. Very useful, but requires knowledge of Marpa internals.

Parse::Marpa::show_accessible_symbols(grammar)

Given a grammar object, returns a one-line string with the raw interface names of the accessible symbols of the grammar, space-separated. Handy for quick comparison tests, but otherwise not very useful.

Parse::Marpa::show_location(message, text, offset)

message must be a string, text a reference to a string, and offset, a character offset within that string. show_location() returns a multi-line string with a header line containing message, the line from text containing offset, and a "pointer" line. The pointer line uses the ASCII "caret" symbol to point to the exact offset.
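The three-part layout described above can be illustrated with a minimal sketch. The function sketch_show_location() below is a hypothetical stand-in written only for this illustration. It is not Marpa's code, and the exact spacing of Marpa's real output may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in that imitates the documented behavior of
# show_location(): header line, the line containing the offset, and a
# pointer line.  This is not Marpa's own implementation.
sub sketch_show_location {
    my ( $message, $text_ref, $offset ) = @_;

    # The line starts one character past the previous newline, if any.
    my $line_start =
        $offset > 0 ? rindex( ${$text_ref}, "\n", $offset - 1 ) + 1 : 0;

    # The line ends at the next newline, or at the end of the text.
    my $line_end = index ${$text_ref}, "\n", $offset;
    $line_end = length ${$text_ref} if $line_end < 0;

    my $line = substr ${$text_ref}, $line_start, $line_end - $line_start;

    # Pad with spaces so the caret sits under the character at $offset.
    my $pointer = ( q{ } x ( $offset - $line_start ) ) . q{^};

    return join "\n", $message, $line, $pointer, q{};
}

my $input = "a = 1\nb = = 2\nc = 3\n";
print sketch_show_location( 'Parse failed here:', \$input, 10 );
```

Offset 10 is the second "=" on the second line, so the sketch prints the header, the line "b = = 2", and a pointer line with the caret under that character.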

Parse::Marpa::show_nullable_symbols(grammar)

Given a grammar object, returns a one-line string with the raw interface names of the nullable symbols of the grammar, space-separated. The format is handy for quick comparison tests, but otherwise not very useful.

Parse::Marpa::show_nulling_symbols(grammar)

Given a grammar object, returns a one-line string with the raw interface names of the nulling symbols of the grammar, space-separated. The format is handy for quick comparison tests, but otherwise not very useful.

Parse::Marpa::show_problems(grammar)

Returns a string describing the problems a grammar had in the precomputation phase. Marpa does not immediately throw an exception for many of the precomputation problems because the user usually will want to fix several at a time. If there were no problems, returns a string saying so.

The returned string is the same one that Marpa includes in exceptions thrown when the user attempts to compile, or to create a parse from, a grammar with problems.

Parse::Marpa::show_productive_symbols(grammar)

Given a grammar object, returns a one-line string with the raw interface names of the productive symbols of the grammar, space-separated. The format is handy for quick comparison tests, but otherwise not very useful.

Parse::Marpa::show_rules(grammar)

Returns a string listing the rules, each commented as to whether it was nullable, nulling, unproductive, inaccessible, empty or not useful. If a rule had a non-zero priority, that is also shown. Often useful and much of the information requires no knowledge of the Marpa internals to interpret.

Parse::Marpa::show_symbols(grammar)

Returns a string listing the symbols, along with whether they were nulling, nullable, unproductive or inaccessible. Also shown is a list of rules with that symbol on the left hand side, and a list of rules which have that symbol anywhere on the right hand side. Often useful and much of the information requires no knowledge of the Marpa internals to interpret.

Parse::Marpa::unproductive_symbols(grammar)

Given a precomputed grammar, returns the raw interface names of the unproductive symbols. The same information is more easily obtained by turning on the warnings option.

DIAGNOSTIC METHODS FOR PARSE OBJECTS

Parse::Marpa::Parse::show(parse)

Takes a parse object as its argument and "shows the parse," that is, shows the derivation of the last parse made from a parse object. Very useful. Basic use requires no knowledge of Marpa internals. It also reports the Earley item and SDFA state at each line of the derivation.

Parse::Marpa::Parse::show_status(parse)

This is the central tool for debugging a parse using Marpa internals. Takes a parse object as its argument and returns a multi-line string listing every Earley item in every Earley set. For each Earley item, any current successor, predecessor, effect, cause, pointer or value is shown. Also shown are lists of all the links and rules in each Earley item, indicating which link or rule is the current choice.

For detailed investigation of a parse, this, the output of trace_lex and listings of the symbols, the rules, and the SDFA states (see show_SDFA()), will usually be everything you need.

Parse::Marpa::Parse::find_complete_rule(parse, start_earleme, symbol, end_earleme)

The Parse::Marpa::Parse::find_complete_rule() method takes a parse object as its one required argument. Arguments which specify a start_earleme, symbol and end_earleme are optional. If the start earleme is not specified, it defaults to earleme 0. If the end earleme is not specified, it defaults to the default parse end earleme, that is, the default location that Parse::Marpa::Parse::initial() would use for the end of parsing. The symbol argument, if specified, must be the raw interface name of a symbol.

The end earleme argument must be at or before the default parse end earleme. If you specify an end earleme after the default parse end earleme, it is ignored and the default parse end earleme is used as the end earleme.

find_complete_rule() looks for parses of complete rules, that is, rules whose right hand side has been completely matched. Only parses which start at the start earleme are considered.

find_complete_rule() looks first for any parses which end at the end earleme. If it finds none, it looks for shorter and shorter parses until it reaches the start earleme and is looking at null parses.
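The search order just described, together with the clamping of the end earleme, can be sketched as follows. This is only an illustration under stated assumptions: %completed_at and sketch_find_complete() are hypothetical names, and the hash is a toy model of completed rules, far simpler than Marpa's real Earley sets. The optional symbol filter is left out.

```perl
use strict;
use warnings;

# Hypothetical toy model: for parses starting at the start earleme, map
# each end earleme to the left hand sides of the rules completed there.
my %completed_at = (
    0 => [],
    3 => ['term'],
    5 => [ 'expression', 'term' ],
);

# Hypothetical sketch of the search order; not Marpa's implementation.
sub sketch_find_complete {
    my ( $completed, $start, $end, $default_end ) = @_;

    # A missing end earleme, or one past the default parse end, falls
    # back to the default parse end earleme.
    $end = $default_end if !defined $end || $end > $default_end;

    # Try the longest span first, then shorter and shorter spans, down
    # to the start earleme itself (the null parse).
    for my $earleme ( reverse $start .. $end ) {
        my $symbols = $completed->{$earleme} || [];
        return ( $earleme, $symbols ) if @{$symbols};
    }

    return;    # nothing found: empty list
}

my ( $found_at, $symbols ) = sketch_find_complete( \%completed_at, 0, 9, 6 );
print "complete rule ends at earleme $found_at: @{$symbols}\n";
```

In this toy data, an end earleme of 9 is past the default of 6, so the search starts at 6, finds nothing there, and succeeds at earleme 5.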

While the parses find_complete_rule() finds are always for complete rules, they can be subparses in the sense that they are not parses from the grammar's start symbol. Complete parses starting from any symbol are considered, unless a start symbol was specified as an argument. In that case only parses starting from that symbol are considered.

On failure to find a rule matching the criteria, a zero length array is returned. On success, the return value is an array of two elements. The first element of the array is the earleme at which the complete parse ends. The second element is a reference to an array of symbol names which are start symbols of parses in the span from start earleme to end earleme. Symbol names will be raw interface names.

Multiple start symbols may be returned, because several different rules may have been completed in the span from start earleme to end earleme, and some of these rules may have different left hand sides. If a start symbol argument was specified, it will be one of the list of symbols in the return value.
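A caller has to tell the empty-array failure case apart from the two-element success case. The sketch below shows one way to do that. Because it must run without a real parse object, it uses stub_find_complete_rule(), a hypothetical stub that only imitates the two return shapes documented above. The symbol names and the earleme in it are made up.

```perl
use strict;
use warnings;

# Hypothetical stub imitating the documented return shapes of
# find_complete_rule(); the values returned are invented.
sub stub_find_complete_rule {
    my ( $parse, $start_earleme, $symbol, $end_earleme ) = @_;
    return if !defined $parse;    # failure: zero length array
    return ( 7, [ 'expression', 'term' ] );    # success: earleme, symbols
}

# Turn either return shape into a one-line description.
sub describe_result {
    my @result = @_;
    return 'no complete rule found' if !@result;
    my ( $end_earleme, $symbols ) = @result;
    return "ends at earleme $end_earleme: " . join( q{ }, @{$symbols} );
}

print describe_result( stub_find_complete_rule( {}, 0 ) ), "\n";
print describe_result( stub_find_complete_rule(undef) ),   "\n";
```

Testing the returned array for emptiness before unpacking it avoids dereferencing an undefined value in the failure case.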

In the case where no start symbol is specified, find_complete_rule() is probably useless. It returns only information from the first Earley item which matches other criteria. Other Earley items may contain complete rules for the same span, but their left hand sides may not be included in the return value's list of start symbols.

find_complete_rule() was an experiment in methods for improved diagnostics, online mode, and advanced wizardry with grammars. It is probably going to be replaced. The replacement method or methods should, given an end earleme or a range of end earlemes, be able to return all completed and expected symbols. Information about their start and end earleme should be available with the completed symbols. For the expected symbols, the earleme at which they were expected should be given.