NAME

Parse::Marpa::DIAGNOSTICS - Marpa's Diagnostics

THIS DOCUMENT IS UNDER CONSTRUCTION

OVERVIEW

This document covers Marpa methods and method options that are primarily useful for debugging grammars and parses. The description of each trace option and diagnostic method usually indicates how much knowledge of Marpa's internal workings is needed to use its output. The internals document contains a guide to Marpa internals.

HINTS

When debugging a grammar, knowing where the parse was exhausted, together with an inspection of the input and the grammar, is often enough to spot the problem.

If you have turned off Marpa's warnings option (it's on by default), you should probably turn it back on and give the warnings a glance.

If the warnings don't reveal the problem, the next thing I turn on is usually trace_lex. This tells you which tokens the lexer is looking for and which ones it thinks it found. If the problem is in lexing, this tells you the whole story. Even if the problem is in the grammar, what the lexer is looking for is a good clue to what the parse is doing, because Marpa uses predictive lexing: it uses the grammar to predict which terminals should come next.

If you run the show() method on the parse object after the parse is done (that is, after initial() or next()), it shows the parse derivation in a more or less conventional format. Next to each rule is the SDFA state involved, which gives you a first dip into the internals.

Before going into the internals, you might want to consider another deskcheck of your grammar. But if you really feel you must try to pull the problem out of the Earley sets, run show_SDFA() on the grammar and show_status() on the parse object. You may also find a listing of the rules (show_rules()) and the symbols (show_symbols()) handy for reference. The internals document has example outputs from these methods and outlines how they are read.

I hope in the future to offer routines that pull grammar debugging information from the Earley sets and present it in a way that suggests the status of the parse without exposing the user to Marpa's internals. find_complete_rule() is a start on the infrastructure for that.

DIAGNOSTIC OPTIONS

These are options to the Parse::Marpa::new(), Parse::Marpa::set(), and Parse::Marpa::Parse::new() methods. All trace output goes to the trace file handle.

academic

The academic option is only useful for testing the early stages of Marpa, up to the creation of SDFAs in the precomputation phase. In order to accurately duplicate examples in textbooks, CHAF rewriting must be turned off in academic mode. This means no actual parsing can be done with the resulting grammar.

trace_actions

Traces the actions as they are finalized. Little or no knowledge of Marpa internals required.

trace_completions

Traces each Earley set as it is completed. I find it better to wait until parse evaluation time, when there's more information, and then use Parse::Marpa::Parse::show_status(). Requires knowledge of Marpa internals.

trace_evaluation_choices

When Marpa has a choice among more than one rule, link or token, this option traces the choices Marpa makes. Choices only occur if the grammar is ambiguous. Knowledge of Marpa internals probably needed.

trace_iteration_changes

Traces the setting of, and changes in, node values. Knowledge of Marpa internals very useful.

trace_iteration_searches

Traces Marpa's exploration of the Earley sets as it is evaluating nodes. Requires knowledge of Marpa internals. Probably not useful except in combination with trace_iteration_changes. trace_iterations turns on both.

trace_iterations

A shorthand for setting both trace_iteration_changes and trace_iteration_searches.

trace_lex

A shorthand for setting both trace_lex_matches and trace_lex_tries. Very useful, and can be interpreted with limited knowledge of Marpa internals. Because Marpa uses predictive lexing, this can give you an idea not only of how lexing is working, but also of what the parse engine is looking for. Often the first thing I turn on when I'm debugging a grammar.

trace_lex_matches

Traces every successful match in lexing. Can be interpreted with little knowledge of Marpa internals.

trace_lex_tries

Traces every attempted match in lexing. Can be interpreted with little knowledge of Marpa internals. Usually not useful without trace_lex_matches. trace_lex turns on both.

trace_priorities

Traces the priority setting of each SDFA state. Requires knowledge of Marpa internals.

trace_rules

Traces rules as they are added to the grammar. Useful, but you may prefer the show_rules() method. Doesn't require knowledge of Marpa internals.

Remember, if you are adding rules via the source method option, that other method options take effect after the processing of the source option. As a practical matter, that means that if you don't set trace_rules in a method call prior to the one with the source option, you will miss the addition of all but a few internally added rules.

trace_values

As each node value is set, prints a trace of the rule and the value. Very helpful and does not require knowledge of Marpa internals.

DIAGNOSTIC METHODS FOR GRAMMAR OBJECTS

Parse::Marpa::inaccessible_symbols(grammar)

Given a precomputed grammar, returns the raw interface names of the inaccessible symbols. The same information is more easily obtained by turning on the warnings option.

Parse::Marpa::show_NFA(grammar)

Given a grammar object, returns a multi-line string listing the states of the NFA with the LR(0) items and transitions for each. Not really helpful for debugging grammars and requires very deep knowledge of Marpa internals.

Parse::Marpa::show_SDFA(grammar)

Given a grammar object, returns a multi-line string listing the states of the SDFA with the LR(0) items, NFA states, and transitions for each. Very useful, but requires knowledge of Marpa internals.

Parse::Marpa::show_accessible_symbols(grammar)

Given a grammar object, returns a one-line string with the raw interface names of the accessible symbols of the grammar, space-separated. Handy for quick comparison tests, but otherwise not very useful.

Parse::Marpa::show_location(message, text, offset)

message must be a string, text a reference to a string, and offset a character offset within that string. show_location() returns a multi-line string with a header line containing message, the line from text containing offset, and a "pointer" line. The pointer line uses the ASCII "caret" symbol to point to the exact offset.
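The output format described above can be illustrated with a minimal stand-alone sketch. This is not Marpa's code: sketch_show_location() is a hypothetical stand-in written only from the description, and it assumes that offset is zero-based and that lines are separated by newline characters.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Parse::Marpa::show_location(); not Marpa's code.
# $message is a string, $text a reference to a string, and $offset a
# character offset into that string (zero-based here, which is an assumption).
sub sketch_show_location {
    my ( $message, $text, $offset ) = @_;

    # Start of the line: one past the last newline before $offset.
    # rindex() returns -1 when there is no such newline, giving a start of 0.
    my $line_start =
        $offset > 0 ? rindex( ${$text}, "\n", $offset - 1 ) + 1 : 0;

    # End of the line: the first newline at or after $offset, or end of text.
    my $line_end = index ${$text}, "\n", $offset;
    $line_end = length ${$text} if $line_end < 0;

    my $line = substr ${$text}, $line_start, $line_end - $line_start;

    # Pad with spaces so that the caret sits under the character at $offset.
    my $pointer = ( q{ } x ( $offset - $line_start ) ) . q{^};

    return join "\n", $message, $line, $pointer, q{};
}

my $input = "a = 1\nb = = 2\nc = 3\n";
print sketch_show_location( 'Parse exhausted here', \$input, 10 );
```

With this input the sketch prints the message, then the line "b = = 2", then a caret under the second equals sign. The padding is plain spaces, so a line containing tabs would not align; how the real method treats tabs is not covered here.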

Parse::Marpa::show_nullable_symbols(grammar)

Given a grammar object, returns a one-line string with the raw interface names of the nullable symbols of the grammar, space-separated. The format is handy for quick comparison tests, but otherwise not very useful.

Parse::Marpa::show_nulling_symbols(grammar)

Given a grammar object, returns a one-line string with the raw interface names of the nulling symbols of the grammar, space-separated. The format is handy for quick comparison tests, but otherwise not very useful.

Parse::Marpa::show_problems(grammar)

Returns a string describing the problems a grammar had in the precomputation phase. Marpa does not immediately throw an exception for many of these because the user usually will want to fix several at a time. If there were no problems, returns a string saying so.

The returned string is the same one that Marpa uses when it throws an exception because the user attempted to compile a grammar with problems or to create a parse from it.

Parse::Marpa::show_productive_symbols(grammar)

Given a grammar object, returns a one-line string with the raw interface names of the productive symbols of the grammar, space-separated. The format is handy for quick comparison tests, but otherwise not very useful.

Parse::Marpa::show_rules(grammar)

Returns a string listing the rules, each commented as to whether it was nullable, nulling, unproductive, inaccessible, empty or not useful. If a rule had a non-zero priority, that is also indicated. Often useful and much of the information requires no knowledge of the Marpa internals to interpret.

Parse::Marpa::show_symbols(grammar)

Returns a string listing the symbols, along with whether they were nulling, nullable, unproductive or inaccessible. Also shown is a list of rules with that symbol on the left hand side, and another list of rules which have that symbol anywhere on the right hand side. Often useful and much of the information requires no knowledge of the Marpa internals to interpret.

Parse::Marpa::unproductive_symbols(grammar)

Given a precomputed grammar, returns the raw interface names of the unproductive symbols. The same information is more easily obtained by turning on the warnings option.

DIAGNOSTIC METHODS FOR PARSE OBJECTS

Parse::Marpa::Parse::show(parse)

Takes a parse object as its argument and "shows it," that is, shows the last derivation of the last parse made from a parse object. Very useful. Basic use requires no knowledge of Marpa internals. It also reports the Earley item with SDFA state at each line of the derivation, and so can be a good starting point for investigations using the Marpa internals.

Parse::Marpa::Parse::show_status(parse)

This is the central tool when debugging a parse using the Marpa internals. Takes a parse object as its argument and lists every Earley item for all the Earley sets. For each Earley item, any current successor, predecessor, effect, cause, pointer or value is listed. For each item there is also a listing of all its links and rules, which indicates which link or rule is the current choice.

For extremely detailed investigation of a parse, this, the output of trace_lex and a listing of the SDFA states (see show_SDFA()) should be just about everything you need.

Parse::Marpa::Parse::find_complete_rule(parse, start_earleme, symbol, end_earleme)

The Parse::Marpa::Parse::find_complete_rule() method takes a parse object as its one required argument. Arguments which specify a start_earleme, symbol and end_earleme are optional. If the start earleme is not specified, it defaults to earleme 0. If the end earleme is not specified, its default will be the default parse end earleme, that is, the default location that Parse::Marpa::Parse::initial() would use for the end of parsing. The symbol argument, if specified, must be the raw interface name of the symbol.

The end earleme argument must be at or before the default parse end earleme. If you specify an end earleme after the default parse end earleme, it is ignored and the default parse end earleme is used as the end earleme.

find_complete_rule() looks for parses of complete rules, that is, rules whose right hand side has been completely matched. Only parses which start at the start earleme are considered.

find_complete_rule() looks first for any parses which end at the end earleme. If it finds none, it iterates back, looking for shorter and shorter parses. It does not stop until it reaches the start earleme and is looking for null parses.

While the parses are always for complete rules, they can be subparses in the sense that they are not parses from the grammar's start symbol. Complete parses starting from any symbol are considered, unless a start symbol was specified as an argument. In that case only parses starting from that symbol are considered.

On failure, a zero-length array is returned. On success, the return value is an array of two elements. The first element of the array is the earleme at which the complete parse ends. The second element is a reference to an array of symbol names, all of which are start symbols for parses in the span from start earleme to end earleme. Symbol names will be raw interface names.

Multiple start symbols may be returned, because several different rules may have been completed in the span from start earleme to end earleme, and some of these rules may have different left hand sides. If a start symbol argument was specified, it will be one of the symbols in the list in the return value.
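The search order and the return convention described above can be sketched with a toy model. Everything in this sketch is hypothetical: sketch_find_complete_rule() and the $toy_parse table are stand-ins that do not use Marpa at all, and the table replaces the Earley sets that a real parse object would consult. It is meant only to show the longest-first search, the treatment of the end earleme, and the shape of the return value.

```perl
use 5.010;
use strict;
use warnings;

# Toy stand-in for a parse object (hypothetical; a real parse keeps this
# information in its Earley sets). Keys are "start,end" spans in earlemes;
# values are the left hand sides of the rules completed over that span.
my $toy_parse = {
    default_end => 5,
    completed   => {
        '0,1' => ['term'],
        '0,3' => [ 'expression', 'term' ],
    },
};

sub sketch_find_complete_rule {
    my ( $parse, $start, $symbol, $end ) = @_;
    my $default_end = $parse->{default_end};

    # The start earleme defaults to 0.
    $start //= 0;

    # A missing end earleme, or one after the default parse end earleme,
    # falls back to the default parse end earleme.
    $end = $default_end if not defined $end or $end > $default_end;

    # Try the longest span first, then shorter and shorter ones, down to
    # the null parse where the end earleme equals the start earleme.
    for my $earleme ( reverse $start .. $end ) {
        my $symbols = $parse->{completed}{"$start,$earleme"} or next;
        next if defined $symbol and not grep { $_ eq $symbol } @{$symbols};
        return ( $earleme, $symbols );
    }

    return;    # failure: an empty list in list context
}

my ( $found_end, $symbols ) = sketch_find_complete_rule($toy_parse);
say defined $found_end
    ? "complete rule ends at earleme $found_end: @{$symbols}"
    : 'no complete rule found';
```

With the toy table, nothing is completed over the spans ending at earlemes 5 and 4, so the call reports earleme 3 with the symbols expression and term. Checking whether the returned list is empty is enough to tell failure from success.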

find_complete_rule() is a start on improved diagnostics and interfaces for online mode and advanced wizardry with grammars. It is very much subject to change. In the case where no start symbol is specified, find_complete_rule() is probably useless. It returns only information from the first Earley item which matches the other criteria. Other Earley items may contain complete rules for the same span, but their left hand sides may not be included in the return value's list of start symbols.