Name
Marpa::R3::Exhaustion - Parse exhaustion in the SLIF
About this document
This page is part of the reference documents for the recognizer objects of Marpa's SLIF (Scanless interface). It contains a detailed discussion of parse exhaustion.
Exhaustion
At bottom, parse exhaustion is a simple concept. The recognizer may reach a point where there is simply no way to continue successfully. Regardless of what it reads next, the parse will fail. When this happens, the parse is said to be exhausted.
Some users have confused parse exhaustion with parse failure; others have confused it with parse success. The reason is that, for a particular grammar, parse exhaustion can be strongly associated with parse success, but the association can go either way. Grammars can be either exhaustion-loving or exhaustion-hating, and both kinds of grammar are very common in practice.
Hate and love
In an exhaustion-hating application, parse exhaustion typically means parse failure. Programs in C, Perl, and most other programming languages are exhaustion-hating: if a C program is well-formed, it is always possible to read more input. The same is true of a Perl program that does not have a __DATA__ section.
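To make the idea concrete, here is a toy model in Python. It is not Marpa's API, and the class and method names are hypothetical. The model treats a recognizer as exhausted when no symbol at all can be read next. The stand-in exhaustion-hating language is "one or more decimal digits". Every accepted prefix of that language can be extended, so exhaustion never occurs during a well-formed parse.

```python
class DigitsRecognizer:
    """Hypothetical toy recognizer for the language of one or more decimal digits."""

    DIGITS = frozenset("0123456789")

    def __init__(self):
        self.length = 0

    def expected(self):
        # After any accepted prefix, another digit is always acceptable.
        return self.DIGITS

    def read(self, ch):
        if ch not in self.expected():
            raise ValueError(f"rejected {ch!r} at position {self.length}")
        self.length += 1

    def exhausted(self):
        # Exhaustion: nothing at all can be read next.
        return not self.expected()

    def success(self):
        return self.length > 0


r = DigitsRecognizer()
seen_exhaustion = False
for ch in "2018":
    r.read(ch)
    seen_exhaustion = seen_exhaustion or r.exhausted()
print(r.success(), seen_exhaustion)  # True False
```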
In an exhaustion-loving application, parse exhaustion means parse success. A toy example of an exhaustion-loving application is the language consisting of balanced parentheses. When the parentheses come into perfect balance, the parse is exhausted, because any further input would unbalance them. And the parse succeeds when the parentheses come into perfect balance. Exhaustion means success.
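Here is a toy Python model of the balanced-parentheses language. The names are hypothetical and this is not Marpa's API. The model assumes the language is a single outermost balanced group such as "(()())". That assumption is what makes exhaustion coincide with success: once the outer group closes, nothing may follow.

```python
class ParensRecognizer:
    """Hypothetical toy recognizer for one balanced group of parentheses."""

    def __init__(self):
        self.depth = 0
        self.started = False

    def expected(self):
        if not self.started:
            return {"("}          # the input must open a group
        if self.depth == 0:
            return set()          # perfectly balanced: nothing may follow
        return {"(", ")"}

    def read(self, ch):
        if ch not in self.expected():
            raise ValueError(f"rejected {ch!r}")
        self.started = True
        self.depth += 1 if ch == "(" else -1

    def exhausted(self):
        return not self.expected()

    def success(self):
        return self.started and self.depth == 0


r = ParensRecognizer()
history = []
for ch in "(()())":
    r.read(ch)
    history.append(r.exhausted())
print(history)      # [False, False, False, False, False, True]
print(r.success())  # True
```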
Any language which balances start and end indicators will tend to be exhaustion-loving. HTML and XML, with their start and end tags, can be seen as exhaustion-loving languages.
For many languages, it's not strictly love or hate. I mentioned Perl's __DATA__ section as a complication in a basically exhaustion-hating language. It is possible for a language to be exhaustion-loving at some points and exhaustion-hating at others. We can call those languages exhaustion-conflicted.
Event-triggering methods
Any of the event-triggering methods may encounter parse exhaustion. The event-triggering SLIF recognizer methods are the recognizer's new() constructor, read(), resume(), lexeme_read_block(), lexeme_read_literal(), lexeme_read_string(), and the lexeme_complete() method.
Synchronous and asynchronous parse exhaustion
A parse exhaustion event is called asynchronous if it occurs in a method, and at a location, where the method would have continued reading under "ordinary circumstances". In this context, "ordinary circumstances" means
that parse exhaustion has not occurred, and
that no event handler has paused the recognizer.
A parse exhaustion event is called synchronous if it is not asynchronous.
Parse exhaustion in the lexeme_read_block(), lexeme_read_literal(), lexeme_read_string() and lexeme_complete() methods is always synchronous. These methods always return control to the app after every attempt to read input; they never try to continue reading input. Parse exhaustion in the recognizer's new() constructor is always synchronous, because it can only occur if the grammar is nulling. Parse exhaustion in the read() or resume() methods may be either synchronous or asynchronous.
Marpa can be set up so that an event occurs on asynchronous exhaustion. In this document, an exhaustion location is a location at which parse exhaustion occurs.
Handling parse exhaustion
How parse exhaustion is handled depends on the SLIF's exhaustion grammar setting. Its value may be "fatal" or "event". ("fatal" is the default.)
Synchronous parse exhaustion is always ignored, regardless of the exhaustion setting. No exhaustion event is triggered by synchronous parse exhaustion.
If the exhaustion setting is "fatal", asynchronous parse exhaustion is thrown as a fatal error. If the exhaustion setting is "event", then an exhaustion event is triggered, returning control to the application. This is treated by the triggering method as a successful return.
Note that the lexeme_read_block(), lexeme_read_literal(), lexeme_read_string() and lexeme_complete() methods always ignore parse exhaustion, regardless of the exhaustion grammar setting. This is because these methods read input only at a single location, so that every parse exhaustion is synchronous.
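Here is a hypothetical Python model of this policy. It is not Marpa's implementation, and names such as `ToyRecognizer` and `ParseExhausted` are made up. The toy reads a single balanced group of parentheses and calls read() once per instance. Exhaustion before the end of the input is asynchronous, because read() would otherwise keep going. Exhaustion exactly at the end is synchronous and is silently ignored.

```python
class ParseExhausted(Exception):
    """Stand-in for the fatal error thrown on asynchronous exhaustion."""


class ToyRecognizer:
    """Hypothetical toy: reads one balanced group of parentheses."""

    def __init__(self, exhaustion="fatal"):
        if exhaustion not in ("fatal", "event"):
            raise ValueError(f"bad exhaustion setting: {exhaustion!r}")
        self.setting = exhaustion
        self.depth = 0
        self.started = False
        self.pos = 0
        self.events = []

    def exhausted(self):
        return self.started and self.depth == 0

    def read(self, text):
        """Read to the end of `text`; return the position reached."""
        for ch in text:
            if self.exhausted():
                # read() would have kept going: asynchronous exhaustion.
                if self.setting == "fatal":
                    raise ParseExhausted(f"parse exhausted at position {self.pos}")
                self.events.append("exhaustion")
                return self.pos  # an ordinary, successful return
            if ch not in "()" or (ch == ")" and self.depth == 0):
                raise ValueError(f"rejected {ch!r} at position {self.pos}")
            self.started = True
            self.depth += 1 if ch == "(" else -1
            self.pos += 1
        # Any exhaustion at the end of input is synchronous: no error, no event.
        return self.pos


r = ToyRecognizer()
print(r.read("(())"), r.exhausted(), r.events)  # 4 True []

r = ToyRecognizer(exhaustion="event")
print(r.read("()()"), r.exhausted(), r.events)  # 2 True ['exhaustion']

try:
    ToyRecognizer().read("()()")
except ParseExhausted as e:
    print(e)  # parse exhausted at position 2
```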
Detecting parse exhaustion
The return value of an event-triggering method does not indicate whether exhaustion occurred. In most cases, you will either know from the context whether the parse is exhausted, or you will not care. But what if you do not know and do care?
Applications that want to know whether a parse is exhausted can query parse exhaustion status directly with the exhausted() method. Even when parse exhaustion events are enabled, the exhausted() method is the preferred way to detect exhaustion, because it reports both asynchronous and synchronous parse exhaustion. Exhaustion events are triggered only by asynchronous parse exhaustion.
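Here is a minimal sketch of that pattern, using a hypothetical Python toy whose only valid input is the literal "abc". As with the SLIF methods, `read()` here returns a position that says nothing about exhaustion, so the caller asks `exhausted()` separately. The method names mirror the SLIF's read() and exhausted() only for readability; this is not Marpa's API.

```python
class LiteralRecognizer:
    """Hypothetical toy: the only valid input is the literal "abc"."""

    TARGET = "abc"

    def __init__(self):
        self.pos = 0

    def exhausted(self):
        return self.pos == len(self.TARGET)

    def read(self, text):
        """Read all of `text` and return the new position.

        The return value does not say whether the parse is exhausted.
        """
        for ch in text:
            if self.exhausted() or ch != self.TARGET[self.pos]:
                raise ValueError(f"cannot read {ch!r} at position {self.pos}")
            self.pos += 1
        return self.pos


def read_and_check(recce, text):
    """Read, then query exhaustion status explicitly."""
    pos = recce.read(text)
    return pos, recce.exhausted()


r = LiteralRecognizer()
print(read_and_check(r, "ab"))  # (2, False)
print(read_and_check(r, "c"))   # (3, True)
```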
Exhaustion-conflicted
Exhaustion-conflicted applications are those which cannot be called exhaustion-loving or exhaustion-hating. This may be because their behavior is a combination of the two. But it may also be because the application's behavior is not known -- for example, while developing an application, it's convenient to assume that it is exhaustion-conflicted.
The SLIF's behavior for exhaustion-conflicted applications has to be aimed at a "lowest common denominator". It is also a good idea for a default to be a lowest common denominator and, by default, the SLIF assumes that an application is exhaustion-conflicted. In fact, the default behavior on parse exhaustion usually works well enough that it does not need customizing.
For a typical application without events, end of parse (EOP) is end of string (EOS). In this case exhaustion before EOS is a fatal error, which is usually what is desired. On return due to EOS, unless the application checks, it will not know whether exhaustion occurred, but usually it does not care. If the application does care, it can check for exhaustion explicitly.
If the application uses events to signal EOP, the case is much the same. On return due to an event, the application will not know if exhaustion occurred, but usually it will not care. If the application does care, it can check for exhaustion explicitly.
If the application uses events for other purposes, an event may "hide" exhaustion, so that it is not thrown as an error. Typically, an application will soon attempt to continue the reading of input, and when it does there will be a fatal error. An application which wants to know about exhaustion immediately, either to "fast fail" or for other reasons, can check for exhaustion explicitly every time an event triggers.
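The "hidden" exhaustion case can be sketched with a hypothetical Python toy. It is not Marpa's API, and the exhaustion setting is assumed to be "fatal". The toy reads one balanced group of parentheses and, standing in for an application event, pauses after every ")". When the pause coincides with exhaustion, no error is thrown. The error arrives only at the next resume(), unless the driver checks `exhausted()` after each event.

```python
class ParseExhausted(Exception):
    """Stand-in for the fatal error on asynchronous exhaustion."""


class PausingParens:
    """Hypothetical toy: one balanced group of parentheses, with an
    application event that pauses reading after every ')'."""

    def __init__(self):
        self.depth = 0
        self.started = False
        self.pos = 0
        self.text = ""

    def exhausted(self):
        return self.started and self.depth == 0

    def read(self, text):
        self.text = text
        return self.resume()

    def resume(self):
        while self.pos < len(self.text):
            if self.exhausted():
                # Reading would continue here, so under "fatal" this throws.
                raise ParseExhausted(f"parse exhausted at position {self.pos}")
            ch = self.text[self.pos]
            if ch not in "()" or (ch == ")" and self.depth == 0):
                raise ValueError(f"rejected {ch!r} at position {self.pos}")
            self.started = True
            self.depth += 1 if ch == "(" else -1
            self.pos += 1
            if ch == ")":
                return self.pos  # event: pause and return control
        return self.pos


def parse(recce, text, fast_fail=False):
    pos = recce.read(text)
    while pos < len(text):
        # With fast_fail, check after every event instead of waiting
        # for the next resume() to throw.
        if fast_fail and recce.exhausted():
            return "exhausted", pos
        pos = recce.resume()
    return "done", pos


print(parse(PausingParens(), "(())"))                    # ('done', 4)
print(parse(PausingParens(), "(())()", fast_fail=True))  # ('exhausted', 4)
try:
    parse(PausingParens(), "(())()")
except ParseExhausted as e:
    print(e)  # parse exhausted at position 4
```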
Exhaustion-loving
For an exhaustion-loving application, what was said for exhaustion-conflicted applications applies without change. Applications that consider it important to confirm that exhaustion did occur at EOP can check for exhaustion explicitly.
Some applications go beyond being exhaustion-loving, and want to use exhaustion to signal the EOP. These exhaustion-sensing applications are discussed below.
Exhaustion-hating
Exhaustion-hating applications are handled reasonably by the default behavior. Asynchronous exhaustion will be a fatal error. Synchronous exhaustion will cause failure at the next read, unless it happens at EOP. By default, exhaustion at EOP will go unreported. But if an application really is exhaustion-hating, the parse will fail, and the failure will certainly show up when the application tries to evaluate the parse.
Exhaustion-hating applications that want to be stricter than this can check for exhaustion explicitly whenever an event-triggering method returns. A possible annoyance is that, depending on where it happens, exhaustion may also cause the event-triggering method to throw an exception. Applications that want more orthogonality in their exhaustion handling can enable exhaustion events, which will prevent exceptions being thrown due to parse exhaustion.
Exhaustion-sensing
Sometimes an application, rather than reading an entire input, wants to find the longest occurrence starting at some location. (Lexers are typically applications of this kind.) Looking for exhaustion is one way to try to implement this kind of "longest acceptable input stream" search. But exhaustion-sensing is not necessarily the best way, or even a good way, to find the "longest parse". Exhaustion may not happen until after the last successful parse, sometimes not until long after it. Completion parse events may be a cleaner way to deal with this.
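Here is a hypothetical Python sketch of why waiting for exhaustion can overshoot. It uses a made-up finite language, "x" or "xyyyz", read one character at a time. Reading stops when the next character cannot extend any sentence. `last_complete` plays the role that completion events would play: it records where a complete parse last ended. On "xyyyq" the search reads past the last successful parse and never sees exhaustion at all. This is a toy, not the SLIF's lexer.

```python
# Hypothetical finite language: "x" alone, or "x" followed by "yyyz".
LANGUAGE = {"x", "xyyyz"}


def can_extend(prefix):
    """True if some sentence strictly extends `prefix`, i.e. not exhausted."""
    return any(s.startswith(prefix) and len(s) > len(prefix) for s in LANGUAGE)


def longest_match(text):
    """Scan `text` from position 0.

    Returns (last_complete, stop, exhausted): the end of the longest
    complete sentence seen, where reading stopped, and whether the
    parse was exhausted at that point.
    """
    last_complete = None
    pos = 0
    while pos < len(text) and any(s.startswith(text[:pos + 1]) for s in LANGUAGE):
        pos += 1
        if text[:pos] in LANGUAGE:
            last_complete = pos  # what a completion event would report
    return last_complete, pos, not can_extend(text[:pos])


print(longest_match("xyyyzzz"))  # (5, 5, True): exhaustion marks the end
print(longest_match("xyyyq"))    # (1, 4, False): no exhaustion; the match ended at 1
```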
Applications which do want to use parse exhaustion as part of a strategy for finding the EOP can set the SLIF's exhaustion grammar setting to "event", so that a parse event occurs at parse exhaustion. When the event-triggering method returns, the application can then check for exhaustion explicitly.
COPYRIGHT AND LICENSE
Marpa::R3 is Copyright (C) 2018, Jeffrey Kegler.
This module is free software; you can redistribute it and/or modify it
under the same terms as Perl 5.10.1. For more details, see the full text
of the licenses in the directory LICENSES.
This program is distributed in the hope that it will be
useful, but without any warranty; without even the implied
warranty of merchantability or fitness for a particular purpose.