NAME
MarpaX::ESLIF::Recognizer - MarpaX::ESLIF's recognizer
VERSION
version 3.0.19
SYNOPSIS
my $eslifRecognizer = MarpaX::ESLIF::Recognizer->new($eslifGrammar, $recognizerInterface);
The recognizer interface is used to read chunks of data, which the recognizer keeps in its internal buffers until they are consumed. The internal buffer may not be an exact duplicate of the external data that was read: in the case of a character stream, the external data is systematically converted to a UTF-8 sequence of bytes. A user who pushes alternatives must therefore know how many bytes they represent: the native number of bytes for a binary stream, and the number of bytes in the UTF-8 encoding for a character stream.
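Since lengths are counted in UTF-8 bytes for character streams, a small helper can make the difference between characters and bytes explicit. The following is a minimal sketch that uses only the core Encode module; the helper name utf8_byte_length is hypothetical and not part of MarpaX::ESLIF.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Number of bytes the string occupies once encoded as UTF-8, which is
# the unit to use when the input is a character stream.
sub utf8_byte_length {
    my ($characters) = @_;
    return length(encode('UTF-8', $characters));
}

my $text = "caf\x{e9}";    # 4 characters, the last one is U+00E9
printf "%d characters, %d UTF-8 bytes\n", length($text), utf8_byte_length($text);
```

For a binary stream no such conversion applies, and the native byte count is the one to use.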
DESCRIPTION
MarpaX::ESLIF::Recognizer is a possible step after a MarpaX::ESLIF::Grammar instance is created.
METHODS
MarpaX::ESLIF::Recognizer->new($eslifGrammar, $recognizerInterface)
my $eslifRecognizer = MarpaX::ESLIF::Recognizer->new($eslifGrammar, $recognizerInterface);
Returns a recognizer instance, referred to as $eslifRecognizer below. Parameters are:
$eslifGrammar
-
A MarpaX::ESLIF::Grammar object instance. Required.
$recognizerInterface
-
An object implementing MarpaX::ESLIF::Recognizer::Interface methods. Required.
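As an illustration, here is a minimal sketch of a class whose instances could serve as $recognizerInterface for an in-memory string. The package name My::StringReader is hypothetical. The method names (read, isEof, isCharacterStream, encoding, data, isWithDisableThreshold, isWithExhaustion, isWithNewline, isWithTrack) are assumed to match MarpaX::ESLIF::Recognizer::Interface and should be checked against that page; only isEof, isWithExhaustion, isWithNewline and isWithTrack are mentioned in this document.

```perl
use strict;
use warnings;

package My::StringReader;    # hypothetical name, for illustration only

sub new {
    my ($class, $string) = @_;
    return bless { string => $string, data => undef, eof => 0 }, $class;
}

# Called when more data is wanted: hand over everything in one chunk.
sub read {
    my ($self) = @_;
    $self->{data} = $self->{string};
    $self->{eof}  = 1;
    return 1;
}

sub isEof                  { return $_[0]->{eof} }
sub isCharacterStream      { return 1 }
sub encoding               { return }    # assumption: no value leaves the encoding to the library
sub data                   { return $_[0]->{data} }
sub isWithDisableThreshold { return 0 }
sub isWithExhaustion       { return 0 }
sub isWithNewline          { return 1 }  # needed for line() and column()
sub isWithTrack            { return 1 }  # needed for discardLast()

package main;

my $recognizerInterface = My::StringReader->new("1 + 2");
```

The object would then be passed as the second argument of MarpaX::ESLIF::Recognizer->new().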
$eslifRecognizer->newFrom($eslifGrammar)
my $eslifRecognizerNewFrom = $eslifRecognizer->newFrom($eslifGrammar);
Returns a recognizer instance that shares the stream of $eslifRecognizer, but applied to the other grammar $eslifGrammar.
$eslifRecognizer->set_exhausted_flag($flag)
$eslifRecognizer->set_exhausted_flag($flag);
Changes the isWithExhaustion() flag associated with the $eslifRecognizer recognizer instance.
$eslifRecognizer->share($eslifRecognizerShared)
$eslifRecognizer->share($eslifRecognizerShared);
Shares the stream of the $eslifRecognizerShared recognizer instance with the $eslifRecognizer instance.
$eslifRecognizer->isCanContinue()
Returns a true value if recognizing can continue.
$eslifRecognizer->isExhausted()
Returns a true value if the parse is exhausted. This flag is always set, even if there is no exhaustion event.
$eslifRecognizer->scan($initialEvents)
Starts scanning. This call is allowed only once in a recognizer's lifetime. If specified, $initialEvents must be a scalar. Default value is 0.
This method can generate events. Initial events are those that happen at the very first step, and they can only be prediction events. Most applications do not want them, but some can use them to get control before the first data read.
Returns a boolean indicating if the call was successful or not.
$eslifRecognizer->resume($deltaLength)
This method tells the recognizer to continue. Events can be generated after resume completes.
$deltaLength is optional and is the number of bytes to skip forward before resume goes on; it must not be negative. In the case of a character stream, the user has to compute the number of bytes as if the input were in the UTF-8 encoding. Default value is 0.
Returns a boolean indicating if the call was successful or not.
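The usual control flow built on scan(), resume() and isCanContinue() can be sketched as below. The helper run_recognizer and the My::FakeRecognizer class are hypothetical: the fake only stands in for a real recognizer so that the loop can run without a grammar, and it pretends that three resume() calls are needed.

```perl
use strict;
use warnings;

# Drives any object offering scan(), resume() and isCanContinue() as
# described above. Returns the number of resume() calls that were made.
sub run_recognizer {
    my ($recognizer) = @_;
    $recognizer->scan(0) or die "scan() failed\n";    # allowed once only
    my $resumes = 0;
    while ($recognizer->isCanContinue()) {
        # A real application would inspect events() here.
        $recognizer->resume(0) or die "resume() failed\n";
        $resumes++;
    }
    return $resumes;
}

package My::FakeRecognizer;    # hypothetical stand-in, not part of MarpaX::ESLIF

sub new           { my ($class) = @_; return bless { left => 3 }, $class }
sub scan          { return 1 }
sub resume        { my ($self) = @_; $self->{left}--; return 1 }
sub isCanContinue { my ($self) = @_; return $self->{left} > 0 }

package main;

my $count = run_recognizer(My::FakeRecognizer->new());
print "$count resume() calls\n";
```

With a real recognizer, the same loop would be applied to the object returned by MarpaX::ESLIF::Recognizer->new().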
$eslifRecognizer->events()
When control is given back to the end-user, the current events can always be queried.
Returns a reference to an array of hash references, possibly empty if there are none. Each array element is a reference to a hash containing these keys:
- type
-
The type of event, that is, one of the values listed in MarpaX::ESLIF::Event::Type.
- symbol
-
The name of the symbol that triggered the event. Can be undef in the case of an exhaustion event.
- event
-
The name of the event that was triggered. Can be undef in the case of an exhaustion event.
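The structure described above can be walked like any Perl array of hashes. The sketch below groups event names by symbol and guards against the undef values that an exhaustion event may carry. The helper name events_by_symbol is hypothetical, and the sample data is written by hand: its type values are placeholders, real ones come from MarpaX::ESLIF::Event::Type.

```perl
use strict;
use warnings;

# Groups event names by symbol. $events has the shape returned by
# events(): a reference to an array of hashes with keys type, symbol
# and event.
sub events_by_symbol {
    my ($events) = @_;
    my %by_symbol;
    for my $entry (@{$events}) {
        # symbol and event may be undef for an exhaustion event
        my $symbol = defined $entry->{symbol} ? $entry->{symbol} : '(none)';
        my $event  = defined $entry->{event}  ? $entry->{event}  : '(none)';
        push @{ $by_symbol{$symbol} }, $event;
    }
    return \%by_symbol;
}

# Hand-written sample, for illustration only
my $events = [
    { type => 1, symbol => 'NUMBER', event => 'NUMBER$' },
    { type => 2, symbol => undef,    event => undef },
];
my $grouped = events_by_symbol($events);
print join(', ', sort keys %{$grouped}), "\n";
```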
$eslifRecognizer->eventOnOff($symbol, $eventTypes, $onOff)
Events can be switched on or off. For performance reasons, if you know that you do not need an event, it can be a good idea to switch it off. Required parameters are:
$symbol
-
The symbol name to which the event is associated.
$eventTypes
-
A reference to an array of event types, as per MarpaX::ESLIF::Event::Type.
$onOff
-
A flag that sets the event on or off.
Note that trying to change the state of an event that was not pre-declared in the grammar is a no-op.
$eslifRecognizer->lexemeAlternative($name, $anything, $grammarLength)
A lexeme is a terminal in legacy parsing terminology. The word lexeme means that, in the grammar, it is associated with a sub-grammar. Pushing an alternative means that the end-user is instructing the recognizer that, at this precise moment of lexing, there is a given lexeme associated with the $name parameter, with a given opaque value $anything. The grammar length parameter $grammarLength is optional and defaults to 1, i.e. one lexeme (which is a symbol in the grammar) corresponds to one token. Nevertheless, it is possible to say that an alternative spans more than one symbol.
Returns a boolean indicating if the call was successful or not.
$eslifRecognizer->lexemeComplete($length)
Tells the recognizer that alternatives are complete at this precise moment of parsing, and that the recognizer must move forward by $length bytes, which can be zero (end-user's responsibility). This method can generate events.
Returns a boolean indicating if the call was successful or not.
$eslifRecognizer->lexemeRead($name, $anything, $length, $grammarLength)
A short-hand version of lexemeAlternative() followed by lexemeComplete(), with the same meaning for all parameters. This method can generate events.
Returns a boolean indicating if the call was successful or not.
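To show how the two-step form fits together, here is a minimal sketch of a helper that pushes one alternative and then completes it, with the length computed in UTF-8 bytes (a character stream is assumed). The helper push_lexeme and the My::Recorder class are hypothetical: the recorder only logs the calls it receives so that the sketch runs without a real recognizer.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Pushes one alternative, then completes it, moving forward by the
# UTF-8 byte length of $matched.
sub push_lexeme {
    my ($recognizer, $name, $value, $matched) = @_;
    my $length = length(encode('UTF-8', $matched));
    $recognizer->lexemeAlternative($name, $value) or return 0;
    return $recognizer->lexemeComplete($length);
}

package My::Recorder;    # hypothetical stand-in that records calls

sub new { my ($class) = @_; return bless { calls => [] }, $class }

sub lexemeAlternative {
    my ($self, @args) = @_;
    push @{ $self->{calls} }, [ 'alternative', @args ];
    return 1;
}

sub lexemeComplete {
    my ($self, @args) = @_;
    push @{ $self->{calls} }, [ 'complete', @args ];
    return 1;
}

package main;

my $recorder = My::Recorder->new();
push_lexeme($recorder, 'WORD', { kind => 'word' }, "na\x{ef}ve");
print "completed with length $recorder->{calls}[1][1]\n";
```

With a real recognizer, lexemeRead($name, $anything, $length) would do both steps in a single call.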
$eslifRecognizer->lexemeTry($name)
The end-user can ask the recognizer whether a lexeme $name may match.
Returns a boolean indicating if the lexeme is recognized.
$eslifRecognizer->discardTry()
The end-user can ask the recognizer whether the :discard rule may match.
Returns a boolean indicating if :discard is recognized.
$eslifRecognizer->lexemeExpected()
Asks the recognizer for the list of expected lexemes.
Returns a reference to an array of names, possibly empty.
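One way lexemeExpected() and lexemeTry() might be combined is to try each expected lexeme in turn and keep the first one that matches. This is only a sketch: the helper first_matching_lexeme and the My::FakeLexer class are hypothetical, and the fake returns canned answers so that the snippet runs without a grammar.

```perl
use strict;
use warnings;

# Returns the name of the first expected lexeme that lexemeTry()
# accepts, or nothing if none matches.
sub first_matching_lexeme {
    my ($recognizer) = @_;
    for my $name (@{ $recognizer->lexemeExpected() }) {
        return $name if $recognizer->lexemeTry($name);
    }
    return;
}

package My::FakeLexer;    # hypothetical stand-in with canned answers

sub new            { my ($class) = @_; return bless {}, $class }
sub lexemeExpected { return [ 'NUMBER', 'WORD' ] }
sub lexemeTry      { my ($self, $name) = @_; return $name eq 'WORD' }

package main;

my $name = first_matching_lexeme(My::FakeLexer->new());
print defined $name ? "$name\n" : "no match\n";
```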
$eslifRecognizer->lexemeLastPause($name)
Asks the recognizer for the end-user data associated with the last lexeme pause after event. A pause after event occurs when the recognizer itself was responsible for lexeme recognition, after a call to the scan() or resume() methods. This data is an exact copy of the last bytes that matched for the given lexeme, in the internal representation of end-user data, meaning that it may be a UTF-8 sequence of bytes in the case of a character stream.
Returns the associated bytes, or undef.
$eslifRecognizer->lexemeLastTry($name)
Asks the recognizer for the end-user data associated with the last successful lexeme try. This data is an exact copy of the last bytes that matched for the given lexeme, in the internal representation of end-user data, meaning that it may be a UTF-8 sequence of bytes in the case of a character stream.
Returns the associated bytes, or undef.
$eslifRecognizer->discardLastTry()
Asks the recognizer for the end-user data associated with the last successful discard try. This data is an exact copy of the last bytes that matched for the :discard rule, in the internal representation of end-user data, meaning that it may be a UTF-8 sequence of bytes in the case of a character stream.
Returns the associated bytes, or undef.
$eslifRecognizer->discardLast()
Asks the recognizer for the end-user data associated with the last successful discard. This data is an exact copy of the last bytes that matched for the latest :discard rule, meaning that it may be a UTF-8 sequence of bytes in the case of a character stream.
For performance reasons, the last discard data is available only if the recognizer interface returned a true value for the isWithTrack() method and if there is a discard event for the :discard rule that matched.
Returns the associated bytes, or undef.
$eslifRecognizer->isEof()
This method is similar to the isEof() method of the recognizer interface, except that it asks the question directly to the recognizer's internal state, which maintains a copy of this flag.
Returns a boolean indicating whether the end of user data is reached.
$eslifRecognizer->isExhausted()
Returns a true value if the underlying grammar is exhausted, a false value otherwise, and croaks on failure.
$eslifRecognizer->read()
Forces the recognizer to read more data. Usually, the recognizer interface is called automatically whenever needed.
Returns a boolean value indicating success or not.
$eslifRecognizer->input()
Gets a copy of the current internal recognizer buffer, starting at the exact byte where resume() would start. An undefined output does not mean there is an error, but that the internal buffers are completely consumed. ESLIF will automatically require more data unless the EOF flag is set. The internal buffer is always UTF-8 encoded for every chunk of data that was declared to be a character stream.
Returns the associated input bytes, or undef.
$eslifRecognizer->progressLog($start, $end, $loggerLevel)
Asks for a logging representation of the current parse progress. The format is fixed by the underlying libraries. The $start and $end parameters follow the Perl convention for indices, i.e. when they are negative, they count from the end. For example, -1 means the last index, -2 means the one before the last index, etc. $loggerLevel is a level as per MarpaX::ESLIF::Logger::Level.
Nothing is returned.
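The index convention mentioned for $start and $end is the one Perl applies to array subscripts. The sketch below illustrates that convention on a plain array; the helper absolute_index is hypothetical, and how progressLog() itself resolves the indices is left to the library.

```perl
use strict;
use warnings;

# Converts a possibly negative index into an absolute one for a list
# of $count items: -1 is the last item, -2 the one before, and so on.
sub absolute_index {
    my ($index, $count) = @_;
    return $index < 0 ? $count + $index : $index;
}

my @steps = qw(s0 s1 s2 s3 s4);
my $from  = absolute_index(-2, scalar @steps);
my $to    = absolute_index(-1, scalar @steps);
print "last two steps: @steps[$from .. $to]\n";
```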
$eslifRecognizer->lastCompletedOffset($name)
The recognizer tentatively keeps an absolute offset every time a lexeme is complete. Tentatively, in the sense that no overflow checking is done, so this number is not reliable when the user data spans a very large number of bytes. The unit is bytes. $name can be any symbol in the grammar.
Returns the absolute offset in bytes.
$eslifRecognizer->lastCompletedLength($name)
The recognizer tentatively computes the length of every symbol completion. Since this value depends internally on the previous absolute offset, it is not guaranteed to be exact, in the sense that no overflow check is done. $name can be any symbol in the grammar.
Returns the absolute length in bytes.
$eslifRecognizer->lastCompletedLocation($name)
Returns an array containing at indices 0 and 1 the values of $eslifRecognizer->lastCompletedOffset($name) and $eslifRecognizer->lastCompletedLength($name), respectively.
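Because offset and length are expressed in bytes, they apply to the UTF-8 encoding of a character stream rather than to its characters. The following sketch shows how such a pair could be used to recover the matched text, assuming the whole input is a single character string held in memory. The helper text_at is hypothetical and the offset and length values are written by hand.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Extracts the text covered by a (byte offset, byte length) pair from
# a character string, working on its UTF-8 encoding.
sub text_at {
    my ($characters, $offset, $length) = @_;
    my $bytes = encode('UTF-8', $characters);
    return decode('UTF-8', substr($bytes, $offset, $length));
}

my $input = "\x{e9}t\x{e9} 2024";    # 8 characters, 10 UTF-8 bytes
my ($offset, $length) = (6, 4);      # hand-written values, for illustration
print text_at($input, $offset, $length), "\n";
```

Counting in characters instead would give offset 4 for the same position, which is why the byte unit matters.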
$eslifRecognizer->line()
If, at creation, the recognizer interface returned a true value for the $recognizerInterface->isWithNewline() method, then the recognizer tracks the number of lines for every character-oriented chunk of data.
Returns the line number, or 0.
$eslifRecognizer->column()
If, at creation, the recognizer interface returned a true value for the $recognizerInterface->isWithNewline() method, then the recognizer tracks the number of columns for every character-oriented chunk of data.
Returns the column number, or 0.
$eslifRecognizer->location()
Returns an array containing at indices 0 and 1 the values of $eslifRecognizer->line() and $eslifRecognizer->column(), respectively.
$eslifRecognizer->hookDiscard($discardOnOff)
Hooks the recognizer to enable or disable the use of :discard if it exists. Default mode is on. This is a permanent setting.
$eslifRecognizer->hookDiscardSwitch()
Hooks the recognizer to switch the use of :discard if it exists. This is a permanent setting.
SEE ALSO
MarpaX::ESLIF::Recognizer::Interface, MarpaX::ESLIF::Event::Type, MarpaX::ESLIF::Logger::Level
AUTHOR
Jean-Damien Durand <jeandamiendurand@free.fr>
COPYRIGHT AND LICENSE
This software is copyright (c) 2017 by Jean-Damien Durand.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.