NAME
MarpaX::Repa::Lexer - simplify lexing for Marpa parser
DESCRIPTION
Most details are in MarpaX::Repa.
METHODS
new
Returns a new lexer instance. Takes named arguments.
    my $lexer = MyLexer->new(
        tokens => {
            word => qr{\b\w+\b},
        },
        store => 'array',
        debug => 1,
    );
Possible arguments:
- tokens
  Hash with names of terminals as keys and one of the following as value:
  - string
    Just a string to match.
        'a token' => "matches this long string",
  - regular expression
    A qr{} compiled regexp.
        'a token' => qr{"[^"]+"},
    Note that the regexp MUST match at least one character. Look-behind at characters before the current position is not supported at the moment.
  - hash
    With a hash you can define token-specific options. At the moment only the 'store' option is supported (see below). Use the match key to set what to match (string or regular expression):
        'a token' => { match => "a string", store => 'hash', },
    Per-token options are:
    - store
      What to store (passed as the value to Marpa's recognizer). The following variants are supported:
      - hash (default)
            { token => 'a token', value => 'a value' }
      - array
            [ 'a token', 'a value' ]
      - scalar
            'a value'
      - undef
        undef is stored, so Repa's actions will skip it later.
      - a callback
        A function that is called with the token name and a reference to its value. It should return a reference, or undef, which is passed to the recognizer.
    - check
      A callback that can check whether the token really matches or not.
- complete
  If true, the parse should complete in one go and consume the whole input.
- debug
  If true, the lexer prints a debug log to STDERR.
- min_buffer
  Minimal size of the buffer (4*1024 by default).
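To make the 'store' variants listed above concrete, here is a minimal sketch of the value shape each one produces. The helper shape_for_store is hypothetical and is not part of MarpaX::Repa::Lexer. It also assumes the variants are selected by the strings 'hash', 'array', 'scalar' and 'undef', or by a code reference.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): mirrors the documented
# shapes of the stored value for each 'store' variant.
sub shape_for_store {
    my ($store, $name, $value) = @_;
    $store = 'hash' unless defined $store;    # 'hash' is the documented default

    # A callback receives the token name and a reference to the value.
    return $store->( $name, \$value ) if ref $store eq 'CODE';

    return { token => $name, value => $value } if $store eq 'hash';
    return [ $name, $value ]                   if $store eq 'array';
    return $value                              if $store eq 'scalar';
    return undef                               if $store eq 'undef';

    die "unknown store variant '$store'";
}

# Example callback: store an upper-cased copy of the value as a reference.
my $upper = sub {
    my ($name, $value_ref) = @_;
    my $copy = uc $$value_ref;
    return \$copy;
};

my $as_hash   = shape_for_store( 'hash',  'word', 'foo' );
my $as_array  = shape_for_store( 'array', 'word', 'foo' );
my $as_custom = shape_for_store( $upper,  'word', 'foo' );
```

The callback form is the most flexible of these, since it can build any structure the grammar's actions expect, or return undef to have the token's value skipped.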
init
Sets up the instance and returns $self. There is no need to call it directly; it is called from the constructor.
recognize
Takes a recognizer and a file handle. Parses the input. Dies on critical errors, but not when the parser loses its way. Returns the recognizer that was passed in.
buffer
Returns a reference to the current buffer.
grow_buffer
Called with a file handle as argument when "buffer" needs a refill. Returns true if there is still data to come from the handle.
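The refill behaviour described above can be sketched as follows. This is only an illustration, not the module's implementation: the refill function and its argument order are hypothetical, and the chunk size stands in for the min_buffer argument.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the documented behaviour: append the next chunk
# from the handle to the buffer and report whether more data remains.
sub refill {
    my ($buffer_ref, $fh, $chunk_size) = @_;
    $chunk_size ||= 4 * 1024;    # mirrors the documented min_buffer default

    # An OFFSET equal to the current length appends instead of overwriting.
    my $read = read( $fh, $$buffer_ref, $chunk_size, length $$buffer_ref );
    die "read failed: $!" unless defined $read;

    return eof($fh) ? 0 : 1;
}

# Usage with an in-memory file handle and a deliberately tiny chunk size.
my $data = "abcdefghij";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";

my $buffer = '';
my @more;
push @more, refill( \$buffer, $fh, 4 ) for 1 .. 3;
close $fh;
```

Appending to the buffer, rather than replacing it, keeps any unconsumed input available to the next match attempt.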
dump_buffer
Returns the first 20 characters of the buffer, with everything besides ASCII encoded as \x{####}. Use the argument to control the size; zero means the whole buffer.
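As a rough illustration of that output format, the sketch below truncates a string and escapes non-ASCII characters. The function dump_text is hypothetical. The four-digit zero padding is an assumption, and the real method may treat other characters (such as control characters) differently.

```perl
use strict;
use warnings;

# Hypothetical illustration of the documented format, not the module's code.
sub dump_text {
    my ($text, $size) = @_;
    $size = 20 unless defined $size;    # documented default length

    # A size of zero means the whole string.
    my $str = $size ? substr( $text, 0, $size ) : $text;

    # Replace every non-ASCII character with \x{....} (hex code point).
    $str =~ s/([^\x00-\x7F])/sprintf '\\x{%04x}', ord $1/ge;
    return $str;
}

my $short = dump_text( "caf\x{e9} au lait" );    # escapes the e-acute
my $head  = dump_text( "abcdef", 3 );            # first three characters
my $whole = dump_text( "a" x 30, 0 );            # zero: no truncation
```

Escaping keeps the debug log readable on terminals that cannot display the input's encoding.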