NAME

MarpaX::Repa::Lexer - simplify lexing for Marpa parser

DESCRIPTION

Most details are in MarpaX::Repa.

METHODS

new

Returns a new lexer instance. Takes named arguments.

my $lexer = MyLexer->new(
    tokens => {
        word => qr{\b\w+\b},
    },
    store => 'array',
    recognizer => $recognizer,
    debug => 1,
);

Possible arguments:

tokens

Hash with names of terminals as keys and one of the following as values:

string

Just a string to match.

'a token' => "matches this long string",

regular expression

A qr{} compiled regexp.

'a token' => qr{"[^"]+"},

Note that a regexp MUST match at least one character. Lookbehind assertions that inspect characters before the current position are not currently supported.

hash

A hash lets you define token-specific options. Currently only the 'store' option is supported (see below). Use the match key to set what to match (a string or a regular expression).

'a token' => {
    match => "a string",
    store => 'hash',
},

store

What to store (pass to Marpa's recognizer). The following variants are supported:

hash (default)

{ token => 'a token', value => 'a value' }

array

[ 'a token', 'a value' ]

scalar

'a value'

undef

undef is stored, so Repa's actions will later skip it.

a callback

The function is called with the token name and a reference to its value. It should return a reference or undef, which is passed to the recognizer.

recognizer

Marpa::R2::Recognizer object or its subclass.

debug

If true, the lexer prints a debug log to STDERR.

min_buffer

Minimum size of the buffer (4*1024 by default).
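
The callback form of the store argument described above can be sketched as follows. This is only an illustration: the token name 'ws' and the lower-casing rule are assumptions made for the example, and the callback is invoked directly here rather than by the lexer.

```perl
use strict;
use warnings;

# Hypothetical callback for the 'store' argument: lower-cases every value
# and stores undef for tokens named 'ws' (an assumed token name).
my $store = sub {
    my ($token, $value_ref) = @_;    # token name, reference to the matched text
    return undef if $token eq 'ws';  # undef is passed on as-is
    my $normalized = lc $$value_ref;
    return \$normalized;             # a reference, as the description requires
};

# Call the callback by hand with the documented arguments.
my $stored = $store->( word => \"Hello" );
print $$stored, "\n";    # hello
```

Such a callback would be passed as store => $store to "new", or as the store key of a single token's hash.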

init

Sets up the instance and returns $self. Called from the constructor.

recognize

Takes a file handle and parses its contents. Dies on critical errors, but not when the parser loses its way. Returns the recognizer that was passed to "new".

buffer

Returns reference to the current buffer.

grow_buffer

Called with a file handle as its argument when "buffer" needs a refill. Returns true if there is still data to come from the handle.
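
The refill contract described above can be sketched with core Perl only. The refill function below is a hypothetical stand-in, not the module's implementation; it assumes the buffer is a plain string that is appended to in chunks of the minimum buffer size.

```perl
use strict;
use warnings;

# Hypothetical stand-in for grow_buffer: appends up to $min characters
# from $fh to the buffer and reports whether the handle has more data.
sub refill {
    my ($buffer_ref, $fh, $min) = @_;
    $min ||= 4 * 1024;    # documented default for min_buffer
    my $got = read $fh, $$buffer_ref, $min, length $$buffer_ref;
    die "read failed: $!" unless defined $got;
    return eof($fh) ? 0 : 1;
}

# Usage with an in-memory handle and a deliberately tiny chunk size.
my $input = "hello world";
open my $fh, '<', \$input or die "open failed: $!";
my $buffer = '';
1 while refill( \$buffer, $fh, 5 );
print "$buffer\n";    # hello world
```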

dump_buffer

Returns the first 20 characters of the buffer, with everything other than ASCII encoded as \x{####}. Pass an argument to control the size; zero means the whole buffer.
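
A rough sketch of this kind of escaping, for illustration only. The dump_text function is hypothetical and takes the text as a plain string; the four-digit zero padding of the code point is an assumption, and the module's actual output may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper mimicking the documented behaviour of dump_buffer.
sub dump_text {
    my ($text, $size) = @_;
    $size = 20 unless defined $size;                       # documented default
    my $head = $size ? substr( $text, 0, $size ) : $text;  # zero means everything
    # Replace each non-ASCII character with its hex code point.
    $head =~ s/([^\x00-\x7F])/sprintf '\\x{%04x}', ord $1/ge;
    return $head;
}

print dump_text("caf\x{e9}"), "\n";    # caf\x{00e9}
print dump_text( "abcdef", 3 ), "\n";  # abc
```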