NAME

MarpaX::Repa::Lexer - simplify lexing for the Marpa parser

DESCRIPTION

Most details are documented in MarpaX::Repa.

METHODS

new

Returns a new lexer instance. Takes named arguments.

my $lexer = MyLexer->new(
    tokens => {
        word => qr{\b\w+\b},
    },
    store => 'array',
    debug => 1,
);

Possible arguments:

tokens

A hash with terminal names as keys and one of the following as values:

string

A literal string to match.

'a token' => "matches this long string",

regular expression

A regexp compiled with qr{}.

'a token' => qr{"[^"]+"},

Note that the regexp MUST match at least one character. At the moment, look-behind assertions that inspect characters before the current position are not supported.

hash

With a hash you can define token-specific options. At the moment only the 'store' option is supported (see below). Use the match key to set what to match (a string or a regular expression):

'a token' => {
    match => "a string",
    store => 'hash',
},

Per-token options are:

store

What to store, that is, what to pass as the token value to Marpa's recognizer. The following variants are supported:

hash (default)

{ token => 'a token', value => 'a value' }

array

[ 'a token', 'a value' ]

scalar

'a value'

undef

undef is stored, so Repa's actions will skip the token later.

a callback

A function that is called with the token name and a reference to its value. It should return a reference, or undef, which is then passed to the recognizer.

check

A callback that checks whether the token really matches.
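The token forms and store variants above can be illustrated with a small, self-contained pure-Perl sketch. It is not the module's implementation: the token table, the helpers make_value and next_token, and the matching strategy (trying each token in turn at the current position with \G and the /gc flags) are all hypothetical. It also assumes the undef variant may be written as the string 'undef'. The zero-length check shows why a token regexp has to consume at least one character: an empty match would leave pos() where it was, and the loop would never advance.

```perl
use strict;
use warnings;

# Hypothetical token table using the three documented value forms.
# A list of pairs (not a hash) keeps the matching order predictable.
my @tokens = (
    [ arrow => '=>' ],                                   # plain string
    [ word  => qr{\w+} ],                                # qr{} regexp
    [ space => { match => qr{\s+}, store => 'undef' } ], # hash form
);

# Hypothetical helper: what a 'store' setting would pass on for a token.
sub make_value {
    my ($store, $token, $value) = @_;
    # Callback: called with the token name and a reference to the value.
    return $store->($token, \$value) if ref $store eq 'CODE';
    # Accept real undef or the string 'undef' (an assumption).
    return undef if !defined $store || $store eq 'undef';
    return { token => $token, value => $value } if $store eq 'hash';
    return [ $token, $value ]                   if $store eq 'array';
    return $value                               if $store eq 'scalar';
    die "unknown store variant '$store'";
}

# Hypothetical helper: try each token at pos() of $$buf_ref and
# return (name, stored value), or an empty list if nothing matches.
sub next_token {
    my ($buf_ref, $default_store) = @_;
    for my $t (@tokens) {
        my ($name, $spec) = @$t;
        my %opt = ref $spec eq 'HASH' ? %$spec : (match => $spec);
        # Plain strings are matched literally.
        my $re = ref $opt{match} ? $opt{match} : quotemeta $opt{match};
        # \G anchors at pos(); /c leaves pos() alone when this token fails.
        next unless $$buf_ref =~ m{\G($re)}gc;
        my $text = $1;
        # An empty match would leave pos() unchanged: no progress.
        die "token '$name' matched no characters" unless length $text;
        my $store = exists $opt{store} ? $opt{store} : $default_store;
        return ($name, make_value($store, $name, $text));
    }
    return;    # nothing matched at this position
}

# Default store for tokens without their own setting: a callback
# that returns a reference to the upper-cased text.
my $upper = sub {
    my ($token, $value_ref) = @_;
    my $upper_value = uc $$value_ref;
    return \$upper_value;
};

my $buffer = 'key => value';
my @stored;
while ((pos($buffer) // 0) < length($buffer)) {
    my ($name, $value) = next_token(\$buffer, $upper)
        or die 'no token matches at position ' . (pos($buffer) // 0);
    push @stored, [ $name, $value ];
}
```

Trying tokens in a fixed order is just the simplest policy for a sketch; a real lexer could, for example, prefer the longest match instead.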

complete

If true, the parse should complete in one go and consume the whole input.

debug

If true, the lexer prints a debug log to STDERR.

min_buffer

Minimum size of the buffer (4*1024 = 4096 by default).

init

Sets up the instance and returns $self. There is no need to call it directly; it is called from the constructor.

recognize

Takes a recognizer and a file handle and parses the input. Dies on critical errors, but not when the parser loses its way. Returns the recognizer that was passed in.

buffer

Returns a reference to the current buffer.

grow_buffer

Called with a file handle as its argument when the "buffer" needs to be refilled. Returns true if more data remains to be read from the handle.
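A minimal sketch of a refill step, assuming the buffer is a plain Perl string appended to with the built-in read. The name grow_buffer_sketch and its three arguments are hypothetical; the documented method takes only the file handle and keeps the buffer on the instance. The demo reads from an in-memory file handle and pretends the lexer consumes the whole buffer after each refill.

```perl
use strict;
use warnings;

# Hypothetical stand-in for grow_buffer: append data from $fh to the
# string behind $buf_ref until it holds at least $min characters.
# Returns true while the handle still has data to deliver.
sub grow_buffer_sketch {
    my ($buf_ref, $fh, $min) = @_;
    while (length($$buf_ref) < $min) {
        # An OFFSET equal to the current length makes read() append.
        my $got = read($fh, $$buf_ref, $min - length($$buf_ref), length($$buf_ref));
        die "read failed: $!" unless defined $got;
        return 0 if $got == 0;    # end of input: nothing more to come
    }
    return !eof($fh);
}

# An in-memory file handle stands in for a real file.
my $input = 'abcdefghij';
open my $fh, '<', \$input or die "open failed: $!";

my $buffer = '';
my @chunks;
while (1) {
    my $more = grow_buffer_sketch(\$buffer, $fh, 4);
    push @chunks, $buffer;    # pretend the lexer consumed everything
    $buffer = '';
    last unless $more;
}
```

With a minimum of 4 characters the ten-character input arrives as three refills, the last one shorter because the handle runs dry.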

dump_buffer

Returns the first 20 characters of the buffer, with every non-ASCII character encoded as \x{####}. An optional argument controls the size; zero means the whole buffer.
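A hypothetical re-implementation of the documented behaviour, which can help when reading debug output. The name dump_buffer_sketch is made up, it takes the buffer as a plain string rather than reading it from an instance, and both the exact escape format (four upper-case hex digits) and truncating before escaping are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: take the first $size characters of $buffer
# (20 by default, 0 meaning all of it) and replace every non-ASCII
# character with \x{####}.
sub dump_buffer_sketch {
    my ($buffer, $size) = @_;
    $size //= 20;
    my $text = $size ? substr($buffer, 0, $size) : $buffer;
    $text =~ s/([^\x00-\x7F])/sprintf('\\x{%04X}', ord $1)/ge;
    return $text;
}

print dump_buffer_sketch("caf\x{E9} is ready"), "\n";
# prints: caf\x{00E9} is ready
```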