NAME
Parser::Combinators - A library of building blocks for parsing, similar to Haskell's Parsec
SYNOPSIS
use Parser::Combinators;
my $parser = sequence [ word, symbol('='), number ];  # any combination of the building blocks
my $str = 'kind = 8';
(my $status, my $rest, my $matches) = $parser->($str);
my $parse_tree = getParseTree($matches);
DESCRIPTION
Parser::Combinators is a simple parser combinator library inspired by the Parsec parser combinator library in Haskell. It is not complete (i.e. not all Parsec combinators have been implemented); I have just implemented what I needed:
whiteSpace : parses any whitespace; always returns success
Lexeme parsers (they remove trailing whitespace):
word : (\w+)
number : (\d+)
symbol : parses a given symbol, e.g. symbol('int')
comma : parses a comma
char : parses a given character
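To make the idea concrete, here is a minimal, self-contained sketch of how lexeme parsers like these could be built in plain Perl. It is not the module's source. The helper `lexeme_re` is hypothetical. The sketch assumes that a parser is a code ref that takes the input string and returns `($status, $rest, $matches)`, and that a failed parser leaves the input untouched.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's code: build a lexeme parser that
# matches $re at the start of the input, captures it, then skips any
# trailing whitespace. Every parser returns ($status, $rest, $matches).
sub lexeme_re {
    my ($re) = @_;
    return sub {
        my ($str) = @_;
        if ( $str =~ /\A($re)\s*/ ) {
            my $match = $1;
            my $rest  = substr( $str, $+[0] );    # input after match + whitespace
            return ( 1, $rest, [$match] );
        }
        return ( 0, $str, undef );                # failure: nothing consumed
    };
}

# Sketches of some of the lexeme parsers listed above.
sub word   { lexeme_re(qr/\w+/) }
sub number { lexeme_re(qr/\d+/) }
sub comma  { lexeme_re(qr/,/) }
sub symbol { my ($sym) = @_; lexeme_re(qr/\Q$sym\E/) }    # literal text, not a regex

my ( $status, $rest, $matches ) = word()->('integer  (kind=8)');
# $status == 1, $rest eq '(kind=8)', $matches is ['integer']
```

Because the whitespace is stripped after each token, the next parser in a chain can always assume that its input starts at a meaningful character.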
Combinators:
sequence( [ $parser1, $parser2, ... ], $optional_sub_ref ) : applies the parsers in order; the optional sub ref can transform the combined result
choice( $parser1, $parser2, ...) : tries the specified parsers in order
try : normally, a parser consumes the input it matches; try() stops a parser from consuming the string
maybe : like try(), but always reports success
parens( $parser ) : parses '(', then applies $parser, then parses ')'
many( $parser) : applies $parser zero or more times
sepBy( $separator, $parser) : parses a list of $parser separated by $separator
oneOf( [$patt1, $patt2, ...] ) : like symbol(), but tries each of the patterns in turn
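The combinators above can be sketched in the same style. Everything below is a hypothetical, simplified re-implementation for illustration, and the helper `tok` is invented. It is not the module's code. Because these sketched parsers never consume input when they fail, `choice` here always backtracks and no `try` is needed. In this sketch `sepBy` drops the separators from its result, and `parens` keeps only the inner match.

```perl
use strict;
use warnings;

# Hypothetical lexeme helper: match $re at the start, then skip whitespace.
sub tok {
    my ($re) = @_;
    return sub {
        my ($str) = @_;
        return $str =~ /\A($re)\s*/
          ? ( 1, substr( $str, $+[0] ), [$1] )
          : ( 0, $str, undef );
    };
}

# sequence: run the parsers in order; fail without consuming if any fails.
sub sequence {
    my ($parsers) = @_;
    return sub {
        my ($str) = @_;
        my ( $rest, @matches ) = ($str);
        for my $p (@$parsers) {
            my ( $st, $r, $m ) = $p->($rest);
            return ( 0, $str, undef ) unless $st;
            push @matches, $m;
            $rest = $r;
        }
        return ( 1, $rest, \@matches );
    };
}

# choice: return the result of the first parser that succeeds.
sub choice {
    my @parsers = @_;
    return sub {
        my ($str) = @_;
        for my $p (@parsers) {
            my @res = $p->($str);
            return @res if $res[0];
        }
        return ( 0, $str, undef );
    };
}

# many: apply $p zero or more times; always succeeds.
sub many {
    my ($p) = @_;
    return sub {
        my ($str) = @_;
        my @matches;
        while (1) {
            my ( $st, $r, $m ) = $p->($str);
            last unless $st && length($r) < length($str);    # stop on failure or no progress
            push @matches, $m;
            $str = $r;
        }
        return ( 1, $str, \@matches );
    };
}

# sepBy: zero or more $p separated by $sep; separators are dropped here.
sub sepBy {
    my ( $sep, $p ) = @_;
    my $tail = many( sequence( [ $sep, $p ] ) );
    return sub {
        my ($str) = @_;
        my ( $st, $rest, $first ) = $p->($str);
        return ( 1, $str, [] ) unless $st;
        my ( undef, $rest2, $more ) = $tail->($rest);
        return ( 1, $rest2, [ $first, map { $_->[1] } @$more ] );
    };
}

# parens: '(' then $p then ')'; only the inner match is kept.
sub parens {
    my ($p) = @_;
    my $seq = sequence( [ tok(qr/\(/), $p, tok(qr/\)/) ] );
    return sub {
        my ($str) = @_;
        my ( $st, $rest, $m ) = $seq->($str);
        return $st ? ( 1, $rest, $m->[1] ) : ( 0, $str, undef );
    };
}

my $list = parens( sepBy( tok(qr/,/), tok(qr/\d+/) ) );
my ( $status, $rest, $matches ) = $list->('(1, 2,3) rest');
# $status == 1, $rest eq 'rest', $matches is [ ['1'], ['2'], ['3'] ]
```

The progress check in `many` is a defensive choice. Without it, a parser that succeeds on empty input would loop forever.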
Dangerous: the following parsers take a regular expression:
upto( $patt )
greedyUpto( $patt )
regex( $patt )
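The documentation does not spell out what `upto` and `greedyUpto` return. Judging only from the names, one plausible reading is a non-greedy versus a greedy match of the text before `$patt`. The sketch below implements that reading with the hypothetical functions `upto_sketch` and `greedyUpto_sketch`. Treat it as an illustration of the greedy/non-greedy distinction, not as the module's behaviour.

```perl
use strict;
use warnings;

# Hypothetical sketches, not the module's code. Under the reading assumed
# here, both return the text before $patt as the match and leave $patt itself
# at the start of the remaining input.
sub upto_sketch {
    my ($patt) = @_;
    return sub {
        my ($str) = @_;
        return $str =~ /\A(.*?)(?=$patt)/s    # shortest prefix before $patt
          ? ( 1, substr( $str, length $1 ), [$1] )
          : ( 0, $str, undef );
    };
}

sub greedyUpto_sketch {
    my ($patt) = @_;
    return sub {
        my ($str) = @_;
        return $str =~ /\A(.*)(?=$patt)/s     # longest prefix before $patt
          ? ( 1, substr( $str, length $1 ), [$1] )
          : ( 0, $str, undef );
    };
}

my ( undef, $r1, $m1 ) = upto_sketch(qr/,/)->('a, b, c');
my ( undef, $r2, $m2 ) = greedyUpto_sketch(qr/,/)->('a, b, c');
# $m1->[0] eq 'a',    $r1 eq ', b, c'
# $m2->[0] eq 'a, b', $r2 eq ', c'
```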
As there is no Haskell-style syntactic sugar in Perl, I use the sequence() combinator where in Haskell you would use the do-notation. sequence() takes a ref to a list of parsers and optionally a code ref to a sub that can manipulate the result before returning it.
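For example, the Haskell do-block `do { k <- word; symbol "="; v <- number; return (k, v) }` could be written as a sequence plus a code ref. The sketch below is self-contained and hypothetical. `tok` is an invented helper, and the sketch assumes that the code ref receives the list of sub-matches and that its return value replaces the match. The module may pass the results differently.

```perl
use strict;
use warnings;

# Hypothetical lexeme helper: match $re at the start, then skip whitespace.
sub tok {
    my ($re) = @_;
    return sub {
        my ($str) = @_;
        return $str =~ /\A($re)\s*/
          ? ( 1, substr( $str, $+[0] ), [$1] )
          : ( 0, $str, undef );
    };
}

# Sketch of sequence() with an optional code ref $post. Assumption: $post
# receives the sub-matches as a list and its return value becomes the match.
sub sequence {
    my ( $parsers, $post ) = @_;
    return sub {
        my ($str) = @_;
        my ( $rest, @matches ) = ($str);
        for my $p (@$parsers) {
            my ( $st, $r, $m ) = $p->($rest);
            return ( 0, $str, undef ) unless $st;    # fail without consuming
            push @matches, $m;
            $rest = $r;
        }
        return ( 1, $rest, $post ? $post->(@matches) : \@matches );
    };
}

# Perl counterpart of: do { k <- word; symbol "="; v <- number; return (k, v) }
my $assign = sequence(
    [ tok(qr/\w+/), tok(qr/=/), tok(qr/\d+/) ],
    sub { my ( $k, undef, $v ) = @_; return [ $k->[0], $v->[0] ] }
);

my ( $status, $rest, $pair ) = $assign->('kind = 8)');
# $status == 1, $rest eq ')', $pair is ['kind', '8']
```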
Also, you can label any parser in a sequence using an anonymous hash, for example:
sub type_parser {
sequence [
{Type => word},
maybe parens choice(
{Kind => number},
sequence [
symbol('kind'),
symbol('='),
{Kind => number}
]
)
]
}
Calling type_parser() returns the parser; applying that parser to a string returns a list of three values:
my $str = 'integer(kind=8), ';
(my $status, my $rest, my $matches) = type_parser()->($str);
Here, `$status` is 0 if the match failed and 1 if it succeeded, and `$rest` contains the unparsed remainder of the string. The actual matches are stored in the array ref `$matches`. As every parser returns its results as an array ref, `$matches` contains the concrete parsed syntax, i.e. a nested array of arrays of strings, with a hash wherever a parser was labelled:
Dumper($matches) ==> [{'Type' => ['integer']},[['kind'],['\\='],{'Kind' => ['8']}]]
You can extract only the labeled matches using `getParseTree`:
my $parse_tree = getParseTree($matches);
Dumper($parse_tree) ==> [{'Type' => 'integer'},{'Kind' => '8'}]
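One way to picture what `getParseTree` does is a recursive walk that keeps the labelled hashes and drops the unlabelled strings. The function `labelled_only` below is a hypothetical re-implementation that reproduces the example above. The real function may treat nested labels or multi-element matches differently.

```perl
use strict;
use warnings;

# Hypothetical re-implementation for illustration, not the module's code:
# walk the nested matches and collect every labelled hash, replacing a
# single-string array like ['integer'] by the bare string. Other labelled
# values are kept as they are.
sub labelled_only {
    my ($m) = @_;
    my @found;
    if ( ref $m eq 'HASH' ) {
        for my $label ( sort keys %$m ) {
            my $v = $m->{$label};
            $v = $v->[0] if ref $v eq 'ARRAY' && @$v == 1 && !ref $v->[0];
            push @found, { $label => $v };
        }
    }
    elsif ( ref $m eq 'ARRAY' ) {
        push @found, labelled_only($_) for @$m;
    }
    return @found;    # plain strings (unlabelled matches) contribute nothing
}

my $matches = [ { 'Type' => ['integer'] }, [ ['kind'], ['\\='], { 'Kind' => ['8'] } ] ];
my $parse_tree = [ labelled_only($matches) ];
# $parse_tree is [ { Type => 'integer' }, { Kind => '8' } ]
```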
PS: I have also implemented bind() and enter() (enter() stands in for Haskell's return, which is a Perl keyword) for those who like monads ^_^
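In monadic terms, `enter($x)` plays the role of Haskell's `return`: it is a parser that consumes nothing and succeeds with `$x`. `bind($p, $f)` plays the role of `>>=`: it runs `$p` and feeds its matches to `$f`, which chooses the next parser. The sketch below uses the hypothetical names `bind_sketch` and `enter_sketch`, because a plain `bind` would clash with Perl's built-in socket function. It also assumes this argument order. The module's own signatures may differ.

```perl
use strict;
use warnings;

# Hypothetical sketches of the parser monad operations, not the module's code.

# enter_sketch($x): consume nothing and always succeed with $x (Haskell's return).
sub enter_sketch {
    my ($x) = @_;
    return sub { my ($str) = @_; return ( 1, $str, $x ) };
}

# bind_sketch($p, $f): run $p; on success pass its matches to $f, which must
# return the next parser, and run that on the remaining input (Haskell's >>=).
sub bind_sketch {
    my ( $p, $f ) = @_;
    return sub {
        my ($str) = @_;
        my ( $st, $rest, $m ) = $p->($str);
        return ( 0, $str, undef ) unless $st;
        my ( $st2, $rest2, $m2 ) = $f->($m)->($rest);
        return $st2 ? ( 1, $rest2, $m2 ) : ( 0, $str, undef );
    };
}

# A minimal single-digit parser to try it out (no whitespace handling).
my $digit = sub {
    my ($str) = @_;
    return $str =~ /\A(\d)/ ? ( 1, substr( $str, 1 ), [$1] ) : ( 0, $str, undef );
};

# do { x <- digit; y <- digit; return (x + y) }
my $sum = bind_sketch( $digit, sub {
    my ($x) = @_;
    bind_sketch( $digit, sub { my ($y) = @_; enter_sketch( $x->[0] + $y->[0] ) } );
} );

my ( $status, $rest, $total ) = $sum->('34x');
# $status == 1, $rest eq 'x', $total == 7
```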
AUTHOR
Wim Vanderbauwhede <Wim.Vanderbauwhede@gmail.com>
COPYRIGHT
Copyright 2013- Wim Vanderbauwhede
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.