—# $Id: README.pod 17066 2007-02-19 19:39:53Z particle $
=head1 Parrot Grammar Engine (PGE)
This is a regular expression/rules/grammar engine/parser designed to
run in Parrot. It's still a work in progress, but has a lot of
nice features in it, including support for perl 6 regexes, globs,
shift-reduce parsing, and some support for perl 5 regular expressions.
It also includes the "pgc.pir" grammar compiler, which can convert
an entire grammar specification into the appropriate PIR code
for execution.
A nice feature of PGE is that one can easily combine many
different parsing styles into a single interface. PGE uses
perl 6 rules for its top-down parsing, an operator precedence
parser for bottom-up (shift/reduce) parsing, and allows control
to pass freely between the two styles as well as to custom parsing
subroutines.
=head1 Installation
PGE assumes that it is part of the parrot distribution in the
F<compilers/pge> directory. Simply type C<make> in this directory
to build the various *.pbc files and install them into
runtime/parrot/library.
The distribution comes with a small F<demo.pir> program that gives an
example of using PGE. To run the demo, simply do
C<parrot demo.pir>. The demo understands the following commands:
rule pattern - compile a Perl 6 rule from "pattern"
save name - save the current rule as "name"
text - a text string to match against previously entered rule
pir - display the PIR code generated for current rule
exp - display the expression tree for the current rule
trace - toggle pattern execution tracing
next - repeat last match on target string
=head1 PGE's rule engine (PGE::P6Regex)
Once PGE is compiled and installed, you generally load it using
the load_bytecode operation, as in
load_bytecode 'PGE.pbc'
This imports the PGE::P6Regex compiler, which can be used to compile
strings of Perl 6 rules. A sample compile sequence would be:
.local pmc p6regex_compile
p6regex_compile = compreg 'PGE::P6Regex' # get the compiler
.local string pattern
.local pmc rulesub
pattern = '^(From|Subject)\:' # pattern to compile
rulesub = p6regex_compile(pattern) # compile it to rulesub
Then, to match a target string we simply call the subroutine
to get back a C<PGE::Match> object:
.local pmc match
$S0 = 'From: pmichaud@pobox.com' # target string
match = rulesub($S0) # execute rule
The Match object is true if it successfully matched, and contains
the strings and subpatterns that were matched as part of the capture.
Parrot's "Data::Dumper" can be used to quickly view the results
of the match:
load_bytecode 'dumper.pir'
load_bytecode 'PGE/Dumper.pir'
match_loop:
unless match goto match_fail # if match fails stop
print "match succeeded\n"
_dumper(match)
match.'next'() # find the next match
goto match_loop
match_fail:
print "match failed\n"
One can also get the intermediate PIR code that PGE generates for
the rule subroutine -- just use
$S0 = p6regex_compile(pattern, 'target'=>'PIR')
and you can print/inspect the contents of $S0 to see the generated code.
See the STATUS file for a list of implemented and yet-to-be-implemented
features.
=head1 Known limitations of the rule engine
PGE doesn't (yet) properly handle nested repetitions of zero-length
patterns in groups -- that's coming soon.
Many well-known optimizations (e.g., Boyer-Moore) aren't
implemented yet, although a variety of optimizations are being
added as we generate code.
Lastly, error handling needs to be improved, but this will likely
be decided as we discover how PGE integrates with the rest of
Parrot.
=head1 Implementation notes
Basically, PGE is a compiler just like any other, except that its
"language" is the Perl 6 regex syntax and its output is a subroutine
that can match strings. So, PGE consists of a series of parsers
(for each pattern matching language), an intermediate expression
format, and a code generator.
The parsers can be written using PIR subroutines or PGE's built-in
operator precedence (shift/reduce) parser; the parser for Perl 6
regexes is built with the operator precedence parser. This parser
produces a parse tree (in the form of a Match object) for a given
Perl 6 regex. The parse tree then goes through semantic analysis
and reduction phases before being sent to code generation to
produce a PIR subroutine.
The generated PIR code uses bsr/ret for its internal backtracking
(optimized for tailcalls) and uses Parrot calling conventions for
all interfaces with external callers/callees such as subrules.
PGE also uses Parrot coroutines for the matching
engine, so that after a successful match is found, the
next match within the same string can be found by simply
returning control to the matching coroutine, which then
picks up from where it had previously left off until
another match is discovered.
The code still needs a fair amount of commenting. In general,
if you have a question about a particular section of code,
send Pm an email and he'll write the comments for it.
=head1 AUTHOR
Patrick Michaud (pmichaud@pobox.com) is the author and maintainer.
Patches and suggestions should be sent to the Perl 6 compiler list
(perl6-compiler@perl.org).
=cut