RAKUDO COMPILER OVERVIEW
How the Rakudo Perl 6 compiler works
This document describes the architecture and operation of the Rakudo Perl 6 (or simply Rakudo) compiler. The README describes how to build and run Rakudo.
Rakudo has six main parts, summarized below. Source code paths are relative to Rakudo's src/ directory, and platform-specific filename extensions such as .exe are sometimes omitted for brevity.
Not Quite Perl builds Perl 6 source code parts into Rakudo
A main program drives parsing, code generation and runtime execution (Perl6/Compiler.nqp)
A grammar parses user programs (Perl6/Grammar.pm)
Action methods build a Parrot Abstract Syntax Tree (Perl6/Actions.pm)
Parrot extensions provide Perl 6 run time behavior (ops/perl6.ops, pmc/*.pmc, binder/*)
Libraries provide functions at run time (builtins/*.pir, cheats/*, core/*.pm, glue/*.pir, metamodel/*)
The Makefile (generated from ../tools/build/Makefile.in by ../Configure.pl) compiles all the parts to form the perl6.pbc executable and the perl6 or perl6.exe "fake executable". We call it fake because it has only a small stub of code to start the Parrot virtual machine, and passes itself as a chunk of bytecode for Parrot to execute. The source code of the "fakecutable" is generated as perl6.c with the stub at the very end. The entire contents of perl6.pbc are represented as escaped octal characters in one huge string called program_code. What a hack!
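The embedding trick can be sketched in a few lines of Python. This is only an illustration of turning bytes into a C string literal of octal escapes; it is not the generator Rakudo or Parrot actually uses. Only the name program_code comes from the text above; the function names, the bytecode_size variable and the stub comment are assumptions made for the example.

```python
def to_c_octal_literal(data: bytes) -> str:
    """Render bytes as a C string literal made of three-digit octal escapes."""
    return '"' + "".join(f"\\{b:03o}" for b in data) + '"'


def fake_executable_source(bytecode: bytes) -> str:
    """Build C source text that carries the bytecode as one big string."""
    literal = to_c_octal_literal(bytecode)
    return (
        f"const unsigned char program_code[] = {literal};\n"
        # bytecode_size is a hypothetical name, not taken from perl6.c
        f"const unsigned long bytecode_size = {len(bytecode)};\n"
        "/* hypothetical stub: start the VM and hand it program_code */\n"
    )
```

For instance, to_c_octal_literal(b"\x00A\xff") yields the literal "\000\101\377", so every byte of the bytecode file survives inside ordinary C source.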
1. NQP
The source files of Rakudo are preferably and increasingly written in Perl 6, the remainder in Parrot Intermediate Representation (PIR) or C. Not Quite Perl (nqp) provides the bootstrap step of compiling compiler code (yes!) written in a subset of Perl 6, into PIR.
The latest version of NQP includes the 6model library, which is the building block for all Perl 6 objects. It also comes with a regex engine that Rakudo uses.
NQP is a bootstrapped compiler: it is mostly written in NQP. The source code of NQP is in a separate repository at http://github.com/perl6/nqp/. Note that NQP only builds the Rakudo compiler, and does not compile or run user programs.
Stages
NQP compiles us a compiler in ../perl6.pbc and then ../perl6 or ../perl6. NQP also compiles the metamodel found in Perl6/Metamodel/. This is a library that controls how classes, methods, roles and so on work.
The bare-bones compiler then loads the compiled metamodel, and compiles the core files found in core/*.pm. Those core files provide the runtime library (like the Array and Complex classes). But note that many of these classes are also used when the final compiler processes your Perl 6 scripts.
2. Compiler main program
A subroutine called 'MAIN', in main.nqp, starts the source parsing and bytecode generation work. It creates a Perl6::Compiler object for the 'perl6' source type.
Before tracing Rakudo's execution further, a few words about Parrot process and library initialization.
Parrot execution does not simply begin with 'main'. When Parrot executes a bytecode file, it first calls all subroutines in it that are marked with the :init modifier. Rakudo has over 50 such subroutines, brought in by .include directives in Perl6/Compiler.pir, to create classes and objects in Parrot's memory.
Similarly, when the executable loads libraries, Parrot automatically calls subs having the :load modifier. The Rakudo :init subs are usually also :load, so that the same startup sequence occurs whether Rakudo is run as an executable or loaded as a library.
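The startup ordering described above can be modelled with a small registry. The following Python sketch is an analogy only: Parrot implements :init and :load inside the virtual machine, and every name here (sub, run_as_executable, load_as_library, the example subs) is hypothetical.

```python
INIT, LOAD = "init", "load"

_subs = []   # registration order is preserved: (flags, function)
log = []     # records which subs ran, for demonstration


def sub(*flags):
    """Register a function together with its modifier flags."""
    def register(fn):
        _subs.append((frozenset(flags), fn))
        return fn
    return register


@sub(INIT, LOAD)
def create_classes():
    log.append("create_classes")


@sub(INIT)
def init_only():
    log.append("init_only")


@sub()
def main():
    log.append("main")


def run_as_executable():
    """Call every :init sub first, then the main sub."""
    for flags, fn in _subs:
        if INIT in flags:
            fn()
    main()


def load_as_library():
    """Call only the :load subs; main is never invoked."""
    for flags, fn in _subs:
        if LOAD in flags:
            fn()
```

Marking a sub with both flags, as create_classes is here, gives the same setup on either path, which is the reason the text gives for Rakudo's :init subs usually also being :load.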
So, the Rakudo 'main' subroutine has created a Perl6::Compiler object. Next, 'main' invokes the 'command_line' method on this object, passing the command line arguments in a PMC called args_str. The 'command_line' method is inherited from the HLLCompiler parent class, which is part of the Parrot Compiler Toolkit (PCT).
And that's it, apart from a '!fire_phasers'('END') and an exit. Well, as far as 'main' is concerned. The remaining work is divided between PCT, grammar and actions.
3. Grammar
The make target PERL6_G uses parrot-nqp to compile Perl6/Grammar.pm to gen/perl6-grammar.pir.
The compiler works by calling the TOP method in Perl6/Grammar.pm. After some initialization, TOP matches the user program to the comp_unit (meaning compilation unit) token. That triggers a series of matches to other tokens and rules (two kinds of regex) depending on the source in the user program.
For example, here's the parse rule for Rakudo's unless statement (in Perl6/Grammar.pm):
    token statement_control:sym<unless> {
        <sym> :s
        <xblock>
        [ <!before 'else'> ||
            <.panic: 'unless does not take "else", please rewrite using "if"'>
        ]
    }
This token says that an unless statement consists of the word "unless" (captured into $<sym>), and then an expression followed by a block. The .panic: is a typical "Awesome" error message and the syntax is almost exactly the same as in STD.pm, described below.
Remember that for a match, not only must the <sym> match the word unless, the <xblock> must also match the xblock token. If you read more of Perl6/Grammar.pm, you will learn that xblock in turn tries to match an <EXPR> and a <pblock>, which in turn tries to match .....
That is why this parsing algorithm is called Recursive Descent.
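To make the recursion concrete, here is a minimal recursive descent parser in Python for a toy version of the unless statement. It is a sketch under strong assumptions: the input is already split into tokens, an expression is a single token, and a block is a flat token list between braces. The method names mirror the grammar above, but nothing here is Rakudo code.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise ParseError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1
        return tok

    def statement_control_unless(self):
        sym = self.expect("unless")
        xblock = self.xblock()              # descend into xblock
        if self.peek() == "else":           # plays the role of <!before 'else'>
            raise ParseError(
                'unless does not take "else", please rewrite using "if"')
        return {"sym": sym, "xblock": xblock}

    def xblock(self):
        # xblock descends in turn into EXPR and pblock
        return {"EXPR": self.expr(), "pblock": self.pblock()}

    def expr(self):
        tok = self.peek()
        if tok is None or tok in ("{", "}", "unless", "else"):
            raise ParseError(f"expected an expression, found {tok!r}")
        self.pos += 1
        return tok

    def pblock(self):
        self.expect("{")
        body = []
        while self.peek() not in ("}", None):
            body.append(self.peek())
            self.pos += 1
        self.expect("}")
        return body
```

Calling Parser(["unless", "$done", "{", "say", "}"]).statement_control_unless() returns a nested dictionary whose shape follows the call chain, one function per token or rule.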
The top-level portion of the grammar is written using Perl 6 rules (Synopsis 5) and is based on the STD.pm grammar in the perl6/std repository (https://github.com/perl6/std/). There are a few places where Rakudo's grammar deviates from STD.pm, but the ultimate goal is for the two to converge. Rakudo's grammar inherits from PCT's HLL::Grammar, which provides the <.panic> rule to throw exceptions for syntax errors.
4. Actions
The Perl6/Actions.pm file defines the code that the compiler generates when it matches each token or rule. The output is a tree hierarchy of objects representing language syntax elements, such as a statement. The tree is called a Parrot Abstract Syntax Tree (PAST).
The Perl6::Actions class inherits from HLL::Actions, another part of the Parrot Compiler Toolkit. Look in ../parrot/ext/nqp-rx/stage0/src/HLL-s0.pir for several instances of .namespace ["HLL";"Actions"].
When the PCT calls the 'parse' method on a grammar, it passes not only the program source code, but also a pointer to a parseactions class such as our compiled Perl6::Actions. Then, each time the parser matches a named regex in the grammar, it automatically invokes the same named method in the actions class.
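The name-based coupling between grammar and actions can be sketched as follows. This Python fragment only illustrates the dispatch idea (look up a method with the same name as the regex that just matched); the Match and RecordingActions classes and the matched function are hypothetical and do not reflect how PCT is implemented.

```python
class Match:
    """A bare-bones match object: which regex matched, and what text."""
    def __init__(self, name, text):
        self.name = name
        self.text = text


class RecordingActions:
    """An actions class with methods for only some of the regex names."""
    def __init__(self):
        self.calls = []

    def EXPR(self, match):
        self.calls.append(("EXPR", match.text))

    def xblock(self, match):
        self.calls.append(("xblock", match.text))


def matched(actions, name, text):
    """Called by the parser whenever the regex called `name` succeeds."""
    match = Match(name, text)
    method = getattr(actions, name, None)   # same-named action, if any
    if callable(method):
        method(match)
    return match
```

A regex without a corresponding action method, such as a whitespace rule, simply matches without triggering any callback.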
Back to the unless example, here's the action method for the unless statement (from Perl6/Actions.pm):
    method statement_control:sym<unless>($/) {
        my $past := xblock_immediate( $<xblock>.ast );
        $past.pasttype('unless');
        make $past;
    }
When the parser invokes this action method, the current match object containing the parsed statement is passed into the method as $/. In Perl 6, this means that the expression $<xblock> refers to whatever the parser matched to the xblock token. Similarly there are $<EXPR> and $<pblock> objects, and so on, until the end of the recursive descent. By the way, $<xblock> is Perl 6 syntactic sugar for $/{'xblock'}.
The magic occurs in the $<xblock>.ast and make expressions in the method body. The .ast method retrieves the PAST made already for the xblock subtree. Thus $past becomes a node object describing code to conditionally execute the block in the subtree.
The make statement at the end of the method sets the newly created xblock_immediate node as the PAST representation of the unless statement that was just parsed.
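The bottom-up flow of .ast and make can be mimicked in Python. The sketch below uses plain dictionaries as stand-ins for PAST nodes. The behaviour given to xblock_action and xblock_immediate (building an 'if' style node and marking the block as immediate) is an assumption made for illustration, not a transcription of Perl6/Actions.pm.

```python
class Match(dict):
    """Named captures, plus an .ast slot that make() fills in."""
    def __init__(self, **captures):
        super().__init__(captures)
        self.ast = None


def make(match, node):
    """Attach a tree node to the match, like 'make' in an action method."""
    match.ast = node


def xblock_action(match):
    # Assumed shape: a conditional node holding [condition, block].
    make(match, {"node": "Op", "pasttype": "if",
                 "children": [match["EXPR"].ast, match["pblock"].ast]})


def xblock_immediate(xblock_ast):
    # Stand-in helper: mark the block so it runs in place.
    xblock_ast["children"][1]["blocktype"] = "immediate"
    return xblock_ast


def statement_control_unless(match):
    past = xblock_immediate(match["xblock"].ast)  # reuse the subtree's node
    past["pasttype"] = "unless"                   # flip the conditional
    make(match, past)


# Inner matches get their nodes first, as in a real parse.
expr = Match()
make(expr, {"node": "Var", "name": "$done"})
block = Match()
make(block, {"node": "Block", "blocktype": "declaration", "children": []})
xblock = Match(EXPR=expr, pblock=block)
xblock_action(xblock)
stmt = Match(xblock=xblock)
statement_control_unless(stmt)
```

After the last call, stmt.ast is the same node that the xblock action made, now with its pasttype changed to 'unless'. Each action only has to combine nodes that its sub-matches have already produced.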
The Parrot Compiler Toolkit provides a wide variety of PAST node types for representing the various components of a HLL program -- for more details about the available node types, see PDD 26 ( http://docs.parrot.org/parrot/latest/html/docs/pdds/pdd26_ast.pod.html ). The PAST representation is the final stage of processing in Rakudo itself, and is given to Parrot directly. Parrot does the remainder of the work translating from PAST to PIR and then to bytecode.
5. Parrot extensions
Rakudo extends the Parrot virtual machine dynamically (i.e. at run time), adding 14 dynamic opcodes ("dynops") which are additional virtual machine code instructions, and 9 dynamic PMCs ("dynpmcs") (PolyMorphic Container, remember?) which are Parrot's equivalent of class definitions.
The dynops source is in ops/perl6.ops, which looks like C, apart from some Perlish syntactic sugar. The ../parrot_install/bin/ops2c tool desugars that to build/perl6.c, which your C compiler turns into a library.
For this overview, the opcode names and parameters might give a vague idea what they're about:
rakudo_dynop_setup()
rebless_subclass(in PMC, in PMC)
find_lex_skip_current(out PMC, in STR)
x_is_uprop(out INT, in STR, in STR, in INT)
get_next_candidate_info(out PMC, out PMC, out PMC)
transform_to_p6opaque(inout PMC)
deobjectref(out PMC, in PMC)
descalarref(out PMC, in PMC)
allocate_signature(out PMC, in INT)
get_signature_size(out INT, in PMC)
set_signature_elem(in PMC, in INT, in STR, in INT, inout PMC,
    inout PMC, inout PMC, inout PMC, inout PMC, inout PMC, in STR)
get_signature_elem(in PMC, in INT, out STR, out INT, out PMC, out PMC,
    out PMC, out PMC, out PMC, out PMC, out STR)
bind_signature(in PMC)
x_setprophash(in PMC, in PMC)
The dynamic PMCs are in pmc/*.pmc, one file per class. The language is again almost C, but with other sugary differences this time, for example definitions like group perl6_group whose purpose will appear shortly. The ../parrot_install/lib/x.y.z-devel/tools/build/pmc2c.pl script converts the sugar to something your C compiler understands.
For a rough idea what these classes are for, here are the names: P6Invocation, P6LowLevelSig, MutableVAR, Perl6Scalar, ObjectRef, P6role, Perl6MultiSub, Perl6Str and P6Opaque.
Binder
The dynops and the dynpmcs call a utility routine called a signature binder, via a function pointer called bind_signature_func. A binder matches parameters passed by callers of subs, methods and other code blocks, to the lexical names used internally. Parrot has a flexible set of calling conventions, but the Perl 6 permutations of arity, multiple dispatch, positional and named parameters, with constraints, defaults, flattening and slurping need a higher level of operation. The answer lies in binder/bind.c, which is compiled into the perl6_ops and perl6_group libraries. Read http://use.perl.org/~JonathanWorthington/journal/39772 for a more detailed explanation of the binder.
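As a rough picture of what a binder does, the Python sketch below maps caller arguments onto parameter names, handling positional and named parameters, defaults and slurpy parameters. It is a heavily simplified illustration, not a port of binder/bind.c: type constraints, multiple dispatch and flattening are left out, and the Param class and its fields are hypothetical. Only the name bind_signature_func comes from the text.

```python
from dataclasses import dataclass
from typing import Any

_MISSING = object()   # sentinel meaning "no default given"


class BindError(Exception):
    pass


@dataclass
class Param:
    name: str
    named: bool = False
    slurpy: bool = False
    default: Any = _MISSING


def bind_signature(params, pos_args, named_args):
    """Return a dict of lexical name -> value, or raise BindError."""
    lexpad = {}
    pos = list(pos_args)
    named = dict(named_args)
    for p in params:
        if p.slurpy and p.named:        # soak up leftover named args
            lexpad[p.name], named = named, {}
        elif p.slurpy:                  # soak up leftover positionals
            lexpad[p.name], pos = pos, []
        elif p.named and p.name in named:
            lexpad[p.name] = named.pop(p.name)
        elif not p.named and pos:
            lexpad[p.name] = pos.pop(0)
        elif p.default is not _MISSING:
            lexpad[p.name] = p.default
        else:
            raise BindError(f"missing argument for parameter {p.name!r}")
    if pos or named:
        raise BindError("too many arguments passed")
    return lexpad


# Callers go through an indirection, loosely like a C function pointer.
bind_signature_func = bind_signature
```

With params = [Param("x"), Param("y", default=10), Param("rest", slurpy=True), Param("verbose", named=True, default=False)], the call bind_signature_func(params, [1, 2, 3, 4], {"verbose": True}) binds x to 1, y to 2, rest to [3, 4] and verbose to True.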
Perl6/Compiler.pir has three .loadlib commands early on. The perl6_group library loads the 9 PMCs, perl6_ops loads the 14 dynops, and math_ops adds over 30 mathematical operators such as add, sub, mul, div, sin, cos, sqrt and log10 (source in parrot/src/ops/math.ops).
6. Builtin functions and runtime support
The last component of the compiler is the set of builtin functions and libraries that a Perl 6 program expects to have available when it is running. These include functions for the basic operations (infix:<+>, prefix:<abs>) as well as common global functions such as say and print.
The stage-1 compiler compiles these all and they become part of the final perl6.pbc. The source code is in builtins/*.pir, cheats/*, core/*.pm, glue/*.pir and metamodel/*.
Still to be documented
* Rakudo PMCs
* The relationship between Parrot classes and Rakudo classes
* Protoobject implementation and basic class hierarchy
AUTHORS
Patrick Michaud <pmichaud@pobox.com> is the primary author and maintainer of Rakudo. The other contributors are named in CREDITS.
COPYRIGHT
Copyright (C) 2007-2010, The Perl Foundation.