NAME

PAST Building Blocks - A catalogue of PAST nodes and how to use them.

DESCRIPTION

This document describes the PIR output for each type of Parrot Abstract Syntax Tree (PAST) node, and all available attributes.

PAST::Node

PAST::Node is the base class of all other PAST classes.

Attributes

:node

Sets this node's source and pos attributes. These indicate which part of the source code generated this node. The node attribute can be set using another node or a Match object.

The :node attribute is needed for annotating a source line. If a source line is represented by several nodes, i.e. a subtree of the whole AST, then only the root of that subtree needs to have the :node attribute.

Methods

See "pdd26_ast.pod" in docs.

PAST::Block

Synopsis

PAST::Block.new()

Without any attributes, a PAST::Block translates to the following:

.namespace []
.sub "_block10"
    .return ()
.end

Typical use

A block represents a lexical scope. The most common use for a block is subroutines (or functions or procedures or whatever name Your Language gives to invokable objects). Usually, nested blocks define a new scope as well, which can also be represented by a block. A typical example is the common if statement (the example here uses Lua syntax):

if <expr> then
  <block1>
else
  <block2>
end

A variable in block1 is (depending on the language) usually not visible in block2. If the statements in the body of the if statement are not in a different scope, use a PAST::Stmts node instead.

Attributes

:namespace

Sets the namespace of the block to the specified value.

:namespace('foo')

will result in a directive:

.namespace ["foo"]

{{XXX how to specify nested namespaces? what kind of string array?}}

:blocktype

Available options:

'declaration'

Default value. The block is created, but not invoked. This is typically used for creating functions.

'immediate'

The block is invoked immediately. When the :blocktype attribute is set to this value, the generated PIR becomes:

.namespace []
.sub "anon"
    get_global $P11, "_block10"
    newclosure $P11, $P11
    $P11()
.end

.namespace []
.sub "_block10"
    .return ()
.end

The sub anon is the main function, which is the entry point of the whole program. It looks for the sub that we declared, creates a new closure from that block (more on that later), and invokes that closure.

:name

Gives a name to the block. The generated PIR subroutine is named by the value of this attribute. Example:

PAST::Block.new( :name('foo') )

translates to:

.sub "foo"
    .return ()
.end

Symbol tables

As a block defines a new scope, a PAST::Block object has a symbol table, which can be used to store and look up symbols. See PDD26 for more details. A PAST::Block node has a method symbol used to both enter and find symbols.

The symbol method takes the symbol name as an argument, and optional attributes. If the symbol does not exist in the table, a new hash is created for that symbol. Any additional attribute arguments are set into this new hash. The symbol method returns the current entry for the specified name. Checking for a symbol in a symbol table is as easy as:

our $?BLOCK; # the current scope

if $?BLOCK.symbol('foo') {
  # symbol 'foo' was found
}
else {
  # symbol 'foo' was not found
}

{{
XXX Review this; this should refer to PAST compiler's symbol() method.

If a symbol is not found in a block's symbol table, any outer blocks (outer scopes) will be inspected. The symbol table entry of the innermost scope is returned. For instance, consider the following Perl 6 code:

sub foo {
  my $i = 42;
  sub bar {
    sub baz {
      say("baz: $i");
    }
  }
}

In the subroutine baz, the variable $i is referenced. It is not found in the symbol table of baz, so it is looked for in baz' outer scope, bar. It's not found there either, so bar's outer scope, foo is inspected, where it is found.

XXX
}}

PAST::Stmts

Synopsis

PAST::Stmts.new()

Typical use

The common use of a PAST::Stmts node is to group a set of statements. Most programming languages have a grammar rule that similar to this:

rule statements {
    [<statement> ';']*
    {*}
}

In order to move around the PAST nodes representing these individual statements, it is useful to group them and store them in one node. This can be done by storing all PAST nodes in a PAST::Stmts node. Note that PAST does not define a statement; any PAST node can be considered a statement, including a PAST node as simple as PAST::Val.

PAST::Var

Synopsis

PAST::Var.new( :name('foo'), :scope('lexical') )

results in:

find_lex $P10, "foo"

Typical use

Nodes of type PAST::Var are used to represent variables and their declarations (based on the :isdecl flag, see below). Whereever a variable is used in the source language, this can be represented by a PAST::Var node.

Attributes

:name (required)

Sets the name of the variable. This attribute is required.

:scope

Set the scope of this PAST::Var node. If the :scope is not set, it is inherited from the symbol table entry in a PAST::Block's symbol table. If the scope is not set, the scope defaults to lexical.

Available values:

'lexical'

When the :scope attribute is set to lexical, then the identifier specified in the :name attribute is handled as a lexical:

find_lex $P10, "foo"
'package'

Defines the variable as a package variable. This is handled by the get_global and set_global ops.

'parameter'

Defines the variable as a parameter (but is stored as a lexical).

.param pmc param_10
.lex "foo", param_10
'keyed'
my $idx := PAST::Var.new( :name('foo'), :scope('package') );
my $agg := PAST::Var.new( :name('bar'), :scope('package') );
make PAST::Var.new( $agg, $idx, :scope('keyed') );

results in:

get_global $P10, "foo"    # get index, a package-scoped variable "foo"
get_global $P11, "bar"    # get aggregate, a package-scoped variable "bar"
set $P12, $P11[$P10]      # get the contents at bar[foo]
'attribute'

Defines the variable as an attribute of an object. The first child of the PAST::Var node evaluates to the object from which the attribute is requested. If there is no such child, the object defaults to self, meaning the current invocant.

make PAST::Var.new( PAST::Var.new( :name('foo'), :scope('package') ),
                    :name('bar'),
                    :scope('attribute')
                  );

translates to:

get_global $P10, "foo"
get_attribute $P11, $P10, "bar"

Note: currently, no child nodes are evaluated, so you can only get attributes on self for now.

:isdecl(INT)

If this flag is set, the variable represented by this PAST::Var node is declared at this point. This flag is cleared by default.

.lex "foo", $P10

:viviself

When this attribute is set, the variable is initialized with the value represented by the PAST subtree given as this attribute's argument. Adding this attribute will result in the following generated code:

  <code for PAST::Var, leaving the variable in $P10>
  unless_null $P10, vivify_11
  <evaluate the child PAST node, leaving result in $P12>
  assign $P10, $P12
vivify_11:

:vivibase

Similar to :viviself, but the value of this attribute yields the initialization code for an aggregate object. See pdd26_ast.pod for more details.

PAST::Val

Synopsis

PAST::Val.new( :value(42) )

results in:

new $P10, "Integer"
assign $P10, 42

Typical use

All literal values in your source language can be represented by PAST::Val nodes.

Attributes

:value (required)

Specifies the literal value that is represented by this PAST::Val node.

:returns

Specifies the type of the value. If this is not specified, then some defaults are used. For integer values, this is Integer, for quoted strings this is String; for floating point values, this is Float.

PAST::VarList

Synopsis

PAST::VarList.new()

Typical use

Used to group a number of PAST::Var nodes.

PAST::Op

Synopsis

PAST::Op.new( :pasttype('if') )

Typical use

PAST::Op nodes are used to represent common operations, such as an if statement, a sub call. They can also be used to generate custom PIR instructions, using the :inline attribute.

Attributes

:pasttype

if

Requires at least 1 child node, which is evaluated as the conditional expression.

   <evaluate 1st child, result stored in $P11>
   if $P11, if_10
   <else part (3rd child); optional>
   goto if_10_end
if_10:
   <then part (2nd child); optional>
if_10_end:
unless

Same as if, except that the if op is replaced by the unless op.

while

Requires 2 children; the first is the loop condition, the second is the loop body.

while_10:
   <evaluate 1st child, result stored in $P11>
   unless $P11, while_10_end
   <evaluate 2nd child>
   goto while_10
while_10_end:
until

Same as while, except the loop is executed while the condition evaluates to false.

call (default)
PAST::Op.new( :name('foo'), :pasttype('call') );

results in:

"foo"()

while

my $fun := PAST::Var.new( :name('foo'), :scope('package'));
PAST::Op.new( $fun, :pasttype('call') );

generates:

get_global $P10, "foo"
$P10()

Children of a :pasttype('call') node are evaluated and passed as arguments. If the node does not receive a :name attribute, then the first child is evaluated as the subroutine to be invoked.

callmethod
my $invocant := PAST::Var.new( :name('foo'), :scope('package') );
PAST::Op.new( $invocant,
              :name('bar'),
              :pasttype('callmethod')
             );

generates:

get_global $P10, "foo"
$P10."bar"()
bind

Binds the variable represented by the first child to the value represented by the second child.

my $lhs := PAST::Var.new( :name('foo'), :scope('package') );
my $rhs := PAST::Val.new( :value(42) );
make PAST::Op.new($lhs, $rhs, :pasttype('bind') );

results in:

new $P10, "Integer"       # code for evaluating $rhs
assign $P10, 42
set_global "foo", $P10    # code for the :pasttype('bind') op

when the scope is set to lexical, the last line becomes:

store_lex "foo", $P10

:inline

If this attribute is specified, :pasttype is implicitly set to the value of inline. The specified string is emitted in the code generator. The string may contain special fields: %n where n is an integer value between (0,9); %r, %t and %u. See the PDD for details.

Example:

my $var := PAST::Var.new( :name('foo'), :scope('lexical') );
PAST::Op.new( $var, :inline('    %r = %0') );

is transformed to:

find_lex $P10, "foo" # generated for $var
$P11 = $P10          # inline '%r = %0'

TIPS AND TRICKS

Once you have experience in using PAST nodes, generating code for Your Favorite Language becomes rather straightforward. However, it is sometimes tricky to get started. Therefore, this section presents some tips 'n' tricks to get you started.

Refactor Grammar Rules

Scenario 1

Sometimes it is useful to refactor the grammar of your language in order to make code generation somewhat easier or to make the action method easier. Consider the following example.

rule primary_expr {
    [ <prefix> | <functioncall> ] <expression>
    {*}
}

method primary_expr($/) {
    my $past;
    if $<prefix> {
        $past := $( $<prefix> );
    }
    else {
        $past := $( $<functioncall> );
    }
    my $expr := $( $<expression> );

    # do something with $past and $expr
    # ...
}

while this solution is straightforward, the code in the action method contains a conditional statement. The more branches you have in your code, the more you need to think when you re-read this in 6 months time. An alternative solution would be this:

rule primary_expr {
    <prefix_expr> <expression>
    {*}
}

rule prefix_expr {
    | <prefix> {*}        #= prefix
    | <functioncall> {*}  #= functioncall
}

method primary_expr($/) {
    my $past := $( $<prefix_expr> );
    my $expr := $( $<expression> );

    # do something with $past and $expr
    # ...
}

method prefix_expr($/, $key) {
    make $( $/{$key} );
}

While you have to write a bit more code, this code is more straightforward. While there might be a small cost in function call overhead, there is no longer the conditional statement, which itself is more efficient.

Scenario 2

Consider a language that uses an index notation to indicate fields (attributes), like so:

foo["bar"] = 42

this language also has some syntactic sugar for this, using a dot notation:

foo.bar = 42

This can be expressed in the following grammar rules:

rule target {
    | <ident> <index>?
    | ...
}

rule index {
    | '.' <ident> {*}      #= ident
    | '[' <quote> ']' {*}  #= quote
}

A naive implementation could look like this:

method target($/) {
    my $name := $( $<ident> );

    if $<index> {
        my $idx := $( $<index>[0] );
        # do something with $idx
    }
    # ...
}

method index($/, $key) {
    my $indexexpr := $( $/{$key} );

    # if $indexexpr is an identifier, stringify it
    if $key eq 'ident' {
       $indexexpr := PAST::Val.new( :returns('String'), :value($indexexpr) );
    }
    make $indexexpr;
}

Somewhere you have to check the type of index. Not only does this result in more complex code (more conditional statements), it is less efficient, as an extra PAST node must be created.

A more elegant solution is to refactor the grammar slightly, like so:

rule index {
    | <dot_field> {*}     #= dot_field
    | '[' <quote> ']' {*} #= quote
}

rule dot_field {
    '.' <ident>
    {*}
}

method index($/, $key) {
    make $( $/{$key} );
}

method dot_field($/) {
    my $field := $( $<ident> );
    make PAST.Val.new( :returns('String'), :value($field) );
}

There is no more conditional code, it's just a matter of computations; based on the type of index, do the right thing automatically.

Create PAST nodes deep in the parse tree

Consider a grammar fragment such as this:

rule function_def {
    'function' <ident> '(' <parameters>? ')' <block> 'end'
    {*}
}

rule parameters {
    <ident> [',' <ident>]*
    {*}
}

You could write the action methods for these rules as follows:

method function_def($/) {
   my $past := PAST::Block.new( :node($/) );

   if $<parameters> {
       # get the PAST::VarList that contains all parameters
       my $params := $( $<parameters>[0] );

       # put all of them into the PAST::Block node
       for @($params) {
           $past.push($_);
       }
   }

   $past.name( $( $<ident> ) );
   $past.push( $( $<block> ) );
   make $past;
}

method parameters($/) {
   my $past := PAST::VarList.new( :node($/) );
   for $<ident> {
       my $param := $($_);
       $param.scope('parameter');
       $past.push( $param );
   }
   make $past;
}

While this solution works well, this is suboptimal. In the action method for <parameters>, a PAST::VarList node is created, which is only used to move around the parameter identifiers, and then discarded. An alternative solution would be to create the PAST::Block node that represents the function in the action method for <parameters>. This makes perfect sense, as the only place where the parameters should live is in that function block. Then, in the action method for function_def, this PAST::Block node is retrieved and decorated with other values (such as a function name, for instance) and the PAST node for the function body. Only if there are no parameters should the action method for function_def create a PAST::Block node.

The result could look like this:

method function_def($/) {
   my $past;
   if $<parameters> {
       $past := $( $<parameters>[0] );
   }
   else { # no parameters, create the function block here
       $past := PAST::Block.new( :node($/) );
   }
   $past.name( $( $<ident> ).name() );
   $past.push( $( $<block> ) );
}

method parameters($/) {
   my $past := PAST::Block.new( :node($/) );
   for $<ident> {
       my $param := $($_);
       $param.scope('parameter');
       $past.push( $param );
   }
   make $past;
}

A further refactor could result in even simpler code:

rule function_def {
    'function' <ident> '(' <parameters> ')' <block> 'end'
    {*}
}

rule parameters {
    [ <ident> [',' <ident>]* ]?
}

method function_def($/) {
   my $past := $( $<parameters> );
   $past.name( $( $<ident> ).name() );
   $past.push( $( $<block> ) );
   make $past;
}

method parameters($/) {
   my $past := PAST::Block.new( :node($/) );
   for $<ident> {
       my $param := $( $<ident> );
       $param.scope('parameter');
       $past.push($param);
   }
   make $past;
}

Note that the rule parameters is changed slightly. For indexing the ident nodes, this makes no difference, as they already lived in an array. Making all of them optional doesn't change the way they are stored in the parse tree.

The same principle can be applied in several scenarios. More tips will be added later.

Steal from Perl 6

You are implementing a language and you want to find out which PAST nodes you should generate for your language construct. Often Perl 6 has the same language construct. This means that you can write a quick example script in Perl 6 and look at the PAST generated by Rakudo (the Perl 6 implementation for Parrot):

./perl6 --target=past t.pl

SEE ALSO

  • docs/pdds/pdd26_ast.pod

BUGS AND IMPROVEMENTS

Bug reports and improvements on this document may be sent to parrotbug@parrotcode.org.