Parrot Assembler
The Parrot Assembler's job is to take .pasm (Parrot Assembly) files and assemble them into Parrot bytecode. Plenty of references for Parrot assembly syntax already exist, so we won't go into details there. The assembler does its job by reading a .pasm file, extracting numeric and string constants from it, and reassembling the bits into bytecode.
The first pass expands constants, macros, and local labels; the syntax is described later on, in the 'Macro' section. The next pass collects the numeric and string constants, along with the definition points and PCs of labels.
To view the text after the macro expansion pass, use the -E flag. This flag simply tells the assembler to quit after the Macro class does its thing.
The final pass replaces label occurrences with the appropriate PC offset and accumulates the (finally completely numeric) bytecode onto the output string. The XS portion takes the constants and bytecode, generates a header, tacks the constants and bytecode on, and finally prints out the string.
Macro
The Parrot assembler's macro layer is now more or less defined, with one or two additions still to come. The '.' prefix should make parsing easier, since everything in an assembler file that the macro engine needs to expand or process has a period ('.') prepended to it.
The macro layer implements constants, macros, and local labels, which covers most of the basic needs we have for macros. Including files was planned as a later addition; the Macro class section below documents the resulting '.include' directive.
A macro is defined with a '.macro' ... '.endm' block, and each parameter is referenced inside the body with a leading '.':
.macro swap (A,B,TEMP) # . marks the directive
set .TEMP,.A # . marks the special variable.
set .A,.B
set .B,.TEMP
.endm # And . marks the end of the macro.
Macros support labels that are local to a given macro expansion, and the syntax looks something like this:
.macro SpinForever (COUNT)
.local $LOOP: dec .COUNT # ".local $LOOP" defines a local label.
branch .$LOOP # Jump to said label.
.endm
Expand this macro as many times as you like, and the branch statement should do the right thing every time. To use a global label, just define and reference it as you usually do.
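The per-expansion uniqueness of local labels can be illustrated with a minimal Python sketch. The real Macro class is Perl and keeps its own gensym value. This sketch assumes a hypothetical MACROS table and a hypothetical LABEL_n renaming scheme. Each expansion substitutes arguments and gives every local label a fresh number, so repeated expansions never collide.

```python
import itertools
import re

# Hypothetical macro table: name -> (parameter names, body lines).
MACROS = {
    "SpinForever": (["COUNT"], [
        ".local $LOOP: dec .COUNT",
        "branch .$LOOP",
    ]),
}

# A fresh number per expansion, playing the role of the gensym value.
_expansion_ids = itertools.count(1)

def expand_macro(name, args):
    """Expand one macro call inline, making its local labels unique."""
    params, body = MACROS[name]
    n = next(_expansion_ids)
    bindings = dict(zip(params, args))
    out = []
    for line in body:
        # ".local $LOOP:" defines a local label; rename it to LOOP_<n>.
        line = re.sub(r"\.local \$(\w+):", lambda m: f"{m.group(1)}_{n}:", line)
        # ".$LOOP" refers to that local label.
        line = re.sub(r"\.\$(\w+)", lambda m: f"{m.group(1)}_{n}", line)
        # ".COUNT" becomes the matching argument; unknown names are left alone.
        line = re.sub(r"\.(\w+)", lambda m: bindings.get(m.group(1), m.group(0)), line)
        out.append(line)
    return out
```

Calling expand_macro("SpinForever", ["I0"]) and then expand_macro("SpinForever", ["I1"]) yields the labels LOOP_1 and LOOP_2. Each branch therefore targets its own copy of the loop.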
Constants are new, and the syntax looks like:
.constant PerlHash 6 # Again, . marks the directive
new P0, .PerlHash # . marks the special variable for expansion.
Several constants are predefined in the Macro class, but they are not yet generated dynamically, as they should be.
.constant Array 0
.constant PerlUndef 1
...
This table should be generated from include/parrot/pmc.h. The plan is to add a '.include' directive so we can '.include <constants.pmc>', and let pmc2c build the .pmc file at the same time as it builds pmc.h.
When the Assembler class is separated out, tests can use the Assembler class to accept a simple array of instructions and generate bytecode directly from that. This should eliminate the intermediary .pasm file and speed things up.
Keyed access
We now support the following (tested) code:
new P0, .PerlHash # (See the discussion of macros above)
set S0, "one"
set P0[S0],1
set I0,P0[S0]
print I0
print "\n"
end
Macro class
- new
Create a new Macro instance. Simply take the argument list and treat it as a list of files to concatenate and process. Files are taken in the order that they appear in the argument list.
- _expand_macro
Take a macro name and argument list, and expand the macro inline. If the macro has embedded labels, expand them to local labels and make certain that they're unique on a per-expansion basis. We do this with the $self->{macros}{$macro_name}{gensym} value.
- preprocess
Preprocesses constants, macros, include statements, and eventually conditional compilation.
.constant name {register}
.constant name {signed_integer}
.constant name {signed_float}
.constant name {"string constant"}
.constant name {'string constant'}
Lines of these forms are removed from the array once their values are recorded. Given the line:
.constant HelloWorld "Hello, World!"
one can expand HelloWorld via:
print .HelloWorld # The period marks a name to expand.
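Constant handling in the preprocessor can be sketched in Python. The sketch records each .constant line, drops it from the output, and rewrites later .NAME references. It assumes that registers are written as I, N, S or P followed by digits. It ignores the predefined PMC constants, and the function name is hypothetical. It also rewrites .NAME anywhere on a line, including inside string literals, which is a simplification.

```python
import re

# '.constant NAME VALUE', where VALUE is a register, a signed number, or a quoted string.
CONSTANT_RE = re.compile(
    r"^\s*\.constant\s+(\w+)\s+"
    r"([INSP]\d+|[-+]?\d+(?:\.\d+)?|\"[^\"]*\"|'[^']*')\s*$"
)

def preprocess_constants(lines):
    """Record .constant definitions, remove them, and expand .NAME uses."""
    constants = {}
    out = []
    for line in lines:
        m = CONSTANT_RE.match(line)
        if m:
            constants[m.group(1)] = m.group(2)
            continue  # the definition line itself is dropped
        # Replace '.NAME' with its value when NAME is a known constant.
        out.append(re.sub(r"\.(\w+)",
                          lambda m: constants.get(m.group(1), m.group(0)),
                          line))
    return out
```

For example, the input ['.constant HelloWorld "Hello, World!"', 'print .HelloWorld'] becomes ['print "Hello, World!"'].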
Some predefined constants exist for your convenience, namely:
.Array .PerlHash .PerlArray
and the other PMC types. (This should be generated from include/parrot/pmc.h, but isn't at the moment.)
The contents of external files can be included with the .include directive:
.include "{filename}"
The contents of the included file are inserted at the point where the .include directive occurs. This means that code like this:
print "Hello "
.include "foo.pasm"
end
where foo.pasm contains:
print "World \n"
becomes:
print "Hello "
print "World \n"
end
Attempting to include a non-existent file is a non-fatal error.
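The splicing behaviour can be sketched in Python. The sketch assumes that the directive appears alone on a line, and that the caller may supply a read function (the default reads from disk). The function names are hypothetical. A missing file produces only a warning on stderr, matching the non-fatal behaviour described above. Nested includes are not expanded.

```python
import re
import sys

INCLUDE_RE = re.compile(r'^\s*\.include\s+"([^"]+)"\s*$')

def read_lines(path):
    """Default reader: return the file's lines without trailing newlines."""
    with open(path) as fh:
        return fh.read().splitlines()

def expand_includes(lines, read=read_lines):
    """Replace each .include line with the included file's lines."""
    out = []
    for line in lines:
        m = INCLUDE_RE.match(line)
        if not m:
            out.append(line)
            continue
        try:
            out.extend(read(m.group(1)))
        except OSError as err:
            # Non-fatal: warn and keep assembling without the file.
            print(f"warning: cannot include {m.group(1)}: {err}", file=sys.stderr)
    return out
```

Passing a dictionary-backed reader makes the Hello/World example easy to check without touching the filesystem.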
.macro name ({arguments?})
...
.endm
Optional arguments are simply identifiers separated by commas. Inside the macro body, an argument named foo is referenced as '.foo'. A simple example follows:
.macro inc3 (A,BLAM)
inc .A # Mark the argument to expand with a '.'.
inc .A
inc .A
print .BLAM
.endm
.inc3(I0,S0) # Expands to ('inc I0\n') x 3, followed by 'print S0'
- contents
Access the $self->{contents} internal array, where the post-processed data is stored.
Assembler class
- new
Create a new Assembler instance.
To compile a list of files:
$compiler = Assembler->new(-files=>[qw(foo.pasm bar.pasm)]);
To compile an array of instructions:
$compiler = Assembler->new(-contents=>['set S0,"foo"','print S0','end']);
- _annotate_contents
Process the $self->{contents} array and make the appropriate annotations in it. For instance, it slightly munges global and local labels to make sure the statements fall where they should. It also annotates the array into an AoA of [$statement,$lineno]. A later pass changes $lineno to $pc, once the arguments have been appropriately analyzed.
- _init
Process files of assembly code, should they have been passed in. Also, regardless of the input to new(), take the arrays of operators and load them into a form suitable for parsing.
- _collect_labels
Collect labels, remove their definitions, and save the appropriate line numbers. Local labels aren't given special treatment yet.
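Label collection, together with the offset computation that adjust_labels performs, can be illustrated with a rough Python sketch. It assumes that each opcode and each argument occupies one bytecode word. It also assumes that a branch offset is measured from the start of the branching instruction, so target = pc + offset. Arguments are split naively on commas, and the function names are hypothetical.

```python
import re

LABEL_DEF_RE = re.compile(r"^\s*(\w+):\s*(.*)$")

def collect_labels(lines):
    """Strip label definitions and map each label to its instruction's PC.

    Assumes an instruction occupies 1 + (number of arguments) words.
    """
    labels, program, pc = {}, [], 0
    for line in lines:
        m = LABEL_DEF_RE.match(line)
        if m:
            labels[m.group(1)] = pc
            line = m.group(2)
        if not line.strip():
            continue  # a label on an otherwise empty line
        op, _, rest = line.strip().partition(" ")
        args = [a.strip() for a in rest.split(",")] if rest else []
        program.append((pc, op, args))
        pc += 1 + len(args)
    return labels, program

def resolve_labels(labels, program):
    """Replace label arguments with relative offsets (label PC - current PC)."""
    return [
        (pc, op, [str(labels[a] - pc) if a in labels else a for a in args])
        for pc, op, args in program
    ]
```

Because all labels are collected before any are resolved, forward references such as "if I0,BLAH" work just as well as backward ones such as "branch LOOP".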
- _generate_bytecode
Start out by walking the $self->{contents} array. On the first pass, make sure that the requested operation exists. If it doesn't, yell on STDERR. If it does, replace the text version of the operator with its numeric index, and pack it into $self->{bytecode}.
The inner loop walks through the arguments nested within the $op arrayref, determining the type of each argument ($_->[0]) and packing in the appropriate code. Note that labels are precalculated, and constants have already been packed into the appropriate areas.
- adjust_labels
This works primarily on $self->{global_labels}, computing offsets and getting things ready for the final shift. Since the values of $self->{global_labels} correspond to line numbers, we replace the line numbers with program counter indices.
The next pass walks the $self->{contents} array, replacing the label names with the difference between the current PC and the label PC. Label names are preserved in the previous pass, which makes this possible.
- _string_constant
Unescape special characters in the constant and add it to not one but two data structures. $self->{constants}{s} is for fast lookup when the time comes to substitute constants for their indices, and $self->{ordered_constants} keeps track of constants in order of occurrence, so they can be packed directly into the binary format.
- _numeric_constant
Take the numeric constant and place it into both $self->{constants}{n} and $self->{ordered_constants}. The first hash lets us do fast lookup when the time comes to replace a constant with its value. The second array maintains the various constants in order of first occurrence, and is ready to pack into the bytecode.
- _key_constant
Build a key constant and place it into both $self->{constants}{n} and $self->{ordered_constants}. The first hash lets us do fast lookup when the time comes to replace a constant with its value. The second array maintains the various constants in order of first occurrence, and is ready to pack into the bytecode.
- constant_table
Returns a hash containing the length in bytes of the constant table and the packed constant table itself.
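The two-structure scheme used by _string_constant and _numeric_constant, together with a constant_table-style method, can be sketched in Python. The sketch pairs a dict for fast index lookup with a list in first-occurrence order. The class name ConstantPool and the packed layout are hypothetical. That layout is a little-endian count followed by a one-byte tag and a payload per entry, and it is not Parrot's actual packfile format.

```python
import struct

class ConstantPool:
    """Interns constants: a dict for index lookup plus an ordered list.

    The packed layout is a hypothetical format for illustration only.
    """

    def __init__(self):
        self.index = {}    # (kind, value) -> index, like $self->{constants}
        self.ordered = []  # (kind, value) in first-occurrence order

    def add(self, kind, value):
        """Return the constant's index, adding it on first occurrence."""
        key = (kind, value)
        if key not in self.index:
            self.index[key] = len(self.ordered)
            self.ordered.append(key)
        return self.index[key]

    def constant_table(self):
        """Return {'length': byte count, 'table': packed bytes}."""
        parts = [struct.pack("<I", len(self.ordered))]
        for kind, value in self.ordered:
            if kind == "n":
                # Tag byte followed by an 8-byte double.
                parts.append(struct.pack("<Bd", ord("n"), value))
            else:
                # Tag byte, 4-byte length, then the UTF-8 string bytes.
                data = value.encode("utf-8")
                parts.append(struct.pack("<BI", ord("s"), len(data)) + data)
        table = b"".join(parts)
        return {"length": len(table), "table": table}
```

Adding a constant that has already been seen returns the existing index, so each constant is packed only once.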
- output_bytecode
Returns a string containing the packfile.
It first processes the constants and generates the constant table, which is needed to build the packfile header, and then returns the complete packfile.
- to_bytecode
Take the content array ref and turn it into a ragged AoAoA of operations with attached processed arguments. This is the core of the assembler.
The transformation looks roughly like this:
[
  [ 'if I0,BLAH', 3 ],
  [ 'set P1[S5],P0["foo"]', 5 ],
  [ 'BLAH: end', 6 ],
]
into:
[
  [
    [ 'if_i_ic',
      ['i','I0'],
      ['label','BLAH'], # Leave the name here so we can resolve backward refs.
    ],
    3, # Line number
  ],
  [
    [ 'set_p_s_p_sc',
      ['p','P1'],
      ['s','S5'],
      ['p','P0'],
      ['sc',0], # String constant number 0
    ],
    5,
  ],
  [
    [ 'end',
    ],
    6,
  ],
]
The first pass collects labels, so we can resolve forward label references (that is, labels used before they're defined). References to labels aren't yet expanded.
The second pass takes the arguments in each line ($_->[0]) and breaks them into their components. It does this by passing each line through a loop of REs, one per argument type; each RE breaks an argument down into an array ref [$type,$argument]. Constants are collected and replaced with indices, and the number of arguments is counted and added to the internal PC tracking.
The third pass takes labels, replaces them with the PC offset to the actual instruction, and generates bytecode. It returns the bytecode, and we're done.
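The second pass's regex loop can be illustrated with a small Python sketch. It assumes a hypothetical ordered ARG_PATTERNS list and handles only string, float, integer, register, and label arguments. It treats '[', ']' and ',' as separators. It reproduces the 'set_p_s_p_sc' and 'if_i_ic' names from the transformation example, counting a label as an 'ic' argument as that example does.

```python
import re

# Ordered (regex, type) pairs tried at the current position; first match wins.
ARG_PATTERNS = [
    (re.compile(r'"((?:[^"\\]|\\.)*)"'), "sc"),  # string constant
    (re.compile(r"[-+]?\d+\.\d+"), "nc"),        # float constant
    (re.compile(r"[-+]?\d+"), "ic"),             # integer constant
    (re.compile(r"([INSP])\d+"), "reg"),         # register
    (re.compile(r"[A-Za-z_]\w*"), "label"),      # label reference
]

def classify_args(text, strings):
    """Break an argument string into [type, value] pairs.

    `strings` is an ordered list acting as the string-constant table;
    each string constant is replaced by its index in that list.
    """
    args, pos = [], 0
    while pos < len(text):
        if text[pos] in " ,[]":  # separators, including key brackets
            pos += 1
            continue
        for regex, kind in ARG_PATTERNS:
            m = regex.match(text, pos)
            if m:
                break
        else:
            raise ValueError(f"cannot parse argument at {text[pos:]!r}")
        if kind == "reg":
            args.append([m.group(1).lower(), m.group(0)])
        elif kind == "sc":
            if m.group(1) not in strings:
                strings.append(m.group(1))
            args.append(["sc", strings.index(m.group(1))])
        else:
            args.append([kind, m.group(0)])
        pos = m.end()
    return args

def op_name(op, args):
    """Build the full opcode name from argument types; labels count as 'ic'."""
    return "_".join([op] + ["ic" if t == "label" else t for t, _ in args])
```

Running classify_args on 'P1[S5],P0["foo"]' yields [['p','P1'], ['s','S5'], ['p','P0'], ['sc',0]], and op_name then produces 'set_p_s_p_sc'.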
- process_args
Process the argument list and return the list of arguments and files to process. Only legal and sane arguments and files should get past this point.