TITLE
Parrot JIT Subsystem
VERSION
CURRENT
Maintainer: Daniel Grunblatt
Class: Internals
PDD Number: 8
Version: 1.3
Status: Developing
Last Modified: 26 Nov 2002
PDD Format: 1
Language: English
ABSTRACT
This PDD describes the Parrot Just In Time compilation subsystem.
DESCRIPTION
The Just In Time, or JIT, subsystem converts a bytecode file to native machine code instructions and executes the generated instruction sequence directly.
IMPLEMENTATION
The JIT currently works on ALPHA, ARM, Intel x86, PPC, and SPARC version 8 processor systems, on most operating systems. Only 32-bit INTVALs are currently supported.
The initial step in generating native code is to invoke Parrot_jit_begin, which generally provides architecture specific preamble code. For each Parrot opcode in the bytecode, either a generic or an opcode specific sequence of native code is generated. The .jit files provide functions that generate native code for specific opcode functions, for a given instruction set architecture. If no function is provided for a specific opcode, a generic sequence of native code is output which calls the interpreter C function that implements the opcode. Such opcodes are handled by Parrot_jit_normal_op.
If the opcode can cause a control flow change, as in the case of a branch or call opcode, an extended or modified version of this generic code is used that tracks changes in the bytecode program counter with changes in the hardware program counter. This type of opcode is handled by Parrot_jit_cpcf_op.
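The choice between an opcode specific generator and the two generic ones can be pictured as a small decision function. The following is a minimal sketch in C, not the code in jit.c: the type op_desc, the enum gen_kind and the function choose_generator are hypothetical names invented for illustration, and Parrot_jit_restart_op is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; the real structures are declared
   in include/parrot/jit.h and look different. */
typedef enum { GEN_SPECIFIC, GEN_NORMAL, GEN_CPCF } gen_kind;

typedef struct {
    const char *name;   /* opcode name, for illustration only            */
    int has_jit_fn;     /* nonzero if a .jit file provides a generator   */
    int changes_flow;   /* nonzero for branch or call style opcodes      */
} op_desc;

/* Pick the kind of code generator for one opcode:
   - an opcode specific function if a .jit file provides one,
   - otherwise the control flow aware generic path (Parrot_jit_cpcf_op),
   - otherwise the plain generic path (Parrot_jit_normal_op). */
static gen_kind choose_generator(const op_desc *op)
{
    assert(op != NULL);
    if (op->has_jit_fn)
        return GEN_SPECIFIC;
    return op->changes_flow ? GEN_CPCF : GEN_NORMAL;
}
```

In this sketch an opcode with its own generator always uses it, whether or not it changes control flow, which matches the description of the op_jit array under FILES.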
While generating native code, certain offsets and absolute addresses may not be available. This occurs with forward opcode branches, as the native code corresponding to the branch target has not yet been generated. On some platforms, function calls are performed using program-counter relative addresses. Since the location of the buffer holding the native code may move as code is generated (due to growing of the buffer), these relative addresses may only be calculated once the buffer is guaranteed to no longer move. To handle these instances, the JIT subsystem uses "fixups", which record locations in native code where adjustments to the native code are required.
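As an illustration of the fixup idea, the sketch below records where a branch displacement lives in the native buffer and patches it once all target offsets are known. It is a simplified model under stated assumptions, not the Parrot_jit_fixup structure: the names sk_fixup and sk_apply_fixups are hypothetical, and the displacement is assumed to be a 32-bit value relative to the end of its own field, as with x86 near branches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixup record for a pc-relative branch. */
typedef struct {
    size_t native_offset;  /* offset of the 4-byte displacement field   */
    size_t target_word;    /* bytecode index of the branch target       */
} sk_fixup;

/* Patch every recorded displacement. op_map[i] is assumed to hold the
   native code offset of the opcode that starts at bytecode index i. */
static void sk_apply_fixups(unsigned char *code, size_t code_size,
                            const size_t *op_map,
                            const sk_fixup *fx, size_t n_fixups)
{
    for (size_t i = 0; i < n_fixups; i++) {
        size_t field_end = fx[i].native_offset + 4;
        assert(field_end <= code_size);
        /* Displacement is measured from the end of the field. */
        int32_t disp = (int32_t)((int64_t)op_map[fx[i].target_word]
                                 - (int64_t)field_end);
        memcpy(code + fx[i].native_offset, &disp, sizeof disp);
    }
}
```

Because only offsets within the buffer are used, this kind of patching gives the same result wherever the buffer ends up; fixups that need absolute addresses, such as pc-relative calls to C functions, have to wait until the buffer can no longer move.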
FILES
- jit/${jitcpuarch}/jit_emit.h
-
This file defines Parrot_jit_begin, Parrot_jit_dofixup, Parrot_jit_normal_op, Parrot_jit_cpcf_op, Parrot_jit_restart_op and optionally Parrot_jit_vtable*_op. In addition, this file defines the macros and static functions used in .jit files to produce binary representations of native instructions.
For moving register values from the processor to Parrot and vice versa, the Parrot_jit_emit_mov* functions have to be implemented.
- jit/${jitcpuarch}/core.jit
-
The functions to generate native code for core parrot opcodes are specified here. To simplify the maintenance of these functions, they are specified in a format that is pre-processed by jit2h.pl to produce a valid C source file, jit_cpu.c. See "Format of .jit Files" below.
- jit/${jitcpuarch}/string.jit
-
The functions to generate native code for the opcodes of the string subsystem are specified here.
- include/parrot/jit.h
-
This file contains definitions of generic structures used by the JIT subsystem.
The op_jit array of jit_fn_info_t structures provides, for each opcode, a pointer to the function that generates native code for that opcode: either one of the generic functions Parrot_jit_normal_op or Parrot_jit_cpcf_op, or an opcode specific function. Parrot_jit_restart_op is like Parrot_jit_cpcf_op, with an additional check for a zero program counter. The Parrot_jit_vtable*_op functions are defined as Parrot_jit_normal_op or Parrot_jit_cpcf_op and may be implemented to do native vtable calls (see jit/i386/jit_emit.h for an example).
The Parrot_jit_fixup structure records the offset in native code where a fixup must be applied, the type of fixup required and the parameters needed to perform the fixup. Currently, a fixup parameter is either an opcode_t value or a function pointer.
The Parrot_jit_info structure holds data used while producing and executing native code. An important piece of data in this structure is the op_map array, which maps from opcode addresses to native code addresses.
- jit.c
-
build_asm() is the main routine of the code generator. It loops over the Parrot bytecode, calling the code generating routine for each opcode while filling in the op_map array. This array is used by the JIT subsystem to perform certain types of fixups on native code, as well as by the native code itself to convert bytecode program counter values (opcode_t *'s) to hardware program counter values.
The bytecode is considered an array of opcode_t sized elements, with parallel entries in op_map. op_map is initially populated with the offsets into the native code corresponding to the opcodes in the bytecode. Once code generation is complete and fixups have been applied, the native code offsets are converted to absolute addresses. This trades the low up-front cost of converting all offsets once, for the unknown cost of repeatedly converting these offsets while executing native code.
If the architecture defines INT_REGISTERS_TO_MAP and FLOAT_REGISTERS_TO_MAP as nonzero, that many of the most used registers in each code section are mapped to native processor registers.
- jit2h.pl
-
Preprocesses the .jit files to produce jit_cpu.c.
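The op_map handling described for jit.c above can be sketched in C as two passes plus a lookup. This is a toy model, not build_asm() itself: the names sk_opcode_t, sk_op_info, sk_fill_op_map, sk_offsets_to_addresses and sk_native_pc are hypothetical, and each opcode is assumed to produce a fixed, known number of native bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef long sk_opcode_t;   /* stand-in for opcode_t */

typedef struct {
    size_t n_words;       /* opcode plus arguments, in opcode_t units   */
    size_t native_bytes;  /* size of the native sequence in this sketch */
} sk_op_info;

/* Pass 1: for the bytecode element that starts each op, record the
   offset of its native code. op_map must be zero-initialized by the
   caller. Returns the total native code size. */
static size_t sk_fill_op_map(const sk_opcode_t *code, size_t n_words,
                             const sk_op_info *ops, size_t n_ops,
                             uintptr_t *op_map)
{
    size_t native = 0, i = 0;
    while (i < n_words) {
        assert(code[i] >= 0 && (size_t)code[i] < n_ops);
        const sk_op_info *op = &ops[code[i]];
        assert(op->n_words > 0 && i + op->n_words <= n_words);
        op_map[i] = (uintptr_t)native;
        native += op->native_bytes;
        i += op->n_words;
    }
    return native;
}

/* Pass 2: once the buffer can no longer move, convert every offset to
   an absolute address, once. Argument slots are converted too; they
   are never used as branch targets. */
static void sk_offsets_to_addresses(uintptr_t *op_map, size_t n_words,
                                    const unsigned char *native_base)
{
    for (size_t i = 0; i < n_words; i++)
        op_map[i] += (uintptr_t)native_base;
}

/* Lookup: bytecode program counter to native program counter. */
static const unsigned char *sk_native_pc(const uintptr_t *op_map,
                                         const sk_opcode_t *code_start,
                                         const sk_opcode_t *pc)
{
    return (const unsigned char *)op_map[pc - code_start];
}
```

After the second pass a lookup is a single subtraction and an indexed load, which appears to be what the x86 prologue in the EXAMPLE section does with its sub and indirect jmp.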
Format of .jit Files
Jit files are interpreted as follows:
- op-name { \n body \n }
-
Where op-name is the name of the Parrot opcode, and body consists of C syntax code which may contain any of the identifiers listed in the following section.
The closing curly brace has to be in the first column.
- Comment lines
-
Comments are marked with a ; in the first column. These and empty lines are ignored.
- Identifiers
-
In general, prefixing an identifier with & yields the address of the item the identifier refers to. The * prefix specifies its value. Since Parrot register values vary during code execution, their values can not be obtained through identifier substitution alone.
- INT_REG[n]
-
Gets replaced by the INTVAL register specified in the nth argument.
- NUM_REG[n]
-
Gets replaced by the FLOATVAL register specified in the nth argument.
- STRING_REG[n]
-
Gets replaced by the STRING register specified in the nth argument.
- INT_CONST[n]
-
Gets replaced by the INTVAL constant specified in the nth argument.
- NUM_CONST[n]
-
Gets replaced by the FLOATVAL constant specified in the nth argument.
- MAP[n]
-
The nth integer or floating processor register, mapped in this section.
Note: The register with the physical number zero can not be mapped.
- NATIVECODE
-
Gets replaced by the current native program counter.
- *CUR_OPCODE[n]
-
Gets replaced by the address of the current opcode in the Parrot bytecode.
- ISRn, FSRn
-
The nth integer or floating point scratch register.
- TEMPLATE template-name { \n body \n }
-
Defines a template for similar functions, e.g. all the binary ops taking three variable parameters.
- template-name perl-subst ...
-
Take a template and do all substitutions to generate the implementation for this jit function.
Example:
TEMPLATE Parrot_set_x_ic {
    if (MAP[1]) {
        jit_emit_mov_ri<_N>(NATIVECODE, MAP[1], <typ>_CONST[2]);
    }
    else {
        jit_emit_mov_mi<_N>(NATIVECODE, &INT_REG[1], <typ>_CONST[2]);
    }
}

Parrot_set_i_ic {
    Parrot_set_x_ic s/<_N>/_i/ s/<typ>/*INT/
}

Parrot_set_n_ic {
    Parrot_set_x_ic s/<_N>/_ni/ s/<typ>/&INT/ s/INT_R/NUM_R/
}
The jit function Parrot_set_i_ic is based on the template Parrot_set_x_ic, the s/x/y/ are substitutions on the template body, to generate the actual function body. These substitutions are done before the other substitutions.
See jit/i386/core.jit for more.
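To make the effect of the template substitutions concrete, the sketch below applies the two substitutions of Parrot_set_i_ic to one line of the template body. It is only an illustration in C: jit2h.pl is a Perl script and uses Perl substitutions, whereas the hypothetical helper sk_subst_all does plain literal replacement. It replaces every occurrence, which is assumed here because <_N> appears twice in the template body.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replace every literal occurrence of `from` in `src` with `to`,
   writing the NUL-terminated result into `out` (capacity `cap`).
   Returns 0 on success, -1 if the result does not fit. */
static int sk_subst_all(const char *src, const char *from, const char *to,
                        char *out, size_t cap)
{
    size_t flen = strlen(from), tlen = strlen(to), used = 0;
    assert(flen > 0 && cap > 0);
    while (*src) {
        if (strncmp(src, from, flen) == 0) {
            if (used + tlen >= cap)
                return -1;
            memcpy(out + used, to, tlen);
            used += tlen;
            src += flen;
        } else {
            if (used + 1 >= cap)
                return -1;
            out[used++] = *src++;
        }
    }
    out[used] = '\0';
    return 0;
}
```

Applying s/<_N>/_i/ and then s/<typ>/*INT/ to the line jit_emit_mov_ri<_N>(NATIVECODE, MAP[1], <typ>_CONST[2]); yields jit_emit_mov_ri_i(NATIVECODE, MAP[1], *INT_CONST[2]);, which is then subject to the identifier substitutions described above.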
Naming convention for jit_emit functions
To make it easier to share core.jit files between machines of similar architecture, the jit_emit functions should follow this syntax:
jit_emit_<op>_<args>_<type>
- <op>
-
This is the operation like mov, add or bxor. In normal cases this is the PASM name of the op.
- <args>
-
args specify the arguments of the function in the PASM sequence dest, source ... The args consist of one letter per argument:
- r
-
A mapped processor register.
- m
-
A memory operand, the address of the parrot register.
- i
-
An immediate operand, i.e. an integer constant.
- <type>
-
Specifies whether the operation works on integer or floating point arguments. If all arguments are of the same type, only one type specifier is needed.
- i
-
An integer argument.
- n
-
A float argument.
Examples:
- jit_emit_sub_rm_i
-
Subtract integer at memory from integer processor register.
- jit_emit_mov_ri_ni
-
Move integer constant (immediate) to floating point register.
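As an illustration of what functions following this convention might do on x86, the sketch below writes the machine code for two integer moves into a buffer. The sk_ prefixed names are hypothetical and are not the macros in jit/i386/jit_emit.h; only the instruction encodings (MOV r32, imm32 as 0xB8+reg, and MOV r/m32, r32 as 0x89 with a ModRM byte) are standard x86.

```c
#include <assert.h>
#include <stdint.h>

/* x86 register numbers as used in instruction encodings. */
enum { SK_EAX = 0, SK_ECX, SK_EDX, SK_EBX, SK_ESP, SK_EBP, SK_ESI, SK_EDI };

/* mov_ri_i: move a 32-bit integer immediate into a processor register.
   Returns the advanced native program counter. */
static unsigned char *sk_emit_mov_ri_i(unsigned char *pc, int dest,
                                       int32_t imm)
{
    uint32_t u = (uint32_t)imm;
    assert(dest >= 0 && dest <= 7);
    *pc++ = (unsigned char)(0xB8 + dest);   /* MOV r32, imm32          */
    *pc++ = (unsigned char)(u & 0xFF);      /* immediate, little endian */
    *pc++ = (unsigned char)((u >> 8) & 0xFF);
    *pc++ = (unsigned char)((u >> 16) & 0xFF);
    *pc++ = (unsigned char)((u >> 24) & 0xFF);
    return pc;
}

/* mov_rr_i: move an integer from one processor register to another.
   Arguments are in PASM order: destination first, then source. */
static unsigned char *sk_emit_mov_rr_i(unsigned char *pc, int dest, int src)
{
    assert(dest >= 0 && dest <= 7 && src >= 0 && src <= 7);
    *pc++ = 0x89;                                        /* MOV r/m32, r32 */
    *pc++ = (unsigned char)(0xC0 | (src << 3) | dest);   /* ModRM, reg-reg */
    return pc;
}
```

Calling sk_emit_mov_ri_i(pc, SK_EDI, 8) produces the five bytes BF 08 00 00 00, and sk_emit_mov_rr_i(pc, SK_EBX, SK_EDI) produces 89 FB; these correspond to the instructions mov $0x8,%edi and mov %edi,%ebx shown in the EXAMPLE section.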
ALPHA Notes
Parrot registers are accessed relative to $6; all other memory accesses are relative to $27. Float constants are accessed relative to $7, so you must precede the instruction with ldah $7,0($27).
i386 Notes
Only 32 bit INTVALs are supported. Long double FLOATVALs are ok.
There are four mapped integer registers: %edi, %ebx, %esi and %edx. The first three of these are callee saved, so they preserve their values across external function calls. The register %ebp is abused to hold the address of the jit function table.
For floating point operations the registers ST1 ... ST4 are mapped.
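With four mapped integer registers, choosing which Parrot registers to map in a code section amounts to picking the four most used ones. The sketch below shows one simple way to do that. It is not the allocator in jit.c: the names sk_pick_mapped, SK_PARROT_REGS and SK_REGS_TO_MAP are hypothetical, the count of 32 Parrot registers per type is an assumption, and ties are broken in favor of the lower register number.

```c
#include <assert.h>
#include <stddef.h>

#define SK_PARROT_REGS 32   /* assumed number of Parrot registers per type */
#define SK_REGS_TO_MAP 4    /* stand-in for INT_REGISTERS_TO_MAP on i386   */

/* usage[r] is the number of times Parrot register r is used in the
   section. On return, mapped[slot] holds the Parrot register assigned
   to the slot-th processor register, most used first, or -1 if fewer
   than SK_REGS_TO_MAP registers are used at all. */
static void sk_pick_mapped(const int usage[SK_PARROT_REGS],
                           int mapped[SK_REGS_TO_MAP])
{
    int taken[SK_PARROT_REGS] = {0};
    assert(usage != NULL && mapped != NULL);
    for (int slot = 0; slot < SK_REGS_TO_MAP; slot++) {
        int best = -1;
        for (int r = 0; r < SK_PARROT_REGS; r++) {
            if (taken[r] || usage[r] <= 0)
                continue;               /* already mapped or unused */
            if (best < 0 || usage[r] > usage[best])
                best = r;
        }
        mapped[slot] = best;
        if (best >= 0)
            taken[best] = 1;
    }
}
```

Registers that do not make the cut stay in memory and are accessed through the m (memory operand) variants of the jit_emit functions.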
EXAMPLE
Let's see how this works:
Parrot Assembly:
set I0,8
set I2,I0
print I2
end
Parrot Bytecode: (only the bytecode segment is shown)
+--------------------------------------+
| 73 | 0 | 8 | 72 | 2 | 0 | 21 | 0 | 0 |
+-|------------|------------|--------|-+
  |            |            |        |
  |            |            |        +----------- end (no arguments)
  |            |            +-------------------- print_i (1 argument)
  |            +--------------------------------- set_i_i (2 arguments)
  +---------------------------------------------- set_i_ic (2 arguments)
Please note that the opcode numbers used might have already changed.
Intel x86 assembly version of the Parrot ops:
Parrot_jit_begin
0x817ddd0 <jit_func>: push %ebp
0x817ddd1 <jit_func+1>: mov %esp,%ebp
0x817ddd3 <jit_func+3>: push %ebx
0x817ddd4 <jit_func+4>: push %esi
0x817ddd5 <jit_func+5>: push %edi
normal function header till here, now push interpreter
0x817ddd6 <jit_func+6>: push $0x8164420
get jit function table to %ebp and
jump to first instruction
0x817dddb <jit_func+11>: mov 0xc(%ebp),%eax
0x817ddde <jit_func+14>: mov $0x81773f0,%ebp
0x817dde3 <jit_func+19>: sub $0x81774a8,%eax
0x817dde9 <jit_func+25>: jmp *%ds:0x0(%ebp,%eax,1)
set_i_ic
0x817ddee <jit_func+30>: mov $0x8,%edi
set_i_i
0x817ddf3 <jit_func+35>: mov %edi,%ebx
Parrot_jit_save_registers
0x817ddf5 <jit_func+37>: mov %edi,0x8164420
0x817ddfb <jit_func+43>: mov %ebx,0x8164428
Parrot_jit_normal_op
0x817de01 <jit_func+49>: push $0x81774c0
0x817de06 <jit_func+54>: call 0x804be00 <Parrot_print_i>
0x817de0b <jit_func+59>: add $0x4,%esp
Parrot_jit_end
0x817de0e <jit_func+62>: add $0x4,%esp
0x817de14 <jit_func+68>: pop %edi
0x817de16 <jit_func+70>: pop %ebx
0x817de18 <jit_func+72>: pop %esi
0x817de1a <jit_func+74>: pop %ebp
0x817de1c <jit_func+76>: ret
Please note the reverse argument direction. PASM and JIT notation use dest,src,src, while gdb and the internal macros in jit_emit.h have src,dest.
Debugging
The above listing was generated by gdb, the GNU debugger, with a little help from Parrot_jit_debug, which generates a symbol file in stabs format; see info stabs for more (or less :-()
The following script calls ddd (the graphical debugger front end) and attaches the symbol file after it has been built in build_asm.
# dddp
# run ddd parrot with given file
# gdb confirmations should be off
echo "b runops_jit
r -d -j $1.pbc
n
n
n
n
add-symbol-file $1.o 0
s
" > .ddd
ddd --command .ddd parrot &
Run this with e.g. dddp t/op/jit_2, then turn on the register status, step or nexti through the source, or set breakpoints as with any other language. Because there is currently no line number information, you may want to delete empty lines and join labels with ops in the pasm file beforehand.
You can examine Parrot registers via the debugger, or even set them, and you can always step into external opcodes and look at *interpreter.
The tests t/op/jit*.t have some test cases for testing register allocation. These tests are written for a mapping of 4 processor registers. If your processor architecture has more mapped registers, reduce them to 4 and run these tests.
Example for a debug session
$ cat j.pasm
set I0, 10
set N1, 1.1
set S2, "abc"
print "\n"
end
$ perl assemble.pl j.pasm > j.pbc
$ dddp j
ddd shows the above source code and the following assembly (startup code snipped):
0x815de46 <jit_func+30>: mov $0xa,%ebx
0x815de4b <jit_func+35>: fldl 0x81584c0
0x815de51 <jit_func+41>: fstp %st(2)
0x815de53 <jit_func+43>: mov %ebx,0x8158098
0x815de59 <jit_func+49>: fld %st(1)
0x815de5b <jit_func+51>: fstpl 0x8158120
0x815de61 <jit_func+57>: push $0x815cd90
0x815de66 <jit_func+62>: call 0x804db90 <Parrot_set_s_sc>
0x815de6b <jit_func+67>: add $0x4,%esp
0x815de6e <jit_func+70>: push $0x815cd9c
0x815de73 <jit_func+75>: call 0x804bcd0 <Parrot_print_sc>
0x815de78 <jit_func+80>: add $0x4,%esp
0x815de7b <jit_func+83>: add $0x4,%esp
0x815de81 <jit_func+89>: pop %edi
0x815de83 <jit_func+91>: pop %ebx
0x815de85 <jit_func+93>: pop %esi
0x815de87 <jit_func+95>: pop %ebp
0x815de89 <jit_func+97>: ret
(gdb) n
(gdb) n
(gdb) n
(gdb) p I0
$1 = 10
(gdb) p N1
$2 = 1.1000000000000001
(gdb) p *S2
$3 = {bufstart = 0x815ad30, buflen = 15, flags = 336128, bufused =
3, strstart = 0x815ad30 "abc"}
(gdb) p &I0
$4 = (INTVAL *) 0x8158098