NAME

CPU::x86_64::InstructionWriter - Assemble x86-64 instructions using a pure-perl API

VERSION

version 0.001

SYNOPSIS

# POSIX::exit(42);
my $machine_code= CPU::x86_64::InstructionWriter->new
  ->mov( 'RAX', 60 )
  ->mov( 'RDI', 42 )
  ->syscall()
  ->bytes;

# if (x == 1) { ++x } else { ++y }
my $machine_code= CPU::x86_64::InstructionWriter->new
  ->cmp( 'RAX', 1 )
  ->jne('else')        # jump to not-yet-defined label named 'else'
  ->inc( 'RAX' )
  ->jmp('end')         # jump to another not-yet-defined label
  ->label('else')      # resolve previous jump to this address
  ->inc( 'RCX' )
  ->label('end')       # resolve second jump to this address
  ->bytes;

DESCRIPTION

This module is in an early stage of development and the API is not finalized.

The purpose of this module is to relatively efficiently assemble instructions for the x86-64 without generating and re-parsing assembly language, or shelling out to an external tool. All instructions are assumed to be for the 64-bit mode of the processor. Functionality for real mode or segmented 16-bit mode could be added by a yet-to-be-written ::x86 module.

This module consists of a bunch of chainable methods which build a string of machine code as you call them. It supports lazy-resolved jump labels, and lazy-bound constants which can be assigned a value after the instructions have been assembled.

Note: This module currently requires a perl with 64-bit integers and pack('Q') support.
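If you are unsure whether a given perl meets this requirement, a check along the following lines can be run first. This is only a sketch: `perl_supports_64bit` is a hypothetical helper name, not something the module provides.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: true when this perl has 64-bit integers and a
# working pack('Q').
sub perl_supports_64bit {
    return 0 unless $Config{ivsize} >= 8;      # size of an integer value, in bytes
    my $packed = eval { pack('Q', 1) };        # dies on perls built without 64-bit support
    return (defined $packed && length($packed) == 8) ? 1 : 0;
}

print perl_supports_64bit()
    ? "64-bit integers and pack('Q') are available\n"
    : "this perl cannot pack 64-bit integers\n";
```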

NOTATIONS

The method names of this class loosely match the NASM notation, but with the addition of the number of data bits following the opcode name, and list of arguments.

    MOV EAX, [EBX]

    $w->mov32_reg_mem('eax', ['ebx']);

    # or, short form
    use CPU::x86_64::InstructionWriter ':registers';
    $w->mov(eax,[ebx]);

Using a specific method like 'mov32_reg_mem' runs faster than the generic method 'mov', and removes ambiguity since your code generator probably already knows what operation it wants. Also it removes the need for the "qword" attributes that NASM sometimes needs. However, if you want you can use the generic method for an op.

There are often entirely new names given to an opcode (for the somewhat obscure ones) but the official Intel/AMD name is provided as an alias.

CMP EAX, EBX
JNO label           ; quick, what does JNO mean?

$w->cmp32_reg_reg('eax','ebx')->jmp_unless_overflow($label);
# or:
$w->cmp(eax,ebx)->jno("mylabel");

MEMORY LOCATIONS

Most instructions in the x86 set allow for one argument to be a memory location, composed of

A base register
plus a constant displacement (usually limited to 32-bit)
plus an index register times a scale of 1, 2, 4, or 8

This module writes a memory location as an array reference holding those parts in order:

[ $base, $displacement, $index, $scale ]

Leave a slot in the array undef to skip it (though at least one of them must be set). You may also pass a shorter array to imply the remaining items are undef.

Examples:

['rdx']                       # address RDX
['rbx', -20000]               # address RBX-20000
[undef, 0x7FFFFFFF]           # address 0x7FFFFFFF
[undef, undef, 'ecx', 8]      # address ECX*8

NASM supports scales like [EAX*5] by silently converting that to [EAX+EAX*4], but this module does not support that via the scale field. (it would just slow things down for a feature nobody uses)
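To make the four slots concrete, here is a small sketch that computes the address a memory location refers to, given some register values. `effective_address` and the `%regs` hash are hypothetical and exist only for illustration; the module itself encodes the location into the instruction rather than evaluating it.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: compute the address that
# [ $base, $displacement, $index, $scale ] refers to, given register values.
sub effective_address {
    my ($mem, $regs) = @_;
    my ($base, $disp, $index, $scale) = @$mem;   # a short array leaves the rest undef
    die "empty memory location\n"
        unless defined $base || defined $disp || defined $index;
    $scale = 1 unless defined $scale;
    die "scale must be 1, 2, 4, or 8\n"
        unless grep { $scale == $_ } 1, 2, 4, 8;
    my $addr = 0;
    $addr += $regs->{lc $base}           if defined $base;
    $addr += $disp                       if defined $disp;
    $addr += $regs->{lc $index} * $scale if defined $index;
    return $addr;
}

my %regs = (rbx => 0x10000, rcx => 3, rax => 7);
printf "%#x\n", effective_address(['rbx', -20000], \%regs);           # RBX-20000
printf "%#x\n", effective_address([undef, undef, 'rcx', 8], \%regs);  # RCX*8
# A scale of 5 has to be spelled out as base + index*4 on the same register:
printf "%#x\n", effective_address(['rax', undef, 'rax', 4], \%regs);  # RAX*5
```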

ATTRIBUTES

start_address

You might or might not need to set this. Some instructions care about what address they live at for things like RIP-relative addressing. The default value is an object of class "unknown". Things that depend on it will also be represented by "unknown" until the start_address has been given a value. If you try to resolve them numerically before start_address is set, you get an exception.

labels

This is a set of all labels currently relevant to this writer, indexed by name (so names must be unique). You probably don't need to access this. See "get_label" and "label".

METHODS

get_label

my $label= $writer->get_label($name); # label-by-name, created on demand
my $label= $writer->get_label();      # new anonymous label

Return a label object for the given name, or if no name is given, return an anonymous label.

The label objects returned can be assigned a location within the instruction stream using "label" and used as the target for JMP and JMP-like instructions. A label can also be used as a constant once all variable-length instructions have been resolved and once "start_address" is defined.

label

->label($label_ref)     # bind label object to current position
->label(my $new_label)  # like above, but create anonymous label object and assign to $new_label
->label($label_name)    # like above, but create/lookup label object by name

Bind a named label to the current position in the instruction buffer. You can also pass a label reference from "get_label", or an undef variable which will be assigned a label.

If the current position follows instructions of unknown length, the label will be processed as an unknown, and shift automatically as the instructions are resolved.
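The general idea behind lazily resolved labels can be sketched with plain back-patching. The code below is not the module's implementation: it uses hypothetical helpers (`emit_jmp`, `emit_label`, `resolve`) and always emits a fixed-size JMP rel32, so it sidesteps the variable-length instructions the module has to handle.

```perl
use strict;
use warnings;

# Minimal back-patching sketch (hypothetical, not the module's internals):
# every jump is emitted as JMP rel32 with a placeholder offset, and the
# placeholder is filled in once all labels have a position.
my $buf = '';
my %label;     # label name => offset within $buf
my @fixups;    # [ offset of the rel32 field, label name ]

sub emit_jmp {
    my ($name) = @_;
    $buf .= "\xE9";                          # JMP rel32 opcode
    push @fixups, [ length($buf), $name ];
    $buf .= "\0" x 4;                        # placeholder for the offset
}

sub emit_label { $label{ $_[0] } = length $buf }

sub resolve {
    for my $fix (@fixups) {
        my ($pos, $name) = @$fix;
        die "label '$name' was never marked\n" unless exists $label{$name};
        # rel32 is measured from the end of the jump instruction
        substr($buf, $pos, 4) = pack('l<', $label{$name} - ($pos + 4));
    }
    return $buf;
}

emit_jmp('end');        # forward reference to a label that does not exist yet
$buf .= "\x90" x 3;     # three NOPs to jump over
emit_label('end');
print unpack('H*', resolve()), "\n";   # e903000000909090
```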

bytes

Return the assembled instructions as a string of bytes. This will fail if any of the labels were left un-marked or if any expressions can't be evaluated.

DATA DECLARATION

This class assembles instructions, but sometimes you want to mix in data, and label the data. These methods append data, optionally aligned.

data

Append a string of literal bytes to the instruction stream.

data_i8, data_i16, data_i32, data_i64

Pack an integer into some number of bits and append it.

data_f32, data_f64

Pack a floating point number into the given bit-length (float or double) and append it.
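For reference, the byte layouts these methods are described as producing correspond to ordinary little-endian pack() templates. The helpers below (`pack_int`, `pack_float`) are hypothetical stand-ins written for this illustration; the module's own implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-ins showing little-endian layouts for each width.
my %int_template = (
    8  => [ 'c',  'C'  ],    # [ signed, unsigned ]
    16 => [ 's<', 'S<' ],
    32 => [ 'l<', 'L<' ],
    64 => [ 'q<', 'Q<' ],
);

sub pack_int {
    my ($bits, $value) = @_;
    my $pair = $int_template{$bits} or die "unsupported width $bits\n";
    # negative values use the signed template so they become two's complement
    return pack($value < 0 ? $pair->[0] : $pair->[1], $value);
}

sub pack_float {
    my ($bits, $value) = @_;
    die "unsupported width $bits\n" unless $bits == 32 || $bits == 64;
    return pack($bits == 32 ? 'f<' : 'd<', $value);
}

print unpack('H*', pack_int(32, 0x12345678)), "\n";  # 78563412
print unpack('H*', pack_int(16, -2)), "\n";          # feff
print unpack('H*', pack_float(32, 1.0)), "\n";       # 0000803f
```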

align, align16, align32, align64, align128

Append zero or more bytes so that the next instruction is aligned in memory. By default, the fill-byte will be a NO-OP (0x90). You can override it with your choice.
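The amount of padding is simple modular arithmetic. A sketch, assuming the alignment is given in bytes and using a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: return the fill bytes needed to advance $pos to the
# next multiple of $alignment. The default fill byte is NOP (0x90).
sub alignment_padding {
    my ($pos, $alignment, $fill) = @_;
    $fill = "\x90" unless defined $fill;
    my $pad = ($alignment - $pos % $alignment) % $alignment;   # 0 when already aligned
    return $fill x $pad;
}

print length(alignment_padding(5, 8)), "\n";              # 3
print length(alignment_padding(16, 16)), "\n";            # 0
print unpack('H*', alignment_padding(1, 4, "\0")), "\n";  # 000000
```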

INSTRUCTIONS

The following methods append an instruction to the buffer, and return $self so you can continue calling instructions in a chain.

NOP, PAUSE

Insert one or more no-op instructions.

nop(), nop( $n )

If called without an argument, insert one no-op. Else insert $n no-ops.

pause(), pause( $n )

Like NOP, but hints to the processor that the program is in a spin-loop so it has the opportunity to reduce power consumption. This is a 2-byte instruction.

CALL

call_label( $label )

Call to subroutine at named label, relative to current RIP. This method takes a label and calculates a call_rel( $ofs ) for you.

call_rel( $offset )

Call to subroutine at signed 32-bit offset from current RIP.
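The offset is measured from the address of the instruction after the CALL, which is easy to get wrong by five bytes. The sketch below shows the arithmetic using the standard CALL rel32 encoding (opcode E8); `encode_call_rel32` is a hypothetical helper, not a method of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: encode CALL rel32 for a call placed at $call_addr
# that targets $target_addr. The offset is relative to the address of the
# instruction that follows the 5-byte CALL.
sub encode_call_rel32 {
    my ($call_addr, $target_addr) = @_;
    my $offset = $target_addr - ($call_addr + 5);
    die "target out of signed 32-bit range\n"
        if $offset < -2**31 || $offset > 2**31 - 1;
    return "\xE8" . pack('l<', $offset);
}

print unpack('H*', encode_call_rel32(0x1000, 0x1020)), "\n";  # e81b000000
print unpack('H*', encode_call_rel32(0x1000, 0x0FF0)), "\n";  # e8ebffffff
```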

call_abs_reg( $reg )

Call to subroutine at absolute address stored in 64-bit register.

call_abs_mem( \@mem )

Call to subroutine at absolute address stored at "memory location"

RET

->ret
->ret($pop_bytes) # 16-bit number of bytes to discard from stack

JMP

All jump instructions are relative, and take either a numeric offset (from the start of the next instruction) or a label, except the jmp_abs_reg instruction which takes a register containing the target address, and the jmp_abs_mem which reads a memory address for the address to jump to.

If you pass an undefined variable as a label it will be auto-populated with a label object. Otherwise the label should be a string (label name) or label object obtained from "get_label".

jmp($label)

Unconditional jump to label (or 32-bit offset constant).

jmp_abs_reg($reg)

Jump to the absolute address contained in a register.

jmp_abs_mem(\@mem)

Jump to the absolute address read from a "memory location"

jmp_if_eq, je, jz
jmp_if_ne, jne, jnz

Jump to label if zero flag is/isn't set after CMP instruction

jmp_if_unsigned_lt, jb, jmp_if_carry, jc
jmp_if_unsigned_gt, ja
jmp_if_unsigned_le, jbe
jmp_if_unsigned_ge, jae, jmp_unless_carry, jnc

Jump to label if unsigned less-than / greater-than / less-or-equal / greater-or-equal

jmp_if_signed_lt, jl
jmp_if_signed_gt, jg
jmp_if_signed_le, jle
jmp_if_signed_ge, jge

Jump to label if signed less-than / greater-than / less-or-equal / greater-or-equal

jmp_if_sign, js
jmp_unless_sign, jns

Jump to label if 'sign' flag is/isn't set after CMP instruction

jmp_if_overflow, jo
jmp_unless_overflow, jno

Jump to label if overflow flag is/isn't set after CMP instruction

jmp_if_parity_even, jpe, jp
jmp_if_parity_odd, jpo, jnp

Jump to label if 'parity' flag is/isn't set after CMP instruction
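The flag-based jumps listed so far correspond to the sixteen x86 condition codes, which is why several names share one encoding. As an illustration, the sketch below maps the Intel mnemonics to their condition code and encodes the short (8-bit offset) form, whose opcode is 0x70 plus the code. `encode_jcc_short` is a hypothetical helper; the module chooses between short and long forms itself.

```perl
use strict;
use warnings;

# Condition code (low 4 bits of the Jcc opcode) for each Intel mnemonic.
my %cc = (
    jo => 0x0, jno => 0x1, jb => 0x2, jae => 0x3,
    je => 0x4, jne => 0x5, jbe => 0x6, ja  => 0x7,
    js => 0x8, jns => 0x9, jp => 0xA, jnp => 0xB,
    jl => 0xC, jge => 0xD, jle => 0xE, jg  => 0xF,
);
# Synonyms that share an encoding
@cc{qw( jc jnc jz jnz jpe jpo )} = @cc{qw( jb jae je jne jp jnp )};

# Hypothetical helper: encode the short form, opcode 0x70+cc then rel8.
sub encode_jcc_short {
    my ($mnemonic, $offset) = @_;
    my $code = $cc{lc $mnemonic};
    die "unknown condition '$mnemonic'\n" unless defined $code;
    die "offset does not fit in 8 bits\n" if $offset < -128 || $offset > 127;
    return pack('Cc', 0x70 + $code, $offset);
}

print unpack('H*', encode_jcc_short('jne', 5)), "\n";   # 7505
print unpack('H*', encode_jcc_short('jno', -2)), "\n";  # 71fe
```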

jmp_cx_zero, jrcxz

Short-jump to label if RCX register is zero

loop

Decrement RCX and short-jump to label if RCX register is nonzero (decrement of RCX does not change rFLAGS)

loopz, loope

Decrement RCX and short-jump to label if RCX register is nonzero and zero flag (ZF) is set. (decrement of RCX does not change rFLAGS)

loopnz, loopne

Decrement RCX and short-jump to label if RCX register is nonzero and zero flag (ZF) is not set (decrement of RCX does not change rFLAGS)

MOV

mov($dest, $src, $bits)

Generic top-level instruction method that dispatches to more specific versions of mov based on the arguments you gave it. The third argument is optional if one of the other arguments is a register.

mov64_reg_reg($dest_reg, $src_reg)

Copy second register to first register. Copies full 64-bit value.

mov##_mem_reg($mem, $reg)

Store ##-bit value in register to a "memory location". If the memory location consists of a single displacement greater than 32 bits, the register must be the appropriate size accumulator (RAX, EAX, AX, or AL)

mov##_reg_mem($reg, $mem)

Load ##-bit value at "memory location" into register. The Displacement portion of the memory location must normally be 32-bit, but as a special case you can load a full 64-bit displacement (with no register offset) into the Accumulator register of that size (RAX, EAX, AX, or AL).

$asm->mov8_reg_mem ( 'al', [ undef, 0xFF00FF00FF00FF00 ]);
$asm->mov64_reg_mem('rax', [ undef, 0xFF00FF00FF00FF00 ]);

mov64_reg_imm($dest_reg, $constant)

Load a constant value into a 64-bit register. Constant is sign-extended to 64-bits. Constant may be an expression.

mov##_mem_imm($mem, $constant)

Store a constant value into a ##-bit memory location. For mov64, constant is 32-bit sign-extended to 64-bits. Constant may be an expression.
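Because the 64-bit form stores a 32-bit immediate that the CPU sign-extends, not every 64-bit constant can be written this way. The following sketch, using hypothetical helper names, shows which values survive and what the CPU would store for a given 32-bit pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $value can be stored as a signed 32-bit
# immediate and recovered unchanged by sign extension to 64 bits.
sub fits_simm32 {
    my ($value) = @_;
    return ($value >= -2**31 && $value <= 2**31 - 1) ? 1 : 0;
}

# Hypothetical helper: reinterpret the low 32 bits of $value as a signed
# number, which is the value a sign-extending store would produce.
sub sign_extend_32 {
    my ($value) = @_;
    return unpack('l', pack('L', $value & 0xFFFFFFFF));
}

print fits_simm32(-1), "\n";                    # 1
print fits_simm32(0x80000000), "\n";            # 0
printf "%d\n", sign_extend_32(0x80000000);      # -2147483648
```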

CMOV

TODO...

LEA

lea($reg, $src, $bits)

Dispatch to a variant of LEA based on argument types.

lea16_reg_mem($reg16, \@mem)
lea32_reg_mem($reg32, \@mem)
lea64_reg_mem($reg64, \@mem)
lea16_reg_reg($reg16, $reg64)
lea32_reg_reg($reg32, $reg64)
lea64_reg_reg($reg64, $reg64)

Load the address described by "memory location" (base + displacement + index * scale) into the register. Unlike MOV, LEA only computes the address and never reads the memory it refers to, which also makes it useful for simple arithmetic on registers.

ADD, ADC

The add## variants are the plain ADD instruction, for each bit width. The addcarry## variants are the ADC instruction that also adds the carry flag, useful for multi-word addition.

add($dst, $src, $bits)
add##_reg_reg($dest, $src)
add##_reg_mem($reg, \@mem)
add##_mem_reg(\@mem, $reg)
add##_reg_imm($reg, $const)
add##_mem_imm(\@mem, $const)

Returns $self, for chaining.

addcarry($dst, $src, $bits), adc($dst, $src, $bits)
addcarry##_reg(reg64, reg64)
addcarry##_mem(reg64, base_reg64, displacement, index_reg64, scale)
addcarry##_to_mem(reg64, base_reg64, displacement, index_reg64, scale)
addcarry##_const(reg64, const)
addcarry##_const_to_mem(const, base_reg64, displacement, index_reg64, scale)

Returns $self, for chaining.
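To illustrate why ADC is useful for multi-word addition, the sketch below models the carry chain in plain Perl on 32-bit words. `add_multiword` is a hypothetical helper and relies on the 64-bit integers this module already requires.

```perl
use strict;
use warnings;

# Hypothetical model: add two multi-word numbers stored as lists of 32-bit
# words, least significant word first. The first word behaves like ADD
# (carry in is 0); each later word behaves like ADC, which also adds the
# carry out of the previous word.
sub add_multiword {
    my ($x, $y) = @_;
    my @sum;
    my $carry = 0;
    for my $i (0 .. $#$x) {
        my $t = $x->[$i] + $y->[$i] + $carry;   # fits easily in a 64-bit integer
        push @sum, $t & 0xFFFFFFFF;
        $carry = $t >> 32;                      # carry flag: 0 or 1
    }
    return (\@sum, $carry);
}

my ($sum, $carry) = add_multiword([0xFFFFFFFF, 0x00000001], [0x00000001, 0x00000002]);
printf "%08x %08x carry=%d\n", reverse(@$sum), $carry;   # 00000004 00000000 carry=0
```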

sub

sub##_reg_imm($reg, $const)

AND

and($dst, $src, $bits)
and##_reg_reg($dest, $src)
and##_reg_mem($reg, \@mem)
and##_mem_reg(\@mem, $reg)
and##_reg_imm($reg, $const)
and##_mem_imm(\@mem, $const)

OR

or($dst, $src, $bits)
or##_reg(reg64, reg64)
or##_mem(reg64, base_reg64, displacement, index_reg64, scale)
or##_to_mem(reg64, base_reg64, displacement, index_reg64, scale)
or##_const(reg64, const)
or##_const_to_mem(const, base_reg64, displacement, index_reg64, scale)

XOR

xor($dst, $src, $bits)
xor##_reg(reg64, reg64)
xor##_mem(reg64, base_reg64, displacement, index_reg64, scale)
xor##_to_mem(reg64, base_reg64, displacement, index_reg64, scale)
xor##_const(reg64, const)
xor##_const_to_mem(const, base_reg64, displacement, index_reg64, scale)

SHL

Shift left by a constant or the CL register. The shift is at most 63 bits for 64-bit register, or 31 bits otherwise.

shl($dst, $src, $bits)
shl##_reg_imm( $reg, $const )
shl##_mem_imm( \@mem, $const )
shl##_reg_cl( $reg )
shl##_mem_cl( \@mem )

SHR

Shift right by a constant or the CL register. The shift is at most 63 bits for 64-bit register, or 31 bits otherwise.

shr($dst, $src, $bits)
shr##_reg_imm( $reg, $const )
shr##_mem_imm( \@mem, $const )
shr##_reg_cl( $reg, 'cl' // undef )
shr##_mem_cl( \@mem, 'cl' // undef )

SAR

Shift "arithmetic" right by a constant or the CL register, and sign-extend the left-most bits. The shift is at most 63 bits for 64-bit register, or 31 bits otherwise.

sar($dst, $src, $bits)
sar##_reg_imm( $reg, $const )
sar##_mem_imm( \@mem, $const )
sar##_reg_cl( $reg, 'cl' // undef )
sar##_mem_cl( \@mem, 'cl' // undef )
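The difference between SHR and SAR only shows up when the top bit of the operand is set. A plain-Perl model of the 32-bit case, using hypothetical helper names:

```perl
use strict;
use warnings;

# Hypothetical model of 32-bit SHR: vacated high bits are filled with zero.
sub shr32 {
    my ($value, $count) = @_;
    return ($value & 0xFFFFFFFF) >> ($count & 31);
}

# Hypothetical model of 32-bit SAR: vacated high bits copy the sign bit.
sub sar32 {
    my ($value, $count) = @_;
    $count &= 31;
    my $v      = $value & 0xFFFFFFFF;
    my $result = $v >> $count;
    if ($count && ($v & 0x80000000)) {
        $result |= (0xFFFFFFFF << (32 - $count)) & 0xFFFFFFFF;
    }
    return $result;
}

printf "%08x\n", shr32(0x80000010, 4);   # 08000001
printf "%08x\n", sar32(0x80000010, 4);   # f8000001
```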

BSWAP

Swap byte order on 32 or 64 bits.

bswap64
bswap32
bswap16

(bswap16 is actually encoded as the XCHG instruction)
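The effect of a byte swap can be modelled with pack and unpack by writing a value in one byte order and reading it back in the other. These subs are plain-Perl models for illustration only, not the module's methods (which emit instructions rather than compute values).

```perl
use strict;
use warnings;

# Plain-Perl models of the byte swap performed on a register value.
sub bswap16 { return unpack('n',  pack('v',  $_[0])) }
sub bswap32 { return unpack('N',  pack('V',  $_[0])) }
sub bswap64 { return unpack('Q>', pack('Q<', $_[0])) }

printf "%04x\n",  bswap16(0x1234);                               # 3412
printf "%08x\n",  bswap32(0x12345678);                           # 78563412
# Build the 64-bit input from two halves to avoid a non-portable literal
printf "%016x\n", bswap64((0x01020304 << 32) | 0x05060708);      # 0807060504030201
```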

CMP

Like SUB, but don't modify any arguments, just update RFLAGS.

cmp($dst, $src, $bits)
cmp##_reg_reg($dest, $src)
cmp##_reg_mem($reg, \@mem)

Subtract mem (second arg) from reg (first arg)

cmp##_mem_reg(\@mem, $reg)

Subtract reg (second arg) from mem (first arg)

cmp##_reg_imm($reg, $const)

Subtract const from reg

cmp##_mem_imm(\@mem, $const)

Subtract const from contents of mem address

TEST

Like AND, but don't modify any arguments, just update flags. Note that order of arguments does not matter, and there is no "to_mem" variant.

test($dst, $src, $bits)
test##_reg_reg($dest, $src)
test##_reg_mem($reg, \@mem)
test##_reg_imm($reg, $const)
test##_mem_imm(\@mem, $const)

DEC

dec($operand, $bits)
dec##_reg($reg)
dec##_mem(\@mem)

INC

inc($operand, $bits)
inc##_reg($reg)
inc##_mem(\@mem)

NOT

Flip all bits in a target register or memory location.

not##_reg($reg)
not##_mem(\@mem)

NEG

Replace target register or memory location with signed negation (2's complement).

neg##_reg($reg)
neg##_mem(\@mem)

DIV, IDIV

div##_reg($reg)

Unsigned divide of _DX:_AX by a NN-bit register. (divides AX into AL,AH for 8-bit)

div##_mem(\@mem)

Unsigned divide of _DX:_AX by a NN-bit memory value referenced by 64-bit registers

idiv##_reg($reg)

Signed divide of _DX:_AX by a NN-bit register. (divides AX into AL,AH for 8-bit)

idiv##_mem(\@mem)

Signed divide of _DX:_AX by a NN-bit memory value referenced by 64-bit registers
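The _DX:_AX notation means the dividend is twice as wide as the divisor, split across two registers. A plain-Perl model of the unsigned 16-bit case, with a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical model of DIV with a 16-bit divisor: the 32-bit dividend is
# DX:AX, the quotient goes to AX and the remainder to DX.
sub div16 {
    my ($dx, $ax, $divisor) = @_;
    die "divide by zero\n" unless $divisor;
    my $dividend = ($dx << 16) | $ax;
    my $quotient = int($dividend / $divisor);
    # the CPU raises a divide error when the quotient does not fit
    die "quotient overflows 16 bits\n" if $quotient > 0xFFFF;
    return ($quotient, $dividend % $divisor);   # (new AX, new DX)
}

my ($ax, $dx) = div16(0x0001, 0x0000, 3);       # 65536 / 3
print "quotient=$ax remainder=$dx\n";           # quotient=21845 remainder=1
```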

MUL

mul64_dxax_reg
mul32_dxax_reg
mul16_dxax_reg
mul8_ax_reg

sign extend

Various special-purpose sign extension instructions, mostly used to set up for DIV

sign_extend_al_ax, cbw
sign_extend_ax_eax, cwde
sign_extend_eax_rax, cdqe
sign_extend_ax_dx, cwd
sign_extend_eax_edx, cdq
sign_extend_rax_rdx, cqo
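As an example of what these do, CDQ (sign_extend_eax_edx) fills EDX with copies of the sign bit of EAX, so that EDX:EAX holds the 64-bit dividend a 32-bit signed divide expects. A plain-Perl model, using a hypothetical sub written only for this illustration:

```perl
use strict;
use warnings;

# Hypothetical model of CDQ: returns the new (EDX, EAX) pair.
sub cdq {
    my ($eax) = @_;
    $eax &= 0xFFFFFFFF;
    my $edx = ($eax & 0x80000000) ? 0xFFFFFFFF : 0;
    return ($edx, $eax);
}

printf "%08x:%08x\n", cdq(5);             # 00000000:00000005
printf "%08x:%08x\n", cdq(0xFFFFFFFE);    # ffffffff:fffffffe  (EAX = -2)
```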

flag modifiers

Each flag modifier takes an argument of 0 (clear), 1 (set), or -1 (invert).

flag_carry($state), clc, cmc, stc

PUSH

This only implements the 64-bit push instruction.

push($operand, $bits)
push64_reg
push64_imm
push64_mem

POP

pop($operand, $bits)
pop_reg
pop_mem

ENTER

->enter( $bytes_for_vars, $nesting_level )

bytes_for_vars is an unsigned 16-bit value, and nesting_level is a value 0..31 (byte masked to 5 bits)

Both constants may be expressions.

LEAVE

Un-do an ENTER instruction.

syscall

Syscall instruction, takes no arguments. (params are stored in pre-defined registers)

STRING INSTRUCTIONS

->xor('RAX','RAX')      # Compare to 0
->mov('RCX', 42)        # Count
->mov('RDI', \@memaddr) # String
->cld                   # Iterate to increasing address
->repne->scas8;         # Iterate until [RDI] == "\0" or 42 bytes

rep

Repeat RCX times (used with "ins", "lods", "movs", "outs", "stos")

repe, repz

Repeat RCX times or until zero-flag becomes zero. (used with "cmps", "scas")

repne, repnz

Repeat RCX times or until zero-flag becomes one. (used with "cmps", "scas")
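The interaction between the RCX count and the zero flag can be modelled in a few lines. This sketch simulates REPNE SCASB scanning toward increasing addresses; `repne_scasb` is a hypothetical helper and the "memory" is just a Perl string.

```perl
use strict;
use warnings;

# Hypothetical model of REPNE SCASB: compare AL with successive bytes, at
# most RCX times, stopping early when a byte matches (zero flag becomes one).
sub repne_scasb {
    my ($al, $memory, $rdi, $rcx) = @_;
    my $zf = 0;
    while ($rcx != 0) {
        $zf = (ord(substr($memory, $rdi, 1)) == $al) ? 1 : 0;
        $rdi++;     # scanning toward increasing addresses
        $rcx--;
        last if $zf;
    }
    return ($rdi, $rcx, $zf);
}

# Search for a NUL byte within at most 42 bytes
my ($rdi, $rcx, $zf) = repne_scasb(0, "abc\0def", 0, 42);
print "rdi=$rdi rcx=$rcx zf=$zf\n";   # rdi=4 rcx=38 zf=1
```

Note that RDI ends up one past the matching byte, so the string length is the final RDI minus the start address minus one.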

flag_direction($bool_set)

Set (1) or clear (0) the direction flag.

std

Set the direction flag (iterate to lower address)

cld

Clear the direction flag (iterate to higher address)

movsNN

movs64, movsq
movs32, movsd
movs16, movsw
movs8, movsb

cmpsNN

cmps64, cmpsq
cmps32, cmpsd
cmps16, cmpsw
cmps8, cmpsb

scasNN

scas64, scasq
scas32, scasd
scas16, scasw
scas8, scasb

SYNCHRONIZATION INSTRUCTIONS

These special-purpose instructions relate to strict ordering of memory operations, cache flushing, or atomic operations useful for implementing semaphores.

compare_exchangeNN, cmpxchg

compare_exchange64
compare_exchange32
compare_exchange16
compare_exchange8

TODO

mfence, lfence, sfence

Parameterless instructions for memory access serialization. Forces memory operations before the fence to complete before memory operations after the fence. Lfence affects load operations, sfence affects store operations, and mfence affects both.

ENCODING x86_64 INSTRUCTIONS

The AMD64 Architecture Programmer's Manual is a somewhat tedious read, so here are my notes:

Typical 2-arg 64-bit instruction: REX ( AddrSize ) Opcode ModRM ( ScaleIndexBase ( Disp ) ) ( Immed )

	REX: use extended registers and/or 64-bit operand sizes.
		Not used for simple push/pop or handful of others
	REX = 0x40 + (W:1bit R:1bit X:1bit B:1bit)
		REX.W = "wide" (64-bit operand size when set)
		REX.R is 4th bit of ModRM.Reg
		REX.X is 4th bit of SIB.Index
		REX.B is 4th bit of ModRM.R/M or of SIB.Base or of the register field in the opcode byte, depending on goofy rules
  
	ModRM: mode/registers flags
	ModRM = (Mod:2bit Reg:3bit R/M:3bit)
		ModRM.Mod indicates operands:
			11b means ( Reg, R/M-reg-value )
			00b means ( Reg, R/M-reg-addr ) unless second reg is SP/BP/R12/R13
			01b means ( Reg, R/M-reg-addr + 8-bit disp ) unless second reg is SP/R12
			10b means ( Reg, R/M-reg-addr + 32-bit disp ) unless second reg is SP/R12
			
			When accessing mem, R/M=100b means include the SIB byte for exotic addressing options
			In the 00b case, R/M=101b means use instruction pointer + 32-bit immed

	SIB: optional byte for wild and crazy memory addressing; activate with ModRM.R/M = 100b
	SIB = (Scale:2bit Index:3bit Base:3bit)
		address is (index_register << scale) + base_register (+immed per the ModRM.Mod bits)
		* unless index_register = 0100b then no register is used.
			(i.e. RSP cannot be used as an index register )
		* unless base_register = _101b and ModRM.mod = 00 then no register is used.
			(i.e. [R{BP,13} + R?? * 2] must be written as [R{BP,13} + R?? * 2 + 0])

The methods that perform the encoding are not public, but are documented in the source for anyone who wants to extend this module to handle additional instructions.
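As a worked instance of the notes above, the sketch below builds the REX and ModRM bytes for the simplest case: a 64-bit register-to-register instruction with ModRM.Mod = 11b. It is a standalone illustration, and `encode_reg_reg_64` is a hypothetical name, not one of the module's private encoders.

```perl
use strict;
use warnings;

my %regnum = (
    rax => 0, rcx => 1, rdx => 2,  rbx => 3,  rsp => 4,  rbp => 5,  rsi => 6,  rdi => 7,
    r8  => 8, r9  => 9, r10 => 10, r11 => 11, r12 => 12, r13 => 13, r14 => 14, r15 => 15,
);

# Hypothetical helper: encode "REX.W opcode ModRM" where ModRM.Reg is the
# source and ModRM.R/M the destination, as in MOV r/m64, r64 (opcode 0x89).
sub encode_reg_reg_64 {
    my ($opcode, $dst, $src) = @_;
    my $rm  = $regnum{lc $dst};
    my $reg = $regnum{lc $src};
    die "unknown register\n" unless defined $rm && defined $reg;
    my $rex = 0x40
            | 0x08                   # W: 64-bit operand size
            | (($reg & 8) >> 1)      # R: 4th bit of ModRM.Reg
            | (($rm  & 8) >> 3);     # B: 4th bit of ModRM.R/M
    my $modrm = (0b11 << 6) | (($reg & 7) << 3) | ($rm & 7);
    return pack('CCC', $rex, $opcode, $modrm);
}

print unpack('H*', encode_reg_reg_64(0x89, 'rax', 'rbx')), "\n";  # 4889d8  mov rax, rbx
print unpack('H*', encode_reg_reg_64(0x89, 'r9',  'rax')), "\n";  # 4989c1  mov r9, rax
```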

AUTHOR

Michael Conrad <mike@nrdvana.net>

COPYRIGHT AND LICENSE

This software is copyright (c) 2023 by Michael Conrad.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.