NAME

Disassemble::X86::FormatTree - Format machine instructions as a tree

SYNOPSIS

use Disassemble::X86;
my $d = Disassemble::X86->new(format => "Tree");

DESCRIPTION

This module returns Intel x86 machine instructions as a tree structure, which is suitable for further processing.

The tree consists of hashrefs. There are three common keys, though only op is required:

op

The operation being performed.

size

The size of the result of the operation, in bits.

arg

The arguments being operated on, in a listref. Each argument is represented by its own hashref.

Top-level nodes may also contain the following keys:

start

The starting address of the instruction.

len

The length of the instruction, in bytes.

proc

The minimum processor model required, as described in Disassemble::X86.

prefix

Set to 1 if this node is an opcode prefix such as rep or lock.
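Because start and len appear on top-level nodes, the address of the byte following an instruction is simply start plus len. The sketch below shows this with two small helpers. The names next_address and is_prefix are hypothetical and not part of Disassemble::X86, and the node data is copied from the mov example in this document.

```perl
use strict;
use warnings;

# Address of the first byte after a top-level node. Assumes the node
# carries start and len, which are documented for top-level nodes only.
sub next_address {
    my ($node) = @_;
    return $node->{start} + $node->{len};
}

# The prefix key is only described as being set to 1 on prefix nodes,
# so a missing key is treated here as "not a prefix" (an assumption).
sub is_prefix {
    my ($node) = @_;
    return $node->{prefix} ? 1 : 0;
}

# Values copied from the mov example in this document.
my $mov = {op => "mov", arg => [
    {op => "reg", size => 32, arg => ["eax", "dword"]},
    {op => "lit", size => 32, arg => [0x1]},
], start => 1234, len => 5, proc => 386};

printf "%s at %d, next at %d, prefix=%d, proc=%d\n",
    $mov->{op}, $mov->{start}, next_address($mov), is_prefix($mov), $mov->{proc};
```

For the mov node this prints "mov at 1234, next at 1239, prefix=0, proc=386".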

The op field usually contains an opcode mnemonic. Other values may also appear, including:

reg

A machine register.

lit

A literal numeric value.

mem

A reference to memory.

seg

A segment prefix.

The argument list of a reg node contains the register name followed by its type. Register types include dword, word, and lobyte for general-purpose registers, seg for segment registers, and fp for floating-point registers. If the register is part of a larger register, as al is part of eax, the larger register's name appears as a third argument.
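A tree walker has to distinguish child nodes from plain values: the arg lists of reg and lit nodes hold scalars (names, types, numbers), while other nodes hold hashrefs. The following sketch collects the full register name of every register used by an instruction. The helpers reg_info and full_registers are hypothetical, and treating a two-element reg node as its own full register is an assumption of this sketch. The data is copied from the add example in this document.

```perl
use strict;
use warnings;

# Split a reg node's arg list into (name, type, full register).
# Assumption: with no third element, the register is its own full register.
sub reg_info {
    my ($node) = @_;
    my ($name, $type, $full) = @{ $node->{arg} };
    return ($name, $type, $full // $name);
}

# Recursively record the full register name of every reg node.
# Only hashref entries in arg are child nodes; scalars are skipped.
sub full_registers {
    my ($node, $seen) = @_;
    $seen //= {};
    if ($node->{op} eq "reg") {
        my (undef, undef, $full) = reg_info($node);
        $seen->{$full} = 1;
        return $seen;
    }
    for my $arg (@{ $node->{arg} || [] }) {
        full_registers($arg, $seen) if ref $arg eq "HASH";
    }
    return $seen;
}

# Values copied from the add example in this document.
my $add = {op => "add", arg => [
    {op => "mem", size => 8, arg => [
        {op => "+", size => 16, arg => [
            {op => "reg", size => 16, arg => ["di", "word", "edi"]},
            {op => "lit", size => 16, arg => [0x4]},
        ]},
    ]},
    {op => "reg", size => 8, arg => ["al", "lobyte", "eax"]},
], start => 5678, len => 3, proc => 86};

print join(",", sort keys %{ full_registers($add) }), "\n";
```

Reporting the full register (edi, eax) rather than the name used (di, al) is one way to detect that two instructions touch the same underlying register.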

That's quite a bit to digest all at once. Here is a simple example:

mov eax,0x1
    becomes
{op=>"mov", arg=>[
    {op=>"reg", size=>32, arg=>["eax", "dword"]},
    {op=>"lit", size=>32, arg=>[0x1]}
], start=>1234, len=>5, proc=>386}

That's fairly straightforward. Here's something a bit more involved.

add byte[di+0x4],al
    becomes
{op=>"add", arg=>[
    {op=>"mem", size=>8, arg=>[
        {op=>"+", size=>16, arg=> [
            {op=>"reg", size=>16, arg=>["di", "word", "edi"]},
            {op=>"lit", size=>16, arg=>[0x4]}
        ]}
    ]},
    {op=>"reg", size=>8, arg=>["al", "lobyte", "eax"]}
], start=>5678, len=>3, proc=>86}

Notice that the details of the address calculation are encapsulated within the + node. The address is 16 bits long, but the value fetched from memory is only 8 bits. This distinction is captured cleanly.
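To make the nesting concrete, here is a sketch of a recursive function that turns a tree back into text resembling the forms shown in the examples. The render function and the size keyword table are hypothetical; only the node types that appear in this document's examples are handled, and any other op is treated as a mnemonic followed by comma-separated operands. It is not meant to reproduce Disassemble::X86::FormatText exactly.

```perl
use strict;
use warnings;

# Operand-size keywords, as in the text form "add byte[di+0x4],al".
my %size_name = (8 => "byte", 16 => "word", 32 => "dword");

sub render {
    my ($node) = @_;
    my $op   = $node->{op};
    my @args = @{ $node->{arg} || [] };

    return $args[0]                  if $op eq "reg";   # register name
    return sprintf("0x%x", $args[0]) if $op eq "lit";   # hex literal
    if ($op eq "mem") {
        # The mem node's size is the size of the value fetched; the
        # address expression inside it carries its own (16-bit) size.
        my $kw = $size_name{ $node->{size} } // "";
        return $kw . "[" . render($args[0]) . "]";
    }
    return join("+", map { render($_) } @args) if $op eq "+";

    # Anything else: mnemonic, then operands separated by commas.
    return @args ? "$op " . join(",", map { render($_) } @args) : $op;
}

# Trees copied from the two examples in this document.
my $mov = {op => "mov", arg => [
    {op => "reg", size => 32, arg => ["eax", "dword"]},
    {op => "lit", size => 32, arg => [0x1]},
], start => 1234, len => 5, proc => 386};

my $add = {op => "add", arg => [
    {op => "mem", size => 8, arg => [
        {op => "+", size => 16, arg => [
            {op => "reg", size => 16, arg => ["di", "word", "edi"]},
            {op => "lit", size => 16, arg => [0x4]},
        ]},
    ]},
    {op => "reg", size => 8, arg => ["al", "lobyte", "eax"]},
], start => 5678, len => 3, proc => 86};

print render($mov), "\n";   # mov eax,0x1
print render($add), "\n";   # add byte[di+0x4],al
```

Note that the byte keyword comes from the mem node's size (8), not from the size of the + node (16); that is the distinction the tree keeps separate.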

Yes, this is fairly complicated to work with. If you don't need all this complexity, try the FormatText module instead.

METHODS

format_instr

$tree = Disassemble::X86::FormatTree->format_instr($tree);

The format_instr method is a no-op: it returns exactly the same input it is given.
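"Exactly the same input" means the same reference, not a copy, so changes made through the returned tree are visible through the original. The sketch below illustrates that contract with a hypothetical stand-in class, My::IdentityFormat; it is not the module's source code.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical stand-in following the documented contract of format_instr:
# called as a class method, it returns its argument unchanged.
package My::IdentityFormat;
sub format_instr {
    my ($class, $tree) = @_;
    return $tree;
}

package main;

# Tree copied from the mov example in this document.
my $tree = {op => "mov", arg => [
    {op => "reg", size => 32, arg => ["eax", "dword"]},
    {op => "lit", size => 32, arg => [0x1]},
], start => 1234, len => 5, proc => 386};

my $out = My::IdentityFormat->format_instr($tree);

# Same reference: no copy was made.
print refaddr($out) == refaddr($tree) ? "same\n" : "copy\n";
```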

SEE ALSO

Disassemble::X86

Disassemble::X86::FormatText

AUTHOR

Bob Mathews <bobmathews@alumni.calpoly.edu>

COPYRIGHT

Copyright (c) 2002 Bob Mathews. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.