TITLE

Parrot - Memory Internals

VERSION

0.0.9 Dec 2002

ABSTRACT

This document tries to explain the internals of parrot structures related to memory management and should give the answer 42, when questions are related to the whole and everything and memory management.

OVERVIEW

All allocated items are managed in memory pools. Memory pools hold collections of similar items. These pools can be roughly divided into two kinds: pools holding fixed sized items and variable sized items.

A pool is organized as a linked list of big chunks, holding many managed items of one kind.

Abbreviations used here

DOD ... dead object detection, see docs/pdds/pdd09_gc.pod for details.

By scanning through all the interpreter's registers, stacks and the processor stack, all objects that are detected here are marked as alive. Objects in all pools not alive are considered dead.

Top down: the interpreter

A overall layout of the interpreter's memory management looks like so:

    typedef struct Parrot_Interp {
    	...
	struct Arenas *arena_base;
	...
    } Interp;

All object-like things that get allocated during the execution of parrot bytecode are managed here.

Arenas

struct Arenas holds pointers to different subkinds of managed memory. A simplification looks similar to this:

    struct Arenas {
	struct Memory_Pool *memory_pool;
	...
	struct Small_Object_Pool * header_pool;
    }

Memory_pool and header_pool are variable and fixed sized pool pointers respectively.

Memory_Pool

Here all variable sized items are allocated, managed in linked lists of Memory_Blocks - but see below.

Small_Object_Pool

This pool manages Small_Object_Arenas, linked together, which provide the space for the fixed sized objects.

Fixed sized items

These items are either objects by themselves, like a PMC, or are a header structure of a variable sized object like a STRING. The general object of this kind is a buffer-like object, which consists of a Buffer or a PObj at the beginning and a variable but fixed sized part after the buffer structure. Examples of this are a STRING or an IntList.

Buffer-like objects of the same size are maintained in the sized_header_pools, which manage objects of the same size in one slot.

All fixed sized objects are allocated with alloc_objects(), and first put onto the pool's free_list. When there is need for a new object it is taken off the free list, and when it stops being used, it will be put back on the free list again. DOD detects which objects are no longer being used.

If the free list is empty then a DOD run is started, which may be able to refill the free list with dead objects it detects are free for re-use. If it finds none then a new pool is allocated to hold more objects.

These fixed sized objects are never freed during the lifetime of an interpreter, they just get reused or recycled.

General structure of a buffer-like item

    struct parrot_object_t {
	struct {
	    void *bufstart;
	    size_t buflen;
	} b;
	unsigned flags;
	...
    } PObj;

This does not totally reflect the current implementation, but is the spirit of the abstraction of current objects. Above structure including flags is the current Buffer. With some additional fields like strstart and bufused, inserted at the ellipses, you get a STRING. Adding a vtable (and some other structure members) yields a PMC.

Variable sized items

These items never live alone, they are part of a Buffer structure, described above. They are allocated at bufstart. This is, for example, used to manage the buffer's free list, where bufstart is used as a pointer to the next object.

These items are managed in two different pools: the memory_pool and the constant_string_pool. The former holds all variable sized items, while the latter containing the word "string", holds constant strings only, as we don't have other variable sized constant items to store.

Here, different memory allocation schemes jump in:

Copying GC

A memory_pool gets allocated in big blocks, namely a Memory_Block.

When some part is needed, e.g. to store a string, this part is carved out of the memory block, until this block is used up. If no more space is available in this memory block, a garbage collection (GC) is started. This copies all living items of all used memory blocks into one new block, which holds thereafter only used items tightly packed together.

The old memory blocks, containing sparse unused parts and used parts already copied to the new place, are then unused altogether and get free()ed thereafter.

When GC doesn't provide enough free space needed for a new item, a new block is added to the memory pool.

This also implies that buffers are moved around during their life. Users of these buffers are therefore not allowed to hold pointers to buffers over pieces of code that might trigger a GC run, like Parrot_allocate() or string_compare().

Defragmentating allocator

An alternative to the above is to use a memory allocator, which is as fast as the above and does reuse free()ed items in a memory conserving way. Actually, the malloc implementations in some systems' libc efficiently provide this, such as the glibc malloc based on Doug Lea's allocator.

Using this allocator, all variable sized items are just allocated via a plain malloc() call, or resized with realloc(), and after they lose their existence (ie when DOD detects that the managing buffer header is no longer in use) they are free()ed. That's all. The underlying allocator collects these pieces, coalesces them if possible to form bigger pieces, and then puts them on free lists, sorted by size. Eventually, when a new allocation request arrives, it may give them back to Parrot.

So here, the variable length memory_pool is unused. You can consider this pool to live inside the allocator.

Buffers allocated this way don't move around, except when reallocated of course.

The Configure option --gc allows one to use either method.

Buffer_Tail and COW

Both implementations have the same problem to solve: STRINGs that get copied, or parts of strings as the result of a substr() call, do not allocate new space in the memory pool to hold this string or part of string. They just use a new_string_header() and set up a pointer (strstart) pointing somewhere into the original string and remember the used length there in bufused.

This is all OK, as long as the original and the lazy copy of the string are not modified. So that's well-known and called COW (copy on write).

Now, during GC (or freeing buffers) the problem arises: to whom does this string belong? You shouldn't copy the same string to different places, thus rendering COW to a noop, nor are you allowed to free this same string more then once (your debugger will tell you why...).

Both allocation schemes therefore use a part of the allocated string to do this bookkeeping.

Copying GC uses a Buffer_Tail after the end of the actual variable length string and marks such COW strings with TAIL_moved and stores the new address in the buffer header, so other users of this string can be updated to reuse the same string (XXX one or all other users?).

The malloc()/free() approach stores a refcount at bufstart. During DOD all dead users increment the refcount, living users set it to an huge value. When freeing the buffer, the string is only freed if the refcount reaches zero.

Simplified Figure

+--------+ +------------------<---| Arenas |<-----------+ | +--------+-->--+ | | | | | +------+ +-----------+ | +=============+ | | S0 |<---| Registers |<--)--| Interpreter | | +------+ +-----------+ | +=============+ | +---| S1 | | | | +------+ +----------+ +-------+ | | | Blk 1 |--)-->+----------+ +--------------+ +---------+ +-------+ | | Buffer 1 | | OnestringAno | | Block 1 | | Blk 2 | | +----------+ | therstring.. | +---------+ +-------+ | | Buffer 2 | | ..String... |<--| Block 2 | | . | | +----------+ +--------------+ +---------+ +-------+ | | ... | ^ ^ | ... | Small Obj | +----------+ | | +---------+ Pool +-->| Buffer N |--------+----+ Memory Pool +----------+ Buffer Memory Block

Now consider, that the Interpreter shall be a PMC living in Buffer X of the underlying interpreter and is currently running perldoc docs/memory_internals.pod, and then redo this figure, with all the blanks filled in ;-)

FILES

smallobject.[ch], headers.c, resources.[ch], res_lea.c, dod.c, string.[ch], pobj.h.

BUGS

All spelling errors belong to those who honestly collected them, as well as all errors related to my abuse of the English language - I'm not natively speaking it.

To minimize this bugs section - please patch this document and keep it up to date - thanks.

AUTHOR

Leopold Tötsch <lt@toetsch.at>

3 POD Errors

The following errors were encountered while parsing the POD:

Around line 201:

'=item' outside of any '=over'

Around line 228:

You forgot a '=back' before '=head1'

Around line 244:

Non-ASCII character seen before =encoding in 'Tötsch'. Assuming CP1252