Zero assembler programming language

Test

A minimal assembler and emulator for the Zero assembler programming language .

Open the Actions tab to see the code in action on Ubuntu and Windows Services for Linux for Ubuntu on windows.

Installation

Install Zero assembler programming language by downloading this repo and then following the steps shown in this validating action

Application

Includes an implementation of N-Way trees using code written in the Zero assembler programming language: assiduously optimized through exhaustive testing, ready for realization in Silicon as an integrated circuit rather than as software on a conventional CPU so that large, extremely fast, associative memories can be manufactured on an industrial scale.

The initial idea is to produce a small CPU which implements just the instructions needed to implement the algorithm. The small CPU will then be replicated across an fpga so that the tree can be queried in parallel.

Only one tree will be used: typically mapping 64 bit keys into 64 bit data. It will be useful to add additional data at the front of the keys such as data length, data position, process id, userid, time stamp etc. As the keys are sorted in the tree, trees with similar prefixes will tend to collect together so we can compress out the common prefix of the keys in each node to make better use of memory. Strings longer than 64 bits can be processed in pieces with each piece prefixed by the string id and the position in the string. Incoming strings can be made unique by assigning a unique 64 bit number to each prefix of the string so that a second such string can be easily recognized. Once such a long string has been represented by a unique number it can be located in one descent through the tree; although a single traversal of the tree will no longer yield such strings in alphabetic order.

All communications with the chip will be done via TcpIp . Incoming read requests can be done in parallel as long as there are processors left to assign work to. An update will have to wait for all existing finds to finish while stalling all trailing actions until the update is complete.

Associative lookups are the sine qua non of all Turing complete programming languages. This arrangement should produce very fast associative lookups - much faster than can be performed by any generic system reliant on external software. Usage of power and integrated circuit surface area should be reduced by having a minimal CPU to perform the lookups. Being able to deliver such lookups faster than can be done with conventional software solutions might prove profitable in much the same way as graphics chips and other chips used at scale.

Memory is addressed via named areas which act as flexible arrays with the usual indexing, push, pop, index, iteration, resizing and scan operations. Each procedure has its own stack frame implemented as a stack frame area, parameter area and return results area. Each area can grow as much as is needed to hold data. Additional user memory areas can be allocated and freed as necessary. Communication with other systems can be achieved by reading and writing to arrays with predetermined names.

Well known locations are represented by character == non numeric area ids. Stack frames, parameters and return areas are represented by negative area ids.

References can represent constants via a scalar with zero levels of dereferencing; direct addresses by scalars with one level of dereferencing, and indirect addresses by scalars with two levels of dereferencing. A reference consisting of an area, an offset within an area and an area name are represented as an array reference with three entries. A reference to a location in the current stack frame is represented as a single scalar with the appropriate levels of dereferencing. The area name is used to confirm that the area being processed is the one that should be being processed.

If you would like to be involved with this interesting and potentially lucrative project, please raise an issue saying so!

Emulator:

The emulator converts a Perl representation of the assembly source code to executable instructions and then executes these instructions.

Documentation Code

Addresses

Each assembler instruction can potentially affect a target memory location specified by the target operand known as the left hand address. Each assembler instruction can potentially read zero, one or two source operands to locate the data to be processed by the instruction.

Each address indexes an array in memory. Each array has a non unique name to confirm that we are reading or writing to the right kind of memory.

Left hand addresses

Left hand address in current stack frame

  Mov \1, 2

The above instruction moves the constant 2 to the location in the current stack frame identified by location 1 in the current stack frame.

Left hand addresses as indexed arrays

  [Array, address, name, delta]

  Mov [1, 2, 'array name'], 99

A left hand address can specifys the address of a location in an array in memory. Left hand addresses always occur first in the written specification of an instruction. In the example above, the value 99 is being moved to location 2 in array 1 operating under the name of 'array name'.

If the array number is preceded by \ as in \1 then the number of the array will be retrieved from location 1 the current stack frame. This mechanism allows for indirect addressing of array names.

Likewise the index of the location in the array can either be specified as a direct number as in 2 or indirectly as \2.

The name of the array is used to check that we are accessing the expected type of array. If the name does not match the expected name for the array being accessed an error message will be written to the out channel and the execution of the program will be terminated.

An address can also be specified as just as n meaning at location n in the current stack frame, or \n indicating an indirect location in the current stack frame.

Right hand addresses

Right hand addresses as constants

  [Array, address, name]

  Mov 3, 99

The example above moves the right hand constant 99 to the location 3 in the current stack frame.

Right hand addresses as variables

  Mov 3, \4

The example above moves the contents of location 4 in the current stack frame to location 3 in the current stack frame.

Right hand addresses as indexed arrays

  [Array, address, name]

  my $a = Array "keys";
  Mov [$a, 3, 'keys'], \4

The example above moves the contents of location 4 in the current stack frame to location 3 in the array whose identifying number is located at location $a in the current stack frame. The array is created with an identifying name of keys. The string keys must be presented on each subsequent access to this array to confirm that the correct type of memory is being accessed.

Instructions

The instruction set

Macro Preprocessor

Every assembler needs a macro preocessor to generate code from macro specifications as writing each instruction by hand is hard work. Using a preprocessor saves programmer time by allowing common instruction sequences to be captured as macros which can then be called upon as needed to generate the code for an application. The Zero assembler programming language uses Perl as its macro preprocessor. Using Perl as the macro preprocessor for Zero assembler programming language enables macro libraries to be published and distributed on CPAN as Perl modules.

Machine code

The Zero assembler programming language code can be converted to a single string using GenerateMachineCode. The string of machine code can then be reloaded using disAssemble to parse the machine code string back into code that can be executed as usual via Execute. Alternatively, the machine code string can be used to program a Silicon device such as an fpga to execute the code on a chip .

Examples

Examples of Zero assembler programming language programs:

Hello World

"Hello World" in the Zero assembler programming language:

  Start 1;

  Out "Hello World";

  my $e = Execute(suppressOutput=>1);

  is_deeply $e->out, <<END;
Hello World
END

Explanation

Start starts a program using a specified version of the language.

Out writes a message to the out channel.

Execute causes the program to be assembled and then executed. The execution results are stored in the Perl data structure returned by this instruction.

N-Way-Tree

An implementation of N-Way-Trees in Zero assembler programming language .

Documentation Code

Can you reduce the number of instructions required to perform 107 inserts into an N-Way-Tree? Please raise an issue if you can stating your terms for your enhancememt.

  add               => 159,
  array             => 247,
  arrayCountGreater => 2,
  arrayCountLess    => 262,
  arrayIndex        => 293,
  dec               => 30,
  inc               => 726,
  jEq               => 894,
  jGe               => 648,
  jLe               => 461,
  jLt               => 565,
  jmp               => 878,
  jNe               => 908,
  mov               => 7619,
  moveLong          => 171,
  not               => 631,
  resize            => 161,
  shiftUp           => 300,
  subtract          => 501,

Sort programs

The examples folder contains some sort programs written in Zero assembler programming language . The total number of instructions executed for each sort program on each of two sample sets of data are shown below. Various prototype solutions were developed for each sort program: the one with the lowest emulated instruction count was retained as the optimal solution.

Short Long

bubble 245 4754

insertion 189 3788

quick 285 1434

selection 286 4357

For documentation see: CPAN