Computer Architecture Lab/Winter2006/PolzerJahn/Assembler

The SISP Assembler

The assembler for the SIS Processor is written in Python and uses the SPARK framework by John Aycock.

Assembly is done in four stages:

Lexical analysis (scanning): Checks whether the input format is valid and breaks the input stream into a list of tokens
Syntax analysis (parsing): Checks whether the list of tokens is syntactically valid according to a grammar
Preprocessing: Determines the addresses of labels and replaces label references (targets) with these addresses
Code generation: Generates the bytecode
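The scanning stage can be pictured with Python's `re` module. The sketch below is not the SPARK-based scanner used by sispasm.py: the regular expressions and the `MNEMONICS` set (limited to mnemonics named on this page) are assumptions. It emits the token types used by the grammar in the Syntax section (OP, REG, SEP, NUM, TGT, LBL, CMD). Octal literals are left out because this page does not specify their notation.

```python
import re

# Hypothetical token patterns; the real SPARK-based scanner may differ.
# Order matters: the first alternative that matches at a position wins.
TOKEN_SPEC = [
    ("LBL", r"[A-Z_][A-Z0-9_]*:"),     # label definition, e.g. MAIN_LOOP:
    ("CMD", r"\.[A-Z]+"),              # command, e.g. .WORD
    ("REG", r"R(?:1[0-5]|[0-9])\b"),   # register R0 to R15
    ("NUM", r"0X[0-9A-F]+|[0-9]+"),    # hexadecimal or decimal number
    ("SEP", r","),                     # operand separator
    ("WORD", r"[A-Z_][A-Z0-9_]*"),     # mnemonic or label reference
    ("SKIP", r"\s+"),                  # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

# Assumed, incomplete set of mnemonics (only those named on this page).
MNEMONICS = {"ADD", "ADC", "SBB", "NEG", "ASL", "ASR", "AND", "LD", "LDC",
             "LDI", "LDIP", "INP", "JMP", "JMPR", "JZ", "JNZ"}

def scan(source):
    """Turn assembly source into a list of (token_type, text) pairs."""
    text = source.upper()  # the assembler is case insensitive
    tokens, pos = [], 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"invalid input at offset {pos}: {text[pos:pos + 10]!r}")
        kind, value = match.lastgroup, match.group()
        if kind == "WORD":
            # Known mnemonics become OP tokens, anything else a target (TGT).
            kind = "OP" if value in MNEMONICS else "TGT"
        if kind != "SKIP":
            tokens.append((kind, value))
        pos = match.end()
    return tokens
```

For example, `scan("add r1, r2")` yields an OP token followed by REG, SEP and REG tokens.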

Usage

To assemble the file test.asm, use the following command:

   python sispasm.py < test.asm > test.bin

Note: The assembler does not support comments or macros. If you use either (in C-style syntax), run the C preprocessor through gcc first:

   gcc -x c -E -P test.asm > test.asm.ao

and then assemble test.asm.ao instead of test.asm. Since version 0.4, the archive includes a simple shell script that performs both steps in sequence.
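The archive's shell script is not reproduced on this page. As a rough Python equivalent of the two commands above, the sketch below assumes `gcc` is on the PATH and `sispasm.py` is in the current directory; `pipeline_commands` and `run_pipeline` are hypothetical names, not part of the distributed tools.

```python
import subprocess
import sys

def pipeline_commands(source, output):
    """Describe both steps as (argv, stdin_path, stdout_path) tuples."""
    preprocessed = source + ".ao"
    return [
        # Step 1: the C preprocessor strips C-style comments and expands macros.
        (["gcc", "-x", "c", "-E", "-P", source], None, preprocessed),
        # Step 2: sispasm.py reads assembly on stdin and writes bytecode to stdout.
        ([sys.executable, "sispasm.py"], preprocessed, output),
    ]

def run_pipeline(source, output):
    """Run both steps in sequence, stopping if either command fails."""
    for argv, stdin_path, stdout_path in pipeline_commands(source, output):
        stdin = open(stdin_path, "rb") if stdin_path is not None else None
        try:
            with open(stdout_path, "wb") as out:
                subprocess.run(argv, stdin=stdin, stdout=out, check=True)
        finally:
            if stdin is not None:
                stdin.close()
```

For example, `run_pipeline("test.asm", "test.bin")` writes the intermediate file test.asm.ao and then test.bin.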

Syntax

The parser uses the following grammar to check that the syntax is valid:

   program ::= program instruction
   program ::= program label
   program ::= program command
   instruction ::= OP
   instruction ::= OP REG
   instruction ::= OP REG SEP REG
   instruction ::= OP REG SEP NUM
   instruction ::= OP NUM
   instruction ::= OP TGT
   label ::= LBL
   command ::= CMD
   command ::= CMD NUM
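A plain-Python sketch (not SPARK's parser) of checking a sequence of token types against these productions. Because OP, LBL and CMD only occur at the start of a right-hand side, statements can be split at those tokens and each one compared against the allowed forms. The sketch assumes `program` may also be empty, since the list shows only the recursive productions. It follows the listed productions exactly, so a form such as OP REG SEP TGT (as in `LDC r2, text` under Instructions) is rejected.

```python
# Right-hand sides of the productions above, as tuples of token types.
INSTRUCTION = {
    ("OP",), ("OP", "REG"), ("OP", "REG", "SEP", "REG"),
    ("OP", "REG", "SEP", "NUM"), ("OP", "NUM"), ("OP", "TGT"),
}
LABEL = {("LBL",)}
COMMAND = {("CMD",), ("CMD", "NUM")}
STATEMENTS = INSTRUCTION | LABEL | COMMAND

# OP, LBL and CMD only ever appear first on a right-hand side,
# so each occurrence marks the start of a new statement.
STARTERS = {"OP", "LBL", "CMD"}

def split_statements(kinds):
    """Group a sequence of token types into statements, or raise SyntaxError."""
    statements = []
    for kind in kinds:
        if kind in STARTERS:
            statements.append([kind])
        elif statements:
            statements[-1].append(kind)
        else:
            raise SyntaxError(f"program cannot start with {kind}")
    for stmt in statements:
        if tuple(stmt) not in STATEMENTS:
            raise SyntaxError("invalid statement: " + " ".join(stmt))
    return [tuple(stmt) for stmt in statements]
```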

The assembler is case insensitive. There are three types of input: instructions, labels, and commands.

Instructions

OP [REG [, REG | IMM]]

Instructions are the operations executed by the processor. An instruction may have zero, one, or two operands; two operands are separated by a comma. The first operand serves as both source and destination register. The second operand can be another source register or an immediate value. For a list of available instructions, please refer to the instruction set. Some example instructions:

   ADD r1, r2
   LDC r2, text
   LD  r1, r1
   INP r4, 2
   AND r4, r5
   JZ  loop
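To illustrate the first operand acting as both source and destination, the following sketch models the register file in Python. Only ADD and AND are shown, the 16-bit wrap-around of ADD is an assumption (status flags such as carry are not modeled), and `execute` is a hypothetical helper, not part of the assembler.

```python
MASK = 0xFFFF  # SISP registers are 16 bits wide

# Illustrative subset; wrap-around on overflow is an assumption (flags are not modeled).
ALU_OPS = {
    "ADD": lambda a, b: (a + b) & MASK,
    "AND": lambda a, b: a & b,
}

def execute(regs, op, dst, src):
    """Apply `op` in place: regs[dst] = regs[dst] <op> second operand.

    `src` is ("REG", index) for a register or ("IMM", value) for an immediate.
    """
    kind, value = src
    operand = regs[value] if kind == "REG" else value & MASK
    regs[dst] = ALU_OPS[op](regs[dst], operand)

regs = [0] * 16
regs[1], regs[2] = 0xFFFF, 3
execute(regs, "ADD", 1, ("REG", 2))  # ADD r1, r2
# r1 now holds 2 (0xFFFF + 3 wraps around); r2 still holds 3
```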

Labels

LABELNAME:

Labels can be placed at any point in the program to name an address. They can then be used in instructions (e.g. JMP, LDI) instead of hard-coded addresses or values. In the preprocessing stage, every occurrence of a label is replaced by its assigned address. Label names may consist of letters, digits, and underscores but must start with a letter or an underscore. Some examples:

   TEXTLENGTH:
   MAIN_LOOP:
   ELSE_2:
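The label replacement done in the preprocessing stage is naturally a two-pass process: first collect the address of every label, then substitute references. This sketch assumes addresses count 16-bit words starting at 0, with one word per instruction or .WORD and two per .DWORD; the real SISP instruction sizes and start address may differ. `resolve_labels` and `size_of` are hypothetical names. The sketch also enforces the naming rule above and rejects duplicate labels.

```python
import re

LABEL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def size_of(stmt):
    """Assumed size in 16-bit words: .DWORD takes two, everything else one."""
    return 2 if stmt[0].upper() == ".DWORD" else 1

def resolve_labels(statements):
    """Two-pass label resolution over statements given as lists of strings."""
    # Pass 1: assign each label definition the address of the next statement.
    addresses, address = {}, 0
    for stmt in statements:
        head = stmt[0]
        if head.endswith(":"):
            name = head[:-1].upper()  # labels are case insensitive
            if not LABEL_NAME.fullmatch(name):
                raise ValueError(f"invalid label name: {name}")
            if name in addresses:
                raise ValueError(f"label defined twice: {name}")
            addresses[name] = address
        else:
            address += size_of(stmt)
    # Pass 2: drop label definitions and replace label references by addresses.
    resolved = []
    for stmt in statements:
        if stmt[0].endswith(":"):
            continue
        operands = [str(addresses.get(arg.upper(), arg)) for arg in stmt[1:]]
        resolved.append([stmt[0]] + operands)
    return addresses, resolved

program = [["MAIN_LOOP:"], ["ADD", "r1", "r2"], ["JZ", "main_loop"]]
addresses, code = resolve_labels(program)
# addresses == {"MAIN_LOOP": 0}; code == [["ADD", "r1", "r2"], ["JZ", "0"]]
```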

Commands

.COMMANDNAME [PARAM]

The assembler currently processes three commands:

.WORD <VALUE>

Inserts a 16-bit value at the position of this command. Values can be written in decimal or hexadecimal number format.

.DWORD <VALUE>

Inserts a 32-bit value at the position of this command. Values can be written in decimal or hexadecimal number format.

.END

Marks the end of the program. Commands or instructions after this command are not processed.
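How the three commands could be turned into bytes is sketched below with Python's `struct` module. The byte order (big-endian here) and the helper names `parse_value` and `emit_commands` are assumptions; this page does not state how SISP lays out 16- and 32-bit values in the output file.

```python
import struct

def parse_value(text):
    """Parse a decimal or 0x-prefixed hexadecimal literal."""
    return int(text, 16) if text.lower().startswith("0x") else int(text, 10)

def emit_commands(statements):
    """Encode .WORD/.DWORD commands until .END (big-endian is an assumption)."""
    out = bytearray()
    for stmt in statements:
        name = stmt[0].upper()
        if name == ".END":
            break  # nothing after .END is processed
        if name == ".WORD":
            out += struct.pack(">H", parse_value(stmt[1]))  # 16-bit value
        elif name == ".DWORD":
            out += struct.pack(">I", parse_value(stmt[1]))  # 32-bit value
        else:
            raise ValueError(f"not a command handled by this sketch: {stmt}")
    return bytes(out)
```

Values that do not fit in 16 or 32 bits make `struct.pack` raise `struct.error`, which is one simple way to reject them.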

An example of how you might use commands:

   // The string "SISP!!" followed by LF (0x0A) and CR (0x0D)
   text: 
       .word 0x53
       .word 0x49
       .word 0x53
       .word 0x50
       .word 0x21
       .word 0x21
       .word 0x0A
       .word 0x0D
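Writing such character tables by hand is error-prone, so a small generator can produce them. This hypothetical helper emits one .word per character plus a terminator; the default terminator, LF followed by CR, reproduces the listing above.

```python
def string_directives(label, text, terminator="\n\r"):
    """Build a labeled block with one .word directive per character."""
    lines = [f"{label}:"]
    for ch in text + terminator:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError(f"character does not fit in 16 bits: {ch!r}")
        lines.append(f"    .word 0x{code:02X}")
    return "\n".join(lines)

print(string_directives("text", "SISP!!"))
```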

Registers

SISP provides 16 registers, each 16 bits wide, named R0 to R15. When our stack macros are used, R15 is reserved as the stack pointer.
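A minimal sketch of validating register names and mapping them to numbers, assuming names of the form R0 to R15 in any letter case. Sixteen registers means the index fits in 4 bits; how SISP actually encodes register operands is not described on this page, and `register_index` is a hypothetical helper.

```python
import re

REGISTER = re.compile(r"R(\d+)", re.IGNORECASE)

def register_index(name):
    """Map a register name such as "r4" or "R15" to its number (0 to 15)."""
    match = REGISTER.fullmatch(name.strip())
    if match is None or int(match.group(1)) > 15:
        raise ValueError(f"no such register: {name}")
    return int(match.group(1))
```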

Download

You can download the current version of our assembler as an archive here.

Version history

0.7: 18-01-2007

Fixed a bug that caused commands to have no effect

0.6: 13-01-2007

Fixed an error in the grammar that caused very long execution times
Made the script independent of the caller's working directory
Made assembler case insensitive
Allowed the underscore character in labels
Added instructions (JMPR, LDIP)

0.5: 14-12-2006

Fixed enumeration error

0.4: 2-11-2006

Added operations ADC, SBB, NEG, ASL, ASR
Changed JE, JNE to JZ, JNZ
Added simple shell script for assembling
Fixed minor bugs

0.3: 27-10-2006

Labels can now be used as constants in LDC
Changed operation numbers
Added I/O operation type
Added check for duplicate label definitions

0.2: 26-10-2006

Added support for hexadecimal and octal numbers
Added instruction format to the optype description (makes the assembler more general and flexible)

0.1: initial version