Computer Architecture Lab/Winter2006/PolzerJahn/Assembler

The SISP Assembler

The assembler for the SIS Processor is written in Python and uses the SPARK framework by John Aycock.

Assembly is done in four stages:

  1. Lexical analysis (scanning): Checks whether the input is well-formed and breaks the input stream into a list of tokens
  2. Syntax analysis (parsing): Checks whether the token list is syntactically valid according to a grammar
  3. Preprocessing: Determines the address of every label and replaces label references (targets) with these addresses
  4. Code generation: Generates the binary code
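
The scanning stage can be sketched with Python's standard re module. This is not how sispasm.py works (the real assembler uses the SPARK framework); the regular expressions, the function name scan and the rule that the first bare word of a line is the opcode are assumptions for illustration. The token types are the ones used by the grammar in the Syntax section.

```python
import re

# Token patterns, tried left to right; the first alternative that matches wins.
# These regexes are illustrative assumptions, not the rules in sispasm.py.
TOKEN_SPEC = [
    ("WS", r"[ \t]+"),
    ("LBL", r"[a-z_][a-z0-9_]*:"),
    ("CMD", r"\.[a-z]+"),
    ("REG", r"r(?:1[0-5]|[0-9])\b"),
    ("NUM", r"0x[0-9a-f]+\b|[0-9]+\b"),
    ("SEP", r","),
    ("IDENT", r"[a-z_][a-z0-9_]*"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(source):
    """Return one list of (type, text) tokens per non-empty source line."""
    result = []
    # Lower-casing first makes the scanner case insensitive.
    for lineno, line in enumerate(source.lower().splitlines(), start=1):
        tokens, pos = [], 0
        while pos < len(line):
            m = MASTER.match(line, pos)
            if m is None:
                raise SyntaxError(f"line {lineno}: unexpected {line[pos]!r}")
            kind, text, pos = m.lastgroup, m.group(), m.end()
            if kind == "WS":
                continue
            if kind == "LBL":
                text = text[:-1]  # drop the trailing colon
            elif kind == "IDENT":
                # The first bare word after any labels is the opcode;
                # any later bare word is a jump/branch target.
                kind = "OP" if all(k == "LBL" for k, _ in tokens) else "TGT"
            tokens.append((kind, text))
        if tokens:
            result.append(tokens)
    return result
```

For example, the line `   ADD r1, r2` yields the tokens OP, REG, SEP, REG, while a character such as `;` raises a SyntaxError.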

Usage

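The assembler reads the source program from standard input and writes the binary to standard output. To assemble the file test.asm into test.bin, use the following command: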

   python sispasm.py < test.asm > test.bin

Note: The assembler itself does not support comments or macros. If your source uses either of them (in C style), preprocess it with gcc:

   gcc -x c -E -P test.asm > test.asm.ao

and then run the assembler on test.asm.ao instead of test.asm. Since version 0.4, the archive includes a simple shell script that performs both steps in sequence.
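
The two steps can also be driven from Python. This is a sketch, not the shell script shipped in the archive: the function names are hypothetical, and it assumes that gcc is on the PATH and that sispasm.py is in the current directory.

```python
import subprocess
import sys


def build_commands(asm_file, assembler="sispasm.py"):
    """Return the gcc preprocessor command and the assembler command."""
    # Same gcc flags as above: treat the file as C (-x c), only run the
    # preprocessor (-E) and omit line markers in the output (-P).
    cpp_cmd = ["gcc", "-x", "c", "-E", "-P", asm_file]
    asm_cmd = [sys.executable, assembler]
    return cpp_cmd, asm_cmd


def assemble(asm_file, bin_file, assembler="sispasm.py"):
    """Preprocess asm_file with gcc, then pipe the result into the assembler."""
    cpp_cmd, asm_cmd = build_commands(asm_file, assembler)
    # Keep the preprocessed source in memory instead of writing test.asm.ao.
    source = subprocess.run(cpp_cmd, check=True, capture_output=True).stdout
    # The assembler reads stdin and writes stdout, like "< in > out".
    with open(bin_file, "wb") as out:
        subprocess.run(asm_cmd, input=source, stdout=out, check=True)
```

Calling `assemble("test.asm", "test.bin")` corresponds to running the gcc command above and then feeding its output to the assembler; check=True makes either step raise an exception if it fails.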

Syntax

The parser checks the token list against the following grammar:

   program ::= program instruction
   program ::= program label
   program ::= program command
   instruction ::= OP
   instruction ::= OP REG
   instruction ::= OP REG SEP REG
   instruction ::= OP REG SEP NUM
   instruction ::= OP NUM
   instruction ::= OP TGT
   label ::= LBL
   command ::= CMD
   command ::= CMD NUM
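
The three program rules say that a program is a sequence of instructions, labels and commands. Because every such statement starts with an OP, LBL or CMD token, the remaining rules can be checked with a simple longest-match loop over the token types. The sketch below is a hand-written stand-in for the SPARK parser, not the real implementation; the name parse and its input format (a flat list of token-type strings) are assumptions.

```python
# Right-hand sides of the instruction, label and command rules, longest
# first so that e.g. "OP REG SEP REG" is preferred over its prefix "OP REG".
RULES = [
    ("instruction", ("OP", "REG", "SEP", "REG")),
    ("instruction", ("OP", "REG", "SEP", "NUM")),
    ("instruction", ("OP", "REG")),
    ("instruction", ("OP", "NUM")),
    ("instruction", ("OP", "TGT")),
    ("instruction", ("OP",)),
    ("label", ("LBL",)),
    ("command", ("CMD", "NUM")),
    ("command", ("CMD",)),
]


def parse(kinds):
    """Split a flat list of token types into instructions, labels and commands."""
    statements, pos = [], 0
    while pos < len(kinds):
        for name, rhs in RULES:
            if tuple(kinds[pos:pos + len(rhs)]) == rhs:
                statements.append(name)
                pos += len(rhs)
                break
        else:  # no rule matched at this position
            raise SyntaxError(f"unexpected {kinds[pos]} at token {pos}")
    return statements
```

Taking the longest match is safe here because no rule starts with REG, SEP, NUM or TGT, so a shorter match could never be followed by a valid statement.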

The assembler is case insensitive. There are three kinds of input: instructions, labels and commands.

Instructions

OP [REG [, REG | IMM]]

Instructions are the operations executed by the processor. An instruction has zero, one or two operands; two operands are separated by a comma. The first operand acts as both the source and the destination register. The second operand can be another source register or an immediate value. For a list of available instructions, please refer to the instruction set. Some example instructions:

   ADD r1, r2
   LDC r2, text
   LD  r1, r1
   INP r4, 2
   AND r4, r5
   JZ  loop
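
To illustrate the two-operand format, in which the first operand is both read and overwritten, here is a toy Python model of ADD and AND on 16-bit registers. It is a hypothetical illustration only: the register dictionary and the execute function are not part of the assembler, and flags, carries and the real SISP semantics are ignored.

```python
MASK = 0xFFFF  # SISP registers are 16 bits wide


def execute(regs, op, dst, src):
    """Toy model of a two-operand instruction: dst <- dst OP src.

    src is either a register name such as "r2" or an integer immediate.
    Only ADD and AND are modelled; flags and carries are ignored.
    """
    a = regs[dst]                                   # operand one is read...
    b = regs[src] if isinstance(src, str) else src  # register or immediate
    if op == "ADD":
        result = (a + b) & MASK                     # wrap around at 16 bits
    elif op == "AND":
        result = a & b
    else:
        raise ValueError(f"{op} is not modelled in this sketch")
    regs[dst] = result                              # ...and then overwritten
```

After `execute(regs, "ADD", "r1", "r2")` only r1 has changed; r2 is left untouched, just as the second operand is only a source in `ADD r1, r2`.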

Labels

LABELNAME:

A label can be placed anywhere in the program to give a name to an address. Labels can then be used as operands of instructions (e.g. JMP, LDI) instead of hard-coded addresses or values. During the preprocessing stage, every occurrence of a label is replaced by its assigned address. Label names may consist of letters, digits and underscores, but must start with a letter or an underscore. Some examples:

   TEXTLENGTH:
   MAIN_LOOP:
   ELSE_2:
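
Replacing labels by addresses takes two passes, because a label may be used before it is defined (e.g. in a forward jump). The sketch below uses a hypothetical statement format, ("label", name), ("instr", op, operands) or ("cmd", name, value), and assumes that memory is addressed in 16-bit words, with every instruction and every .word occupying one word and every .dword two; the real sizes are defined by the SISP instruction set and may differ. It also enforces the naming rule above and rejects labels defined more than once.

```python
import re

LABEL_NAME = re.compile(r"[a-z_][a-z0-9_]*")  # checked after lower-casing
REGISTER = re.compile(r"r(?:1[0-5]|[0-9])")

# Assumed sizes in 16-bit words (hypothetical, not from sispasm.py).
SIZE = {"instr": 1, ".word": 1, ".dword": 2, ".end": 0}


def resolve_labels(statements):
    """Return (statements with label operands replaced, label address table)."""
    # Pass 1: give every label the address of the next emitted word.
    addresses, addr = {}, 0
    for stmt in statements:
        if stmt[0] == "label":
            name = stmt[1].lower()  # labels are case insensitive
            if not LABEL_NAME.fullmatch(name):
                raise ValueError(f"invalid label name {stmt[1]!r}")
            if name in addresses:
                raise ValueError(f"label {stmt[1]!r} defined more than once")
            addresses[name] = addr
        elif stmt[0] == "instr":
            addr += SIZE["instr"]
        else:  # ("cmd", name, value)
            addr += SIZE[stmt[1].lower()]

    # Pass 2: swap label operands for addresses; labels themselves emit nothing.
    resolved = []
    for stmt in statements:
        if stmt[0] == "instr":
            operands = []
            for operand in stmt[2]:
                if isinstance(operand, str) and not REGISTER.fullmatch(operand.lower()):
                    if operand.lower() not in addresses:
                        raise ValueError(f"undefined label {operand!r}")
                    operand = addresses[operand.lower()]
                operands.append(operand)
            resolved.append(("instr", stmt[1], operands))
        elif stmt[0] == "cmd":
            resolved.append(stmt)
    return resolved, addresses
```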

Commands

.COMMANDNAME [PARAM]

The assembler currently processes three commands:

  • .WORD <VALUE>

Inserts a 16-bit value at the position of this command. Values can be written in decimal or hexadecimal number format.

  • .DWORD <VALUE>

Inserts a 32-bit value at the position of this command. Values can be written in decimal or hexadecimal number format.

  • .END

Marks the end of the program. Any commands or instructions after it are ignored.
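
A sketch of how these commands could be turned into bytes, accepting decimal and hexadecimal values as described above. The big-endian byte order, the function names and the (name, value) input format are assumptions, not taken from sispasm.py.

```python
WIDTH = {".word": 16, ".dword": 32}  # bits inserted by each command


def parse_number(text):
    """Parse a decimal ("83") or hexadecimal ("0x53") literal."""
    text = text.lower()
    return int(text, 16) if text.startswith("0x") else int(text, 10)


def encode_command(name, value_text):
    """Return the bytes that a .WORD or .DWORD command inserts."""
    bits = WIDTH[name.lower()]
    value = parse_number(value_text)
    if not 0 <= value < 1 << bits:
        raise ValueError(f"{value_text} does not fit in {bits} bits")
    # Big-endian byte order is an assumption, not taken from sispasm.py.
    return value.to_bytes(bits // 8, "big")


def encode_data(commands):
    """Concatenate (name, value) commands, ignoring everything after .END."""
    out = bytearray()
    for name, value_text in commands:
        if name.lower() == ".end":
            break
        out += encode_command(name, value_text)
    return bytes(out)
```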

An example of how commands might be used:

   // The string "SISP!!" followed by a line feed (0x0A) and a carriage return (0x0D)
   text: 
       .word 0x53
       .word 0x49
       .word 0x53
       .word 0x50
       .word 0x21
       .word 0x21
       .word 0x0A
       .word 0x0D
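
Instead of looking up character codes by hand, the .word lines of the example can be generated with a small hypothetical helper. It writes one .word per character, as the example does, so the trailing line feed and carriage return are simply part of the input string.

```python
def string_to_words(label, text):
    """Emit a label line plus one ".word" line per character of text."""
    lines = [f"{label}:"]
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError(f"{ch!r} does not fit in a 16-bit .word")
        lines.append(f"    .word 0x{code:02X}")  # e.g. "S" -> 0x53
    return "\n".join(lines)
```

`string_to_words("text", "SISP!!\n\r")` reproduces the eight .word lines of the example above.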

Registers

SISP provides 16 registers, each 16 bits wide, named R0 to R15. When our stack macros are used, R15 is reserved as the stack pointer.
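
A small hypothetical helper for validating register operands. With 16 registers, a register number always fits in 4 bits (0 to 15); the regular expression accepts R0 to R15 in any case, matching the assembler's case insensitivity, and rejects names such as R16.

```python
import re

REGISTER = re.compile(r"r(1[0-5]|[0-9])", re.IGNORECASE)
STACK_POINTER = 15  # R15, reserved only when the stack macros are used


def register_number(name):
    """Map "R0".."R15" (any case) to 0..15 and reject anything else."""
    m = REGISTER.fullmatch(name)
    if m is None:
        raise ValueError(f"{name!r} is not a SISP register (R0-R15)")
    return int(m.group(1))
```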

Download

You can download the current version of our assembler as an archive here.

Version history

0.7: 18-01-2007

  • Fixed a bug that caused commands to have no effect

0.6: 13-01-2007

  • Fixed an error in the grammar that caused very long execution times
  • Made the script independent of the caller's working directory
  • Made the assembler case insensitive
  • Allowed the underscore character in labels
  • Added instructions JMPR and LDIP

0.5: 14-12-2006

  • Fixed an enumeration error

0.4: 02-11-2006

  • Added operations ADC, SBB, NEG, ASL, ASR
  • Changed JE, JNE to JZ, JNZ
  • Added simple shell script for assembling
  • Fixed minor bugs

0.3: 27-10-2006

  • Labels can now be used as constants in LDC
  • Changed operation numbers
  • Added I/O operation type
  • Added a check for labels defined more than once

0.2: 26-10-2006

  • Added support for hexadecimal and octal numbers
  • Added the instruction format to the optype description (makes the assembler more general and flexible)

0.1: Initial version