Computer Architecture Lab/Winter2006/PolzerJahn/Assembler
The SISP Assembler
The assembler for the SIS Processor is written in Python and uses the SPARK framework by John Aycock.
Assembly is done in four stages:
- Lexical analysis (scanning): Checks whether the input is well-formed and breaks the input stream into a list of tokens
- Syntax analysis (parsing): Checks whether the list of tokens is syntactically valid according to a grammar
- Preprocessing: Determines the address of every label and replaces label references (such as jump targets) with those addresses
- Code generation: Generates the bytecode
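The scanning stage can be pictured with the following sketch. It uses Python's `re` module rather than SPARK, and the regular expressions, the one-statement-per-line assumption, and the rule that the first identifier on a line is the opcode (OP) while later identifiers are targets (TGT) are all assumptions made for illustration. Only the token names are taken from the assembler's grammar.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (token type, lexeme)

# Hypothetical token patterns, tried in order (LBL must come before IDENT).
_PATTERNS = [
    ("CMD", r"\.[A-Za-z]+"),
    ("LBL", r"[A-Za-z_][A-Za-z0-9_]*:"),
    ("REG", r"[Rr](?:1[0-5]|[0-9])\b"),
    ("NUM", r"0[xX][0-9A-Fa-f]+|[0-9]+"),
    ("SEP", r","),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SKIP", r"[ \t]+"),
]
_MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in _PATTERNS))


def scan(source: str) -> List[Token]:
    """Break assembler source into (type, lexeme) tokens, assuming one statement per line."""
    tokens: List[Token] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        pos = 0
        seen_op = False  # assumption: the first identifier on a line is the opcode
        while pos < len(line):
            m = _MASTER.match(line, pos)
            if m is None:
                raise SyntaxError(f"line {lineno}: unexpected {line[pos]!r}")
            kind, text = m.lastgroup, m.group()
            pos = m.end()
            if kind == "SKIP":
                continue
            if kind == "IDENT":
                kind = "TGT" if seen_op else "OP"
                seen_op = True
            elif kind == "LBL":
                text = text[:-1]  # drop the trailing colon
            if kind != "NUM":
                text = text.upper()  # the assembler is case-insensitive
            tokens.append((kind, text))
    return tokens
```

For example, `scan("loop: ADD r1, r2\nJZ loop")` yields an LBL token, an OP with two REG operands separated by SEP, and then an OP followed by a TGT. Characters the scanner does not recognise, such as a `//` comment, raise a `SyntaxError`, which mirrors the assembler's lack of comment support.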
To assemble the file test.asm, use the following command:
python sispasm.py < test.asm > test.bin
Note: The assembler itself supports neither comments nor macros. If your source uses C-style comments or macros, run it through the C preprocessor with the command
gcc -x c -E -P test.asm > test.asm.ao
before invoking the assembler, and assemble test.asm.ao instead. Since version 0.4, the archive includes a simple shell script that performs both steps in sequence.
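The shell script shipped in the archive is not reproduced here. As a rough Python equivalent, assuming that `gcc` and a `python` interpreter able to run sispasm.py are on the PATH, the two steps can be chained with `subprocess`, piping the preprocessor output straight into the assembler instead of writing test.asm.ao. The function names are hypothetical.

```python
import subprocess
from typing import List


def preprocess_cmd(source: str) -> List[str]:
    """The gcc call from the text: treat the file as C, run only the preprocessor, omit linemarkers."""
    return ["gcc", "-x", "c", "-E", "-P", source]


def assemble_cmd(assembler: str = "sispasm.py") -> List[str]:
    """The assembler reads assembly source on stdin and writes bytecode to stdout."""
    return ["python", assembler]


def build(source: str, output: str, assembler: str = "sispasm.py") -> None:
    """Preprocess `source`, pipe the result into the assembler, and save the bytecode."""
    pre = subprocess.run(preprocess_cmd(source), check=True, capture_output=True)
    with open(output, "wb") as binary:
        subprocess.run(assemble_cmd(assembler), input=pre.stdout, stdout=binary, check=True)
```

Calling `build("test.asm", "test.bin")` then corresponds to running the gcc command followed by the assembler command shown above. Because `check=True` is set, a preprocessor or assembler failure raises `subprocess.CalledProcessError` instead of silently producing an empty binary.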
The parser uses the following grammar to verify the syntax:
program ::= program instruction
program ::= program label
program ::= program command
instruction ::= OP
instruction ::= OP REG
instruction ::= OP REG SEP REG
instruction ::= OP REG SEP NUM
instruction ::= OP NUM
instruction ::= OP TGT
label ::= LBL
command ::= CMD
command ::= CMD NUM
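Because no statement can start with REG, SEP, NUM or TGT, these productions can be recognised greedily, one token at a time. The sketch below is such a hand-written recognizer, not the SPARK-generated parser the assembler actually uses. It takes a list of token types and groups them into statements. Since the listed grammar has no base case for `program`, the sketch treats an empty list as an empty program (an assumption). Taken literally, the productions also have no OP REG SEP TGT form, so a token sequence for a line like `LDC r2, text` is rejected here; how sispasm.py passes label constants to LDC is not described on this page.

```python
from typing import List, Sequence


def parse(types: Sequence[str]) -> List[List[str]]:
    """Split a token-type stream into statements following the grammar's productions.

    Raises SyntaxError for any sequence the grammar does not derive.
    """
    statements: List[List[str]] = []
    i, n = 0, len(types)

    def peek(offset: int = 0) -> str:
        return types[i + offset] if i + offset < n else ""

    while i < n:
        start = i
        head = types[i]
        if head == "LBL":                      # label ::= LBL
            i += 1
        elif head == "CMD":                    # command ::= CMD | CMD NUM
            i += 2 if peek(1) == "NUM" else 1
        elif head == "OP":                     # instruction ::= OP ...
            i += 1
            if peek() in ("NUM", "TGT"):       # OP NUM | OP TGT
                i += 1
            elif peek() == "REG":              # OP REG
                i += 1
                if peek() == "SEP":            # OP REG SEP REG | OP REG SEP NUM
                    if peek(1) not in ("REG", "NUM"):
                        raise SyntaxError(f"expected REG or NUM after SEP at token {i + 1}")
                    i += 2
        else:
            raise SyntaxError(f"unexpected {head} at token {i}")
        statements.append(list(types[start:i]))
    return statements
```

The greedy choice is safe because every token that may follow OP, REG, SEP or CMD inside a statement can never begin a new statement, so there is never a decision to backtrack on.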
The assembler is case-insensitive. It accepts three kinds of input: instructions, labels, and commands.
- OP [REG [, REG | IMM]]
Instructions are the operations executed by our processor. An instruction has zero, one, or two operands; two operands are separated by a comma. The first operand serves as both the source and the destination register. The second operand is either another source register or an immediate value. Jump instructions such as JZ take a label as their target instead. For the list of available instructions, please refer to the instruction set. Some example instructions:
ADD r1, r2
LDC r2, text
LD r1, r1
INP r4, 2
AND r4, r5
JZ loop
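The two-operand convention, where the first operand is read and then overwritten, can be illustrated with a tiny register-file model. The semantics given to ADD (addition wrapping at 16 bits, carry discarded) and AND (bitwise and) are assumptions for illustration; the instruction set is authoritative, and flags are not modelled.

```python
import operator
from typing import List

WIDTH_MASK = 0xFFFF  # SISP registers are 16 bits wide

# Assumed semantics for two of the example opcodes; carry and flags are ignored.
ALU_OPS = {"ADD": operator.add, "AND": operator.and_}


def execute(regs: List[int], op: str, dst: int, src: int) -> None:
    """Two-operand form: the first operand is both a source and the destination."""
    regs[dst] = ALU_OPS[op](regs[dst], regs[src]) & WIDTH_MASK


regs = [0] * 16
regs[1], regs[2] = 0xFFFF, 2
execute(regs, "ADD", 1, 2)   # ADD r1, r2  means  r1 = (r1 + r2) mod 2**16
print(hex(regs[1]))          # 0x1
```

Note that r2 is left unchanged: only the first operand receives the result.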
Labels can be placed anywhere in a program to name an address. They can then be used in instructions (e.g. JMP, LDI) instead of hard-coded addresses or values. In the preprocessing stage, the assembler replaces every occurrence of a label with its assigned address. Label names may consist of letters, digits, and underscores, but must start with a letter or an underscore. A label is defined by writing its name followed by a colon. Some examples:
TEXTLENGTH:
MAIN_LOOP:
ELSE_2:
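Label handling in the preprocessing stage follows the classic two-pass scheme, sketched here under stated assumptions: statements arrive as (kind, text, size) tuples, sizes are counted in words and are purely illustrative (this page does not give instruction lengths), and numbers are decimal or 0x-prefixed hexadecimal. The duplicate check mirrors the "check for multiple label occurrence" mentioned in the changelog. Function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

LABEL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")

# (kind, text, size in words); sizes are illustrative, not SISP's real encoding lengths.
Statement = Tuple[str, str, int]


def collect_labels(program: List[Statement]) -> Dict[str, int]:
    """Pass 1: give each label the address of the statement that follows it."""
    labels: Dict[str, int] = {}
    address = 0
    for kind, text, size in program:
        if kind == "LBL":
            if not LABEL_NAME.match(text):
                raise ValueError(f"invalid label name: {text!r}")
            name = text.upper()  # labels are case-insensitive
            if name in labels:
                raise ValueError(f"label defined more than once: {text!r}")
            labels[name] = address
        address += size
    return labels


def resolve(operand: str, labels: Dict[str, int]) -> int:
    """Pass 2: replace a label reference with its address; numbers pass through."""
    name = operand.upper()
    if name in labels:
        return labels[name]
    return int(operand, 16) if operand.lower().startswith("0x") else int(operand, 10)
```

The two passes are needed because a jump such as `JZ loop` may refer to a label that is only defined further down the program.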
- .COMMANDNAME [PARAM]
The assembler currently processes three commands:
- .WORD <VALUE>
Inserts a 16-bit value at the position of this command. Values can be written in decimal or hexadecimal number format.
- .DWORD <VALUE>
Inserts a 32-bit value at the position of this command. Values can be written in decimal or hexadecimal number format.
Marks the end of the program. Commands and instructions after it are not processed.
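A possible encoding of the two value commands is sketched below. The byte order (big-endian here), the rejection of negative or out-of-range values, and the function names are assumptions for illustration; the page only states the widths and the accepted number formats.

```python
import struct


def parse_value(text: str) -> int:
    """Decimal or hexadecimal (0x prefix), the formats accepted by .WORD and .DWORD."""
    return int(text, 16) if text.lower().startswith("0x") else int(text, 10)


def encode_command(name: str, value: str) -> bytes:
    """Emit the bytes for .WORD (16 bits) or .DWORD (32 bits); big-endian is an assumption."""
    formats = {".WORD": (">H", 0xFFFF), ".DWORD": (">I", 0xFFFFFFFF)}
    fmt, limit = formats[name.upper()]  # commands are case-insensitive
    number = parse_value(value)
    if not 0 <= number <= limit:
        raise ValueError(f"{value} does not fit in {name.upper()}")
    return struct.pack(fmt, number)
```

For instance, `encode_command(".word", "0x53")` produces the two bytes `00 53`, while `encode_command(".DWORD", "65536")` produces the four bytes `00 01 00 00`.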
An example of how commands might be used:
// The string "SISP!!" followed by a line feed (0x0A) and a carriage return (0x0D)
text:
.word 0x53
.word 0x49
.word 0x53
.word 0x50
.word 0x21
.word 0x21
.word 0x0A
.word 0x0D
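Writing such character tables by hand is tedious, so a small generator script can produce them. This is a hypothetical convenience helper, not part of the assembler, and it assumes one ASCII character per 16-bit word, as in the example.

```python
from typing import List


def string_to_words(text: str) -> List[str]:
    """Emit one .word directive per character (ASCII only, one character per word)."""
    lines = []
    for ch in text:
        if ord(ch) > 0x7F:
            raise ValueError(f"non-ASCII character: {ch!r}")
        lines.append(f".word 0x{ord(ch):02X}")
    return lines


for line in ["text:"] + string_to_words("SISP!!\n\r"):
    print(line)
```

Running the script prints the same label and eight `.word` lines as the hand-written example.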
SISP provides 16 registers, each 16 bits wide, named R0 to R15. When our stack macros are used, R15 is reserved as the stack pointer.
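Register operands could be validated against the 16-entry register file as sketched below. The function name and error handling are hypothetical; the sketch only encodes the naming rule R0 to R15 and accepts either case, since the assembler is case-insensitive.

```python
import re

REGISTER = re.compile(r"[Rr](\d+)\Z")
NUM_REGISTERS = 16  # R0 .. R15


def register_number(name: str) -> int:
    """Map 'r0'..'R15' to 0..15, rejecting anything outside the register file."""
    match = REGISTER.match(name)
    if match is None or int(match.group(1)) >= NUM_REGISTERS:
        raise ValueError(f"not a SISP register: {name!r}")
    return int(match.group(1))
```

For example, `register_number("R15")` returns 15, while `register_number("r16")` raises a `ValueError`.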
The current version of our assembler can be downloaded as an archive.
- Fixed a bug that caused commands to have no effect
- Fixed an error in the grammar that caused very long execution times
- Made script independent from callers cwd
- Made assembler case insensitive
- Allowed the underscore character in labels
- Added instructions (JMPR, LDIP)
- Fixed enumeration error
- Added operations ADC, SBB, NEG, ASL, ASR
- Changed JE, JNE to JZ, JNZ
- Added simple shell script for assembling
- Fixed minor bugs
- Labels can now be used as constants in LDC
- Changed operation numbers
- Added I/O operation type
- Added check for multiple label occurrences
- Added support for hexadecimal and octal numbers
- Added instruction format to the optype description (makes the assembler more general and flexible)
0.1: Initial version