Computer Architecture Lab/Winter2006/PolzerJahn/Assembler
The SISP Assembler
The assembler for the SIS Processor is written in Python and uses the SPARK framework by John Aycock.
Assembly is done in four stages:
- Lexical analysis (scanning): Checks if the input format is valid and breaks the input stream into a list of tokens
- Syntax analysis (parsing): Checks if the list of tokens has valid syntax according to a grammar
- Preprocessing: Finds the addresses of labels and replaces every label target with its address
- Code generation: Generates bytecode
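The scanning stage can be illustrated with a small regular-expression tokenizer. This is only a sketch and does not use SPARK: the pattern table, the `MNEMONICS` subset and the `tokenize` function are hypothetical names chosen for illustration, and the token types are the ones that appear in the grammar in the Syntax section.

```python
import re

# Illustrative subset of mnemonics mentioned on this page; not the full instruction set.
MNEMONICS = {"ADD", "AND", "LD", "LDC", "INP", "JMP", "JZ", "JNZ"}

# Hypothetical token patterns; the real SPARK-based scanner may differ.
TOKEN_SPEC = [
    ("LBL", r"[A-Za-z_][A-Za-z0-9_]*:"),            # label definition, e.g. MAIN_LOOP:
    ("CMD", r"\.[A-Za-z]+"),                        # command, e.g. .WORD
    ("REG", r"[Rr](?:1[0-5]|[0-9])\b"),             # registers R0..R15
    ("NUM", r"(?:0[xX][0-9A-Fa-f]+|[0-9]+)\b"),     # hexadecimal or decimal number
    ("SEP", r","),                                  # operand separator
    ("WORD", r"[A-Za-z_][A-Za-z0-9_]*"),            # mnemonic or label reference
    ("SKIP", r"\s+"),                               # whitespace is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Break the input into (type, text) tokens; text is upper-cased (case-insensitive input)."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue
        if kind == "WORD":
            # A bare word is an operation if it is a known mnemonic, otherwise a target.
            kind = "OP" if text.upper() in MNEMONICS else "TGT"
        tokens.append((kind, text.upper()))
    return tokens
```

For example, `tokenize("loop: ADD r1, r2")` yields a label token, an operation, two registers and the separator between them.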
Usage
To assemble the file test.asm, use the following command:
python sispasm.py < test.asm > test.bin
Note: The assembler does not support comments or macros. If your source uses either of them (in C style), run it through the C preprocessor first with the command
gcc -x c -E -P test.asm > test.asm.ao
before invoking the assembler. Since version 0.4, the archive includes a simple shell script that performs both steps in sequence.
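The two steps can also be chained from Python. The following is a minimal sketch, not the shell script shipped in the archive; the function names `preprocess_command` and `assemble` are hypothetical, and it assumes that `gcc` and `python` are on the PATH and that `sispasm.py` is in the current directory.

```python
import subprocess


def preprocess_command(source_path):
    """Return the gcc invocation quoted above as an argument list (output goes to stdout)."""
    return ["gcc", "-x", "c", "-E", "-P", source_path]


def assemble(source_path, output_path, assembler="sispasm.py"):
    """Preprocess with gcc, then feed the result to the assembler on stdin."""
    preprocessed = subprocess.run(
        preprocess_command(source_path), check=True, capture_output=True
    ).stdout
    with open(output_path, "wb") as out:
        # The assembler reads the program from stdin and writes the binary to stdout.
        subprocess.run(["python", assembler], input=preprocessed, stdout=out, check=True)
```

A call such as `assemble("test.asm", "test.bin")` would then replace the two manual commands, without leaving an intermediate file behind.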
Syntax
The parser stage uses the following grammar to verify a valid syntax:
program ::= program instruction
program ::= program label
program ::= program command
instruction ::= OP
instruction ::= OP REG
instruction ::= OP REG SEP REG
instruction ::= OP REG SEP NUM
instruction ::= OP NUM
instruction ::= OP TGT
label ::= LBL
command ::= CMD
command ::= CMD NUM
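To make the grammar concrete, the sketch below checks a flat list of token types against the instruction, label and command rules by always trying the longest form first. This is an illustration only and not how SPARK parses; `STATEMENT_FORMS` and `split_statements` are hypothetical names. Longest-first matching is sufficient here because every statement starts with OP, LBL or CMD, and none of the operand tokens can start a new statement.

```python
# Right-hand sides of the instruction, label and command rules, as token types.
STATEMENT_FORMS = [
    ("OP", "REG", "SEP", "REG"),
    ("OP", "REG", "SEP", "NUM"),
    ("OP", "REG"),
    ("OP", "NUM"),
    ("OP", "TGT"),
    ("OP",),
    ("LBL",),
    ("CMD", "NUM"),
    ("CMD",),
]


def split_statements(token_types):
    """Split a flat list of token types into statements, or raise SyntaxError."""
    forms = sorted(STATEMENT_FORMS, key=len, reverse=True)  # longest form first
    statements = []
    pos = 0
    while pos < len(token_types):
        for form in forms:
            if tuple(token_types[pos:pos + len(form)]) == form:
                statements.append(form)
                pos += len(form)
                break
        else:
            # No rule starts at this token, so the input is not a valid program.
            raise SyntaxError(f"no rule matches at token {pos}: {token_types[pos]}")
    return statements
```

For instance, the token types of `loop: INP r4, 2` followed by `JZ loop` and `.END` split into a label, an `OP REG SEP NUM` instruction, an `OP TGT` instruction and a command.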
The assembler is case-insensitive. There are three types of input: instructions, labels and commands.
Instructions
- OP [REG [, REG | IMM]]
Instructions are the operations that the processor executes. An instruction may have zero, one or two operands; two operands are separated by a comma. The first operand acts as both source and destination register. The second operand can be another source register or an immediate value. For a list of available instructions, please refer to the instruction set. Some example instructions:
ADD r1, r2
LDC r2, text
LD r1, r1
INP r4, 2
AND r4, r5
JZ loop
Labels
- LABELNAME:
Labels can be used at any point of the program to name an address. They can then be used in instructions (e.g. JMP, LDI) instead of hard-coding the addresses or values. In the preprocessing stage of the assembler, every occurrence of a label is replaced by its assigned address. Label names can consist of letters, digits and underscores, but must start with a letter or an underscore. Some examples:
TEXTLENGTH:
MAIN_LOOP:
ELSE_2:
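The label replacement described above is typically done in two passes: one to record the address of each label, one to substitute the addresses. The following is a minimal sketch under loudly stated assumptions: the statement tuples and the name `resolve_labels` are hypothetical, and every instruction is assumed to occupy exactly one address, which may not match the real SISP encoding.

```python
import re

# Letters, digits and underscores, starting with a letter or underscore.
LABEL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def resolve_labels(statements):
    """statements: ("label", name) or ("instr", op, *operands) tuples (hypothetical format)."""
    addresses = {}
    address = 0
    # Pass 1: record the address each label names; labels themselves emit no code.
    for stmt in statements:
        if stmt[0] == "label":
            name = stmt[1].upper()  # the assembler is case-insensitive
            if not LABEL_NAME.match(name):
                raise ValueError(f"invalid label name: {stmt[1]}")
            if name in addresses:
                raise ValueError(f"label defined more than once: {stmt[1]}")
            addresses[name] = address
        else:
            address += 1  # assumption: one address per instruction
    # Pass 2: replace operands that name a label with the label's address.
    resolved = []
    for stmt in statements:
        if stmt[0] == "label":
            continue
        op, operands = stmt[1], stmt[2:]
        resolved.append((op,) + tuple(
            addresses.get(o.upper(), o) if isinstance(o, str) else o
            for o in operands
        ))
    return resolved
```

Operands that do not name a known label, such as register names, are passed through unchanged; a fuller version would also report references to undefined labels.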
Commands
- .COMMANDNAME [PARAM]
There are currently three commands which get processed by the assembler:
- .WORD <VALUE>
Inserts a 16-bit value at the position of this command. Values can be written in decimal or hexadecimal number format.
- .DWORD <VALUE>
Inserts a 32-bit value at the position of this command. Values can be written in decimal or hexadecimal number format.
- .END
Marks the end of the program. No commands or instructions after this command will get processed.
An example of how you might use commands:
// The string "SISP!!"
text:
.word 0x53
.word 0x49
.word 0x53
.word 0x50
.word 0x21
.word 0x21
.word 0x0A
.word 0x0D
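Writing out one .word per character by hand is tedious, so a small helper can generate such a block. This is a sketch only and not part of the assembler; the function name `string_to_words` is hypothetical, and it assumes one character per 16-bit word, as in the example above.

```python
def string_to_words(label, text):
    """Return assembly lines defining `label`, followed by one .word per character."""
    lines = [f"{label}:"]
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            # .WORD inserts a 16-bit value, so larger code points do not fit.
            raise ValueError(f"character {ch!r} does not fit into 16 bits")
        lines.append(f".word 0x{code:02X}")
    return "\n".join(lines)


# Reproduces the block above, including the trailing 0x0A and 0x0D.
print(string_to_words("text", "SISP!!\n\r"))
```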
Registers
SISP provides 16 registers, each 16 bits wide, named R0 to R15. When our stack macros are used, R15 is reserved for the stack pointer.
Download
You can download the current version of our assembler as an archive here.
Version history
0.7: 18-01-2007
- Fixed a bug that caused commands to have no effect
0.6: 13-01-2007
- Fixed an error in the grammar that caused very long execution times
- Made the script independent of the caller's working directory
- Made the assembler case-insensitive
- Allowed the underscore character in labels
- Added instructions (JMPR, LDIP)
0.5: 14-12-2006
- Fixed enumeration error
0.4: 2-11-2006
- Added operations ADC, SBB, NEG, ASL, ASR
- Changed JE, JNE to JZ, JNZ
- Added simple shell script for assembling
- Removed minor bugs
0.3: 27-10-2006
- Labels can now be used as constants in LDC
- Changed operation numbers
- Added io operation type
- Added check for multiple label occurrence
0.2: 26-10-2006
- Added support for hexadecimal and octal numbers
- Added instruction format in optype description (makes the assembler more general and flexible)
0.1: initial version