Computer Architecture Lab/Winter2006/KammPuffBili/InstructionSet
General
- 16 16-bit registers
- flags: Z, C, V, N, I, P
- Z: all bits of the last result are zero
- C: 17th bit of the last result
- N: 16th bit of the last result
- V: overflow, after sub/cmp it is , the latter two according to the result
- I: allow interrupts
- P: parity of the last result
- any register as return address
- Some parts come from the Alpha architecture. "ldpgm" had to be there, because such instructions were painfully missing from the SPEAR architecture. The handling of branches is inspired by the Intel x86.
- separate registers for interrupt vectors - read and written through "ldvec" / "stvec"
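The flag definitions above can be illustrated with a minimal sketch. It assumes that the "17th bit" is taken from a result computed on zero-extended 16-bit operands, so that C acts as a carry after an addition and as a borrow after a subtraction. V and P are left out because the text does not fully specify them. The helper names (`flags_after`, `add`, `sub`) are hypothetical.

```python
MASK16 = 0xFFFF  # registers are 16 bits wide


def flags_after(raw: int) -> dict:
    """Derive Z, C and N from a raw 17-bit ALU result (assumed model)."""
    raw &= 0x1FFFF              # keep 17 bits: 16 result bits plus carry
    result = raw & MASK16       # value that would be written to the register
    return {
        "result": result,
        "Z": int(result == 0),  # all bits of the result are zero
        "C": (raw >> 16) & 1,   # 17th bit of the result
        "N": (result >> 15) & 1,  # 16th bit of the result
    }


def add(a: int, b: int) -> dict:
    """Sketch of 'add': operands are zero-extended before adding."""
    return flags_after((a & MASK16) + (b & MASK16))


def sub(a: int, b: int) -> dict:
    """Sketch of 'sub'/'cmp': a borrow shows up in the 17th bit."""
    return flags_after((a & MASK16) - (b & MASK16))


if __name__ == "__main__":
    print(add(0xFFFF, 1))
    print(sub(1, 2))
```

Under this assumption, C = 1 after a subtraction means that the first operand was smaller than the second when both are read as unsigned numbers.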
The processor uses a Harvard architecture; although this design has not prevailed in mainstream architectures, it is still used in embedded processors such as the Atmel AVR. The separation of code and data memory is not flexible enough for mainstream systems, but in small embedded processors the program code tends to be fixed anyway. Because code and data live in separate address spaces, a Harvard architecture lets the processor use more memory in total (which matters when the address space is limited to 64k), and the program code can be read directly from a ROM. A transient failure thus cannot destroy the program by overwriting its code section.
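The separation described above can be sketched as two independent address spaces. This is only an illustration: the class name `HarvardMemory`, the byte-wise addressing of both memories and the zero padding of unused program memory are assumptions, not part of the design.

```python
class HarvardMemory:
    """Toy model: separate program and data memories of 64k addresses each."""

    SIZE = 1 << 16  # 64k addresses per address space

    def __init__(self, program: bytes):
        if len(program) > self.SIZE:
            raise ValueError("program does not fit into program memory")
        # Program memory is immutable, like a ROM (padded with zeros).
        self._program = bytes(program).ljust(self.SIZE, b"\x00")
        # Data memory is writable and has its own address space.
        self._data = bytearray(self.SIZE)

    def fetch(self, addr: int) -> int:
        """Read one byte from program memory."""
        return self._program[addr & 0xFFFF]

    def load(self, addr: int) -> int:
        """Read one byte from data memory."""
        return self._data[addr & 0xFFFF]

    def store(self, addr: int, value: int) -> None:
        """Write one byte to data memory; program memory is unreachable."""
        self._data[addr & 0xFFFF] = value & 0xFF


if __name__ == "__main__":
    mem = HarvardMemory(b"\x12\x34")
    mem.store(0, 0xAB)               # same address, different memory
    print(hex(mem.fetch(0)), hex(mem.load(0)))
```

The same 16-bit address refers to two different locations, so a store can never overwrite code, and the two memories together hold twice as much as a single 64k address space.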
Instruction Set
Mnemonic | Opcode | Operation |
add r1, r2, r3 | 0000 | r1 + r2 → r3 |
sub r1, r2, r3 | 0001 | r1 - r2 → r3 |
addc r1, r2, r3 | 0010 | r1 + r2 + C → r3 |
subb r1, r2, r3 | 0011 | r1 - r2 - C → r3 |
and r1, r2, r3 | 0100 | r1 \wedge r2 → r3 |
or r1, r2, r3 | 0101 | r1 \vee r2 → r3 |
xor r1, r2, r3 | 0110 | r1 \oplus r2 → r3 |
mul r1, r2, r3 | 0111 | r1 * r2 → r3 |
div r1, r2, r3 | 1000 | r1 \div r2 → r3 |
udiv r1, r2, r3 | 1001 | r1 \div r2 → r3, \textnormal{unsigned} |
ldil n8, r1 | 1010 | (r1 \wedge \texttt{0xff00}) \vee n8 → r1, -128 \leq n8 \leq 255 |
ldih n8, r1 | 1011 | (r1 \wedge \texttt{0x00ff}) \vee (n8 \shl 8) → r1, -128 \leq n8 \leq 255 |
ldib n8, r1 | 1100 | n8 → r1, -128 \leq n8 \leq 127 |
mov r1, r2 | 11010000 | r1 → r2 |
mod r1, r2 | 11010001 | r1\ \textnormal{mod}\ r2 → r1 |
umod r1, r2 | 11010010 | r1\ \textnormal{mod}\ r2 → r1, \textnormal{unsigned} |
not r1, r2 | 11010011 | \lnot r1 → r2 |
neg r1, r2 | 11010100 | -r1 → r2 |
cmp r1, r2 | 11010101 | r1 - r2, \textnormal{sets flags} |
addi r1, n4 | 11010110 | r1 + n4 → r1, -8 \leq n4 \leq 7 |
cmpi r1, n4 | 11010111 | r1 - n4, \textnormal{sets flags}, -8 \leq n4 \leq 7 |
shl r1, r2 | 11011000 | r1 \shl r2 → r1 |
shr r1, r2 | 11011001 | r1 \shr r2 → r1 |
sar r1, r2 | 11011010 | r1 \sar r2 → r1 |
rolc r1, r2 | 11011011 | (r1 \shl r2) \vee (C \shl (r2-1)) \vee (r1 \shr (16-r2-1)) → r1 |
rorc r1, r2 | 11011100 | (r1 \shr r2) \vee (C \shl (16-r2)) \vee (r1 \shl (16-r2-1)) → r1 |
bset r1, n4 | 11011101 | r1 \vee (1 \shl n4) → r1, 0 \leq n4 \leq 15 |
bclr r1, n4 | 11011110 | r1 \wedge \lnot (1 \shl n4) → r1, 0 \leq n4 \leq 15 |
btest r1, n4 | 11011111 | (r1 \shr n4) \wedge 1 → Z, 0 \leq n4 \leq 15 |
store r1, r2 | 11100001 | r1 → [r2] \at [r2+1] |
loadl r1, r2 | 11100010 | (r2 \wedge \texttt{0xff00}) \vee [r1] → r2 |
loadh r1, r2 | 11100011 | (r2 \wedge \texttt{0x00ff}) \vee ([r1] \shl 8) → r2 |
loadb r1, r2 | 11100100 | [r1] → r2, \textnormal{signed} |
storel r1, r2 | 11100101 | (r1 \wedge \texttt{0x00ff}) → [r2] |
storeh r1, r2 | 11100110 | (r1 \shr 8) → [r2] |
ldpgm r1, r2 | 11100111 | [r1] \at [r1+1] → r2, \textnormal{load from program memory} |
call r1, r2 | 11101000 | r1 → pc, pc → r2 |
br n8 | 11110000 | pc + n8 → pc, -128 \leq n8 \leq 127 |
brz n8 | 11110001 | Z = 1 \Rightarrow pc + n8 → pc, -128 \leq n8 \leq 127 |
brnz n8 | 11110010 | Z = 0 \Rightarrow pc + n8 → pc, -128 \leq n8 \leq 127 |
brle n8 | 11110011 | \leq / (Z = 1) \vee (N \not = V) \Rightarrow pc + n8 → pc, -128 \leq n8 \leq 127 |
brlt n8 | 11110100 | < / (Z = 0) \wedge (N \not = V) \Rightarrow pc + n8 → pc, -128 \leq n8 \leq 127 |
brge n8 | 11110101 | \geq / (Z = 1) \vee (N = V) \Rightarrow pc + n8 → pc, -128 \leq n8 \leq 127 |
brgt n8 | 11110110 | > / (Z = 0) \wedge (N = V) \Rightarrow pc + n8 → pc, -128 \leq n8 \leq 127 |
brule n8 | 11110111 | \leq / (Z = 1) \vee (C = 1) \Rightarrow pc + n8 → pc, -128 \leq n8 \leq 127 |
brult n8 | 11111000 | < / (Z = 0) \wedge (C = 1) \Rightarrow pc + n8 → pc, -128 \leq n8 \leq 127 |
bruge n8 | 11111001 | \geq / (Z = 1) \vee (C = 0) \Rightarrow pc + n8 → pc, -128 \leq n8 \leq 127 |
brugt n8 | 11111010 | > / (Z = 0) \wedge (C = 0) \Rightarrow pc + n8 → pc, -128 \leq n8 \leq 127 |
sext r1, r2 | 11111011 | (r1 \shl 8) \sar 8 → r2 |
ldvec n4, r1 | 11111100 | \textnormal{interrupt vector}\ n4 → r1, 0 \leq n4 \leq 15 |
stvec r1, n4 | 11111101 | r1 → \textnormal{interrupt vector}\ n4, 0 \leq n4 \leq 15 |
jmp r1 | 111111100000 | r1 → pc |
jmpz r1 | 111111100001 | Z = 1 \Rightarrow r1 → pc |
jmpnz r1 | 111111100010 | Z = 0 \Rightarrow r1 → pc |
jmple r1 | 111111100011 | \leq / (Z = 1) \vee (N \not = V) \Rightarrow r1 → pc |
jmplt r1 | 111111100100 | < / (Z = 0) \wedge (N \not = V) \Rightarrow r1 → pc |
jmpge r1 | 111111100101 | \geq / (Z = 1) \vee (N = V) \Rightarrow r1 → pc |
jmpgt r1 | 111111100110 | > / (Z = 0) \wedge (N = V) \Rightarrow r1 → pc |
jmpule r1 | 111111100111 | \leq / (Z = 1) \vee (C = 1) \Rightarrow r1 → pc |
jmpult r1 | 111111101000 | < / (Z = 0) \wedge (C = 1) \Rightarrow r1 → pc |
jmpuge r1 | 111111101001 | \geq / (Z = 1) \vee (C = 0) \Rightarrow r1 → pc |
jmpugt r1 | 111111101010 | > / (Z = 0) \wedge (C = 0) \Rightarrow r1 → pc |
intr n4 | 111111101011 | \textnormal{interrupt vector}\ n4 → pc, pc → ira, 0 \leq n4 \leq 15 |
getira r1 | 111111101100 | ira → r1 |
setira r1 | 111111101101 | r1 → ira |
getfl r1 | 111111101110 | flags → r1 |
setfl r1 | 111111101111 | r1 → flags |
reti | 1111111111110000 | ira → pc |
nop | 1111111111110001 | \textnormal{do nothing} |
sei | 1111111111110010 | 1 → I |
cli | 1111111111110011 | 0 → I |
error | 1111111111111111 | \textnormal{invalid operation} |
Instruction Set of marca processor
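The opcodes in the table have four different lengths (4, 8, 12 and 16 bits) and form a prefix code, so the length can be determined from the leading bits alone. The following sketch shows one way to do this. It assumes that every instruction is one 16-bit word with the opcode in the most significant bits, which the table suggests but does not state. The function name `opcode_length` is hypothetical.

```python
def opcode_length(word: int) -> int:
    """Return the opcode width in bits for a 16-bit instruction word.

    Assumption: the opcode occupies the most significant bits of the word.
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction word must fit into 16 bits")
    top_nibble = word >> 12
    top_byte = word >> 8
    if top_nibble <= 0b1100:      # 0000 .. 1100: add .. ldib
        return 4
    if top_byte <= 0b11111101:    # 1101xxxx, 1110xxxx, 11110000 .. 11111101
        return 8
    if top_byte == 0b11111110:    # 11111110xxxx: jmp .. setfl
        return 12
    return 16                     # 11111111....: reti, nop, sei, cli, error


if __name__ == "__main__":
    for word in (0x0123, 0xD012, 0xFE01, 0xFFF1):
        print(f"{word:016b} -> {opcode_length(word)}-bit opcode")
```

The sketch only classifies a word by its prefix; encodings that do not appear in the table (for example 11100000) are not rejected here.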
NOTES
- Modulo does not follow the pattern of "div" and "udiv", because there was not enough room for two more 3-operand operations. The assembler accepts the mnemonic with 3 registers as operands and substitutes it with the corresponding "mov" and "mod" instructions.
- Interrupts are nice, but not necessary right from the beginning. Thus the related instructions will not be implemented until the rest of the processor is working reasonably well.
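The substitution of the 3-operand modulo mentioned in the notes could look roughly like the sketch below. The function `expand_mod3` is hypothetical and is not the actual assembler code. The case where the destination equals the divisor is rejected here, because the text does not say how the assembler handles it (a plain "mov" would overwrite the divisor).

```python
def expand_mod3(r1: str, r2: str, r3: str, mnemonic: str = "mod") -> list:
    """Expand '<mnemonic> r1, r2, r3' (r1 mod r2 -> r3) into 2-operand form."""
    if mnemonic not in ("mod", "umod"):
        raise ValueError("only 'mod' and 'umod' are expanded")
    if r3 == r1:
        # Destination already holds the dividend; no copy is needed.
        return [f"{mnemonic} {r1}, {r2}"]
    if r3 == r2:
        # Copying r1 into r3 would destroy the divisor; handling unspecified.
        raise ValueError("destination must differ from the divisor")
    return [
        f"mov {r1}, {r3}",         # r1 -> r3
        f"{mnemonic} {r3}, {r2}",  # r3 mod r2 -> r3
    ]


if __name__ == "__main__":
    for line in expand_mod3("r1", "r2", "r3"):
        print(line)
```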
Pipelining
We are considering a 4-stage pipeline:
- instruction fetch
- instruction decode
- execution/memory access
- write back
This scheme is similar to the one used in the MIPS architecture, except that the execution and memory access stages are merged into one. Because our architecture does not support indexed addressing, a memory access does not need the ALU's result and can proceed in parallel with the ALU, which has the advantage of reducing the possible hazards.
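As a rough illustration of the four stages, the sketch below computes which stage an instruction occupies in a given cycle. It assumes an ideal pipeline that issues one instruction per cycle and never stalls. The stage abbreviations and function names are made up for this example.

```python
STAGES = ("IF", "ID", "EX/MEM", "WB")  # fetch, decode, execute/memory, write back


def stage_at(instr: int, cycle: int):
    """Return the stage of instruction number `instr` in `cycle`, or None."""
    index = cycle - instr  # instruction i enters the pipeline in cycle i
    if 0 <= index < len(STAGES):
        return STAGES[index]
    return None


def total_cycles(count: int) -> int:
    """Cycles needed to complete `count` instructions without stalls."""
    return count + len(STAGES) - 1 if count > 0 else 0


if __name__ == "__main__":
    n = 3
    for i in range(n):
        row = [stage_at(i, c) or "-" for c in range(total_cycles(n))]
        print(f"instr {i}: " + " ".join(f"{s:6}" for s in row))
```

With four stages, a result is written back three cycles after the instruction was fetched, one cycle earlier than in a five-stage pipeline with a separate memory access stage.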