Computer Architecture Lab/Winter2006/KammPuffBili/InstructionSet

General

  • 16 16-bit registers
  • flags: Z, C, N, V, I, P
    • Z: all bits of the last result are zero
    • C: 17th bit of the last result
    • N: 16th bit of the last result
    • V: overflow, after sub/cmp it is  , the latter two according to the result
    • I: allow interrupts
    • P: parity of the last result
  • any register as return address
  • Some parts come from the Alpha architecture. "ldpgm" had to be there, because such an instruction was painfully missing from the SPEAR architecture. The handling of branches is inspired by the Intel x86.
  • separate registers for interrupt vectors - read and written through "ldvec" / "stvec"
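
The flag definitions above can be illustrated with a small sketch for an addition. It only derives Z, C and N, since those are fully specified in the list. The function name `add_with_flags` is hypothetical, and the sketch assumes that Z is evaluated on the 16 bits written to the register rather than on the 17-bit intermediate sum.

```python
MASK16 = 0xFFFF  # registers are 16 bits wide


def add_with_flags(a, b, carry_in=0):
    """Add two 16-bit values and derive the Z, C and N flags.

    Assumption: Z looks only at the 16 bits stored in the register.
    """
    full = (a & MASK16) + (b & MASK16) + carry_in  # up to 17 significant bits
    result = full & MASK16
    flags = {
        "Z": int(result == 0),    # all bits of the result are zero
        "C": (full >> 16) & 1,    # 17th bit of the result
        "N": (result >> 15) & 1,  # 16th bit of the result
    }
    return result, flags
```

For example, adding 1 to 0xFFFF wraps around to 0 and sets both Z and C, while adding 1 to 0x7FFF gives 0x8000 and sets only N.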

The processor uses a Harvard architecture. Although this design has not prevailed in mainstream architectures, it is still used in embedded processors such as the Atmel AVR. The separation of code and data memory is not flexible enough for mainstream systems, but on small embedded processors the program code tends to be fixed anyway. A Harvard architecture lets the processor address more memory in total (which matters when each address space is limited to 64k), and the program code can be read directly from a ROM. A transient failure thus cannot destroy the program by overwriting its code section.
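
The separation can be pictured with a minimal model. This is only a sketch: the class name `HarvardMemory` and its methods are hypothetical, and it assumes that program memory is indexed by instruction word, which the text does not specify.

```python
class HarvardMemory:
    """Two independent address spaces: read-only code and writable data."""

    def __init__(self, program_words):
        # A tuple stands in for the ROM: it cannot be modified after creation.
        self._program = tuple(program_words)
        # Separate 64k data space, addressed by byte.
        self.data = bytearray(0x10000)

    def fetch(self, pc):
        """Read an instruction word from program memory."""
        return self._program[pc]

    def store_byte(self, addr, value):
        """Write to data memory; program memory is unreachable from here."""
        self.data[addr & 0xFFFF] = value & 0xFF

    def load_byte(self, addr):
        """Read a byte from data memory."""
        return self.data[addr & 0xFFFF]
```

Address 0 exists once in each space, so a store to data address 0 leaves the instruction at program address 0 untouched.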

Instruction Set

Instruction      Opcode            Semantics
add r1, r2, r3   0000              r1 + r2 → r3
sub r1, r2, r3   0001              r1 - r2 → r3
addc r1, r2, r3  0010              r1 + r2 + C → r3
subb r1, r2, r3  0011              r1 - r2 - C → r3
and r1, r2, r3   0100              r1 ∧ r2 → r3
or r1, r2, r3    0101              r1 ∨ r2 → r3
xor r1, r2, r3   0110              r1 ⊕ r2 → r3
mul r1, r2, r3   0111              r1 * r2 → r3
div r1, r2, r3   1000              r1 ÷ r2 → r3
udiv r1, r2, r3  1001              r1 ÷ r2 → r3, unsigned
ldil n8, r1      1010              (r1 ∧ 0xff00) ∨ n8 → r1, -128 ≤ n8 ≤ 255
ldih n8, r1      1011              (r1 ∧ 0x00ff) ∨ (n8 shl 8) → r1, -128 ≤ n8 ≤ 255
ldib n8, r1      1100              n8 → r1, -128 ≤ n8 ≤ 127
mov r1, r2       11010000          r1 → r2
mod r1, r2       11010001          r1 mod r2 → r1
umod r1, r2      11010010          r1 mod r2 → r1, unsigned
not r1, r2       11010011          ¬r1 → r2
neg r1, r2       11010100          -r1 → r2
cmp r1, r2       11010101          r1 - r2, sets flags
addi r1, n4      11010110          r1 + n4 → r1, -8 ≤ n4 ≤ 7
cmpi r1, n4      11010111          r1 - n4, sets flags, -8 ≤ n4 ≤ 7
shl r1, r2       11011000          r1 shl r2 → r1
shr r1, r2       11011001          r1 shr r2 → r1
sar r1, r2       11011010          r1 sar r2 → r1
rolc r1, r2      11011011          (r1 shl r2) ∨ (C shl (r2-1)) ∨ (r1 shr (16-r2-1))
rorc r1, r2      11011100          (r1 shr r2) ∨ (C shl (16-r2)) ∨ (r1 shl (16-r2-1))
bset r1, n4      11011101          r1 ∨ (1 shl n4) → r1, 0 ≤ n4 ≤ 15
bclr r1, n4      11011110          r1 ∧ ¬(1 shl n4) → r1, 0 ≤ n4 ≤ 15
btest r1, n4     11011111          (r1 shr n4) ∧ 1 → Z, 0 ≤ n4 ≤ 15
store r1, r2     11100001          r1 → [r2] @ [r2+1]
loadl r1, r2     11100010          (r2 ∧ 0xff00) ∨ [r1] → r2
loadh r1, r2     11100011          (r2 ∧ 0x00ff) ∨ ([r1] shl 8) → r2
loadb r1, r2     11100100          [r1] → r2, signed
storel r1, r2    11100101          (r1 ∧ 0x00ff) → [r2]
storeh r1, r2    11100110          (r1 shr 8) → [r2]
ldpgm r1, r2     11100111          [r1] @ [r1+1] → r2, load from program memory
call r1, r2      11101000          r1 → pc, pc → r2
br n8            11110000          pc + n8 → pc, -128 ≤ n8 ≤ 127
brz n8           11110001          Z = 1 ⇒ pc + n8 → pc, -128 ≤ n8 ≤ 127
brnz n8          11110010          Z = 0 ⇒ pc + n8 → pc, -128 ≤ n8 ≤ 127
brle n8          11110011          ≤ / (Z = 1) ∨ (N ≠ V) ⇒ pc + n8 → pc, -128 ≤ n8 ≤ 127
brlt n8          11110100          < / (Z = 0) ∧ (N ≠ V) ⇒ pc + n8 → pc, -128 ≤ n8 ≤ 127
brge n8          11110101          ≥ / (Z = 1) ∨ (N = V) ⇒ pc + n8 → pc, -128 ≤ n8 ≤ 127
brgt n8          11110110          > / (Z = 0) ∧ (N = V) ⇒ pc + n8 → pc, -128 ≤ n8 ≤ 127
brule n8         11110111          ≤ / (Z = 1) ∨ (C = 1) ⇒ pc + n8 → pc, -128 ≤ n8 ≤ 127
brult n8         11111000          < / (Z = 0) ∧ (C = 1) ⇒ pc + n8 → pc, -128 ≤ n8 ≤ 127
bruge n8         11111001          ≥ / (Z = 1) ∨ (C = 0) ⇒ pc + n8 → pc, -128 ≤ n8 ≤ 127
brugt n8         11111010          > / (Z = 0) ∧ (C = 0) ⇒ pc + n8 → pc, -128 ≤ n8 ≤ 127
sext r1, r2      11111011          (r1 shl 8) sar 8 → r2
ldvec n4, r1     11111100          interrupt vector n4 → r1, 0 ≤ n4 ≤ 15
stvec r1, n4     11111101          r1 → interrupt vector n4, 0 ≤ n4 ≤ 15
jmp r1           111111100000      r1 → pc
jmpz r1          111111100001      Z = 1 ⇒ r1 → pc
jmpnz r1         111111100010      Z = 0 ⇒ r1 → pc
jmple r1         111111100011      ≤ / (Z = 1) ∨ (N ≠ V) ⇒ r1 → pc
jmplt r1         111111100100      < / (Z = 0) ∧ (N ≠ V) ⇒ r1 → pc
jmpge r1         111111100101      ≥ / (Z = 1) ∨ (N = V) ⇒ r1 → pc
jmpgt r1         111111100110      > / (Z = 0) ∧ (N = V) ⇒ r1 → pc
jmpule r1        111111100111      ≤ / (Z = 1) ∨ (C = 1) ⇒ r1 → pc
jmpult r1        111111101000      < / (Z = 0) ∧ (C = 1) ⇒ r1 → pc
jmpuge r1        111111101001      ≥ / (Z = 1) ∨ (C = 0) ⇒ r1 → pc
jmpugt r1        111111101010      > / (Z = 0) ∧ (C = 0) ⇒ r1 → pc
intr n4          111111101011      interrupt vector n4 → pc, pc → ira, 0 ≤ n4 ≤ 15
getira r1        111111101100      ira → r1
setira r1        111111101101      r1 → ira
getfl r1         111111101110      flags → r1
setfl r1         111111101111      r1 → flags
reti             1111111111110000  ira → pc
nop              1111111111110001  do nothing
sei              1111111111110010  1 → I
cli              1111111111110011  0 → I
error            1111111111111111  invalid operation

Instruction Set of marca processor
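
The opcodes in the table form a prefix code with lengths of 4, 8, 12 or 16 bits, so a decoder can determine the opcode length from the leading nibbles. The following sketch assumes that every instruction word is 16 bits wide (which fits a 4-bit opcode plus three 4-bit register fields, but is not stated explicitly) and that the opcode occupies the most significant bits. The function names are hypothetical, and the layout of the operand fields is left open because the table does not define it.

```python
def opcode_length(word):
    """Return the opcode length in bits for a 16-bit instruction word."""
    first = (word >> 12) & 0xF
    if first < 0b1101:       # 0000 .. 1100: 3-operand and load-immediate forms
        return 4
    if first < 0b1111:       # 1101xxxx and 1110xxxx
        return 8
    second = (word >> 8) & 0xF
    if second < 0b1110:      # 11110000 .. 11111101
        return 8
    if second == 0b1110:     # 11111110xxxx
        return 12
    return 16                # 11111111xxxxxxxx


def split_instruction(word):
    """Split a word into (opcode length, opcode bits, raw operand bits).

    No validation is done: unassigned encodings are split like any other.
    """
    n = opcode_length(word)
    opcode = word >> (16 - n)
    operands = word & ((1 << (16 - n)) - 1)
    return n, opcode, operands
```

For instance, `split_instruction(0xFE03)` yields a 12-bit opcode `0xFE0` (jmp) with a 4-bit operand field of 3, and `split_instruction(0xFFF0)` yields the 16-bit opcode of reti with no operand bits.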

Notes

  • Modulo does not follow the pattern of "div" and "udiv", because there was not enough room for two more 3-operand operations. The assembler accepts the mnemonic with 3 registers as operands and substitutes it with the corresponding "mov" and "mod" instructions.
  • Interrupts are nice, but not necessary right from the beginning. Thus the related instructions will not be implemented until the rest of the processor is working reasonably well.
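
The substitution of the 3-operand modulo form could look roughly like the sketch below. It is not taken from the actual assembler: the function name `expand_mod` is hypothetical, and skipping a redundant "mov" as well as rejecting the case where the destination is also the divisor are assumptions about how such an expansion might behave.

```python
def expand_mod(mnemonic, ra, rb, rc):
    """Expand 'mod ra, rb, rc' (ra mod rb -> rc) into 2-operand instructions.

    Uses 'mov ra, rc' (ra -> rc) followed by 'mod rc, rb' (rc mod rb -> rc).
    """
    if mnemonic not in ("mod", "umod"):
        raise ValueError("not a modulo mnemonic: " + mnemonic)
    if rc == rb and ra != rb:
        # The mov would overwrite the divisor before it is used (assumed
        # to be reported as an error here; the real assembler may differ).
        raise ValueError("destination register must differ from the divisor")
    lines = []
    if ra != rc:  # assumption: a mov onto itself is simply left out
        lines.append(f"mov {ra}, {rc}")
    lines.append(f"{mnemonic} {rc}, {rb}")
    return lines
```

With this sketch, `expand_mod("mod", "r1", "r2", "r3")` produces the two lines `mov r1, r3` and `mod r3, r2`.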

Pipelining

We are considering a 4-stage pipeline:

  • instruction fetch
  • instruction decode
  • execution/memory access
  • write back

This scheme is similar to the one used in the MIPS architecture, except that the execution and memory access stages are merged into one. Because our architecture does not support indexed addressing, a memory access does not need the ALU's result and can proceed in parallel with the ALU, which has the additional advantage of reducing the number of possible hazards.
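
To visualise how instructions overlap in such a 4-stage pipeline, the following sketch computes which instruction occupies which stage in every cycle. It assumes an ideal pipeline without stalls or flushes. The stage abbreviations and the function name `pipeline_schedule` are only illustrative.

```python
STAGES = ["IF", "ID", "EX/MEM", "WB"]  # the four stages listed above


def pipeline_schedule(instructions):
    """Return one row per cycle; each row lists the instruction in each stage.

    Assumes one instruction enters the pipeline per cycle and never stalls.
    Empty stages are represented by None.
    """
    n_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for cycle in range(n_cycles):
        row = []
        for stage_index in range(len(STAGES)):
            i = cycle - stage_index  # instruction that entered i cycles ago
            row.append(instructions[i] if 0 <= i < len(instructions) else None)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(["add", "sub", "mov"])):
        cells = [f"{stage}={instr or '-'}" for stage, instr in zip(STAGES, row)]
        print(cycle, " ".join(cells))
```

Three instructions take six cycles to drain through the four stages; in cycle 3 the first instruction is in write back while the third is still being decoded.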