Computer Architecture Lab/Winter2006/KammPuffBili/InstructionSet

General

16 16-bit registers
flags: Z, C, V, N, I, P

- Z: all bits of the last result are zero
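- C: 17th bit of the last result (the carry, or the borrow after a subtraction)
- N: 16th bit of the last result (the sign bit)
- V: overflow; after sub/cmp it is $r1_{15} \oplus r2_{15} \oplus N \oplus C$, where $r1_{15}$ and $r2_{15}$ denote bit 15 of the operands and N and C are taken from the result
- I: interrupts enabled (set by "sei", cleared by "cli")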
- P: parity of the last result

Any register can be used to hold the return address.
Some parts come from the Alpha architecture. "ldpgm" had to be included, because loads from program memory were painfully missing from the SPEAR architecture. The handling of branches is inspired by the Intel x86.
Interrupt vectors are kept in separate registers, which are read and written through "ldvec" / "stvec".

The processor uses a Harvard architecture; although it has not prevailed in mainstream architectures, it is still used in embedded processors such as the Atmel AVR. The separation of code and data memory is not flexible enough for mainstream systems, but on small embedded processors the program code tends to be fixed anyway. A Harvard architecture lets the processor use more memory, because code and data each get their own address space (which matters when an address space is limited to 64k), and the program code can be read directly from a ROM. Thus a transient failure cannot destroy the program by overwriting its code section.

Instruction Set

Mnemonic	Opcode	Operation
add r1, r2, r3	0000	r1 + r2 → r3
sub r1, r2, r3	0001	r1 - r2 → r3
addc r1, r2, r3	0010	r1 + r2 + C → r3
subb r1, r2, r3	0011	r1 - r2 - C → r3
and r1, r2, r3	0100	r1 \wedge r2 → r3
or r1, r2, r3	0101	r1 \vee r2 → r3
xor r1, r2, r3	0110	r1 \oplus r2 → r3
mul r1, r2, r3	0111	r1 * r2 → r3
div r1, r2, r3	1000	r1 \div r2 → r3
udiv r1, r2, r3	1001	r1 \div r2 → r3, \textnormal{unsigned}
ldil n8, r1	1010	(r1 \wedge \texttt{0xff00}) \vee n8 → r1, -128 \leq n8 \leq 255
ldih n8, r1	1011	(r1 \wedge \texttt{0x00ff}) \vee (n8 \shl 8) → r1, -128 \leq n8 \leq 255
ldib n8, r1	1100	n8 → r1, -128 \leq n8 \leq 127
mov r1, r2	11010000	r1 → r2
mod r1, r2	11010001	r1\ \textnormal{mod}\ r2 → r1
umod r1, r2	11010010	r1\ \textnormal{mod}\ r2 → r1, \textnormal{unsigned}
not r1, r2	11010011	\lnot r1 → r2
neg r1, r2	11010100	-r1 → r2
cmp r1, r2	11010101	r1 - r2, \textnormal{sets flags}
addi r1, n4	11010110	r1 + n4 → r1, -8 \leq n4 \leq 7
cmpi r1, n4	11010111	r1 - n4, \textnormal{sets flags}, -8 \leq n4 \leq 7
shl r1, r2	11011000	r1 \shl r2 → r1
shr r1, r2	11011001	r1 \shr r2 → r1
sar r1, r2	11011010	r1 \sar r2 → r1
rolc r1, r2	11011011	(r1 \shl r2) \vee (C \shl (r2-1)) \vee (r1 \shr (17-r2))
rorc r1, r2	11011100	(r1 \shr r2) \vee (C \shl (16-r2)) \vee (r1 \shl (17-r2))
bset r1, n4	11011101	r1 \vee (1 \shl n4) → r1, 0 \leq n4 \leq 15
bclr r1, n4	11011110	r1 \wedge \lnot (1 \shl n4) → r1, 0 \leq n4 \leq 15
btest r1, n4	11011111	(r1 \shr n4) \wedge 1 → Z, 0 \leq n4 \leq 15
store r1, r2	11100001	r1 \rightarrow [r2] \at [r2+1]
loadl r1, r2	11100010	(r2 \wedge \texttt{0xff00}) \vee [r1] → r2
loadh r1, r2	11100011	(r2 \wedge \texttt{0x00ff}) \vee ([r1] \shl 8) → r2
loadb r1, r2	11100100	[r1] → r2, \textnormal{signed}
storel r1, r2	11100101	(r1 \wedge \texttt{0x00ff}) → [r2]
storeh r1, r2	11100110	(r1 \shr 8) → [r2]
ldpgm r1, r2	11100111	[r1] \at [r1+1] → r2, \textnormal{load from program memory}
call r1, r2	11101000	r1 \rightarrow pc, pc → r2
br n8	11110000	pc + n8 → pc, -128 \leq n8 \leq 127
brz n8	11110001	Z = 1 \Rightarrow pc + n8 → pc, -128 \leq n8 \leq 127
brnz n8	11110010	Z = 0 \Rightarrow pc + n8 → pc, -128 \leq n8 \leq 127
brle n8	11110011	\leq / (Z = 1) \vee (N \not = V) \Rightarrow pc + n8 → pc, -128 \leq n8 \leq 127
brlt n8	11110100	< / (Z = 0) \wedge (N \not = V) \Rightarrow pc + n8 → pc, -128 \leq n8 \leq 127
brge n8	11110101	\geq / (Z = 1) \vee (N = V) \Rightarrow pc + n8 → pc, -128 \leq n8 \leq 127
brgt n8	11110110	> / (Z = 0) \wedge (N = V) \Rightarrow pc + n8 → pc, -128 \leq n8 \leq 127
brule n8	11110111	\leq / (Z = 1) \vee (C = 1) \Rightarrow pc + n8 → pc, -128 \leq n8 \leq 127
brult n8	11111000	< / (Z = 0) \wedge (C = 1) \Rightarrow pc + n8 → pc, -128 \leq n8 \leq 127
bruge n8	11111001	\geq / (Z = 1) \vee (C = 0) \Rightarrow pc + n8 → pc, -128 \leq n8 \leq 127
brugt n8	11111010	> / (Z = 0) \wedge (C = 0) \Rightarrow pc + n8 → pc, -128 \leq n8 \leq 127
sext r1, r2	11111011	(r1 \shl 8) \sar 8 → r2
ldvec n4, r1	11111100	\textnormal{interrupt vector}\ n4 → r1, 0 \leq n4 \leq 15
stvec r1, n4	11111101	r1 → \textnormal{interrupt vector}\ n4, 0 \leq n4 \leq 15
jmp r1	111111100000	r1 → pc
jmpz r1	111111100001	Z = 1 \Rightarrow r1 → pc
jmpnz r1	111111100010	Z = 0 \Rightarrow r1 → pc
jmple r1	111111100011	\leq / (Z = 1) \vee (N \not = V) \Rightarrow r1 → pc
jmplt r1	111111100100	< / (Z = 0) \wedge (N \not = V) \Rightarrow r1 → pc
jmpge r1	111111100101	\geq / (Z = 1) \vee (N = V) \Rightarrow r1 → pc
jmpgt r1	111111100110	> / (Z = 0) \wedge (N = V) \Rightarrow r1 → pc
jmpule r1	111111100111	\leq / (Z = 1) \vee (C = 1) \Rightarrow r1 → pc
jmpult r1	111111101000	< / (Z = 0) \wedge (C = 1) \Rightarrow r1 → pc
jmpuge r1	111111101001	\geq / (Z = 1) \vee (C = 0) \Rightarrow r1 → pc
jmpugt r1	111111101010	> / (Z = 0) \wedge (C = 0) \Rightarrow r1 → pc
intr n4	111111101011	\textnormal{interrupt vector}\ n4 → pc, pc → ira, 0 \leq n4 \leq 15
getira r1	111111101100	ira → r1
setira r1	111111101101	r1 → ira
getfl r1	111111101110	flags → r1
setfl r1	111111101111	r1 → flags
reti	1111111111110000	ira → pc
nop	1111111111110001	\textnormal{do nothing}
sei	1111111111110010	1 → I
cli	1111111111110011	0 → I
error	1111111111111111	\textnormal{invalid operation}

Instruction set of the marca processor

NOTES

Modulo does not follow the pattern of "div" and "udiv", because there was not enough room for two more 3-operand operations: the 4-bit opcodes 0000 to 1100 are all taken, and 1101, 1110 and 1111 serve as prefixes for the longer opcodes. The assembler accepts the mnemonic with 3 registers as operands and substitutes it with the corresponding "mov" and "mod" instructions.
Interrupts are nice to have, but not necessary right from the beginning. Thus the related instructions will not be implemented until the rest of the processor is working reasonably well.

Pipelining

We are considering a 4-stage pipeline:

instruction fetch
instruction decode
execution/memory access
write back

This scheme is similar to the one used in the MIPS architecture, except that the execution and memory access stages are merged into one. Because our architecture does not support indexed addressing, memory access does not need the ALU's result to form an address and can work in parallel with the ALU, which has the advantage of reducing the number of possible hazards.