Computer Architecture Lab/Winter2006/JeitMossFrühRamb/ISA
Introduction
MatPRO is a pipelined coprocessor that handles 4x4 matrix operations such as addition and multiplication. It is equipped with a SimpCon interface so that it can be connected easily to existing SoC solutions.
Key data
- 16-bit coprocessor for matrix arithmetic
- Matrix dimension is 4x4
- Basic data type is a 16-bit signed integer
- 3 matrix registers
- 16 general-purpose 16-bit registers
- SIMD (Single Instruction, Multiple Data) architecture
Instruction formats
To keep things simple, every instruction is encoded in 16 bits, and every opcode is 4 bits wide.
We distinguish between 6 different instruction formats:
- 3 Operand - Instructions
Bits | 15-12 | 11-8 | 7-4 | 3-0 |
---|---|---|---|---|
Content | OPCODE | DESTREG | SRCREG1 | SRCREG2 |
- 2 Operand - Instructions
Bits | 15-12 | 11-8 | 7-4 | 3-0 |
---|---|---|---|---|
Content | OPCODE | DESTREG | SRCREG | 0000 |
- 1 Operand - Instructions
Bits | 15-12 | 11-0 |
---|---|---|
Content | OPCODE | Address |
- Load/Store Word - Instructions
Bits | 15-12 | 11-8 | 7-0 |
---|---|---|---|
Content | OPCODE | DSTREG | Address |
- Load/Store Matrix - Instructions
Bits | 15-12 | 11-10 | 9-0 |
---|---|---|---|
Content | OPCODE | DSTREG | Address |
- Conditional - Instructions
Bits | 15-12 | 11-8 | 7-0 |
---|---|---|---|
Content | OPCODE | DSTREG | Address |
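The field layouts above can be sketched as a few packing helpers. This is only an illustration: the function names are hypothetical and not part of the MatPRO design. It assumes that bit 15 is the most significant bit and that every field is stored as an unsigned value.

```python
def field(value, width, name):
    """Return value unchanged if it fits into an unsigned field of `width` bits."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name}={value} does not fit into {width} bits")
    return value

def enc_3op(opcode, dest, src1, src2):
    """3-operand format: OPCODE[15-12] DESTREG[11-8] SRCREG1[7-4] SRCREG2[3-0]."""
    return (field(opcode, 4, "opcode") << 12) | (field(dest, 4, "dest") << 8) \
        | (field(src1, 4, "src1") << 4) | field(src2, 4, "src2")

def enc_2op(opcode, dest, src):
    """2-operand format: like the 3-operand format with bits 3-0 set to 0000."""
    return enc_3op(opcode, dest, src, 0)

def enc_1op(opcode, addr12):
    """1-operand format: OPCODE[15-12] Address[11-0]."""
    return (field(opcode, 4, "opcode") << 12) | field(addr12, 12, "addr12")

def enc_word(opcode, reg, addr8):
    """Load/store word and conditional format: OPCODE[15-12] REG[11-8] Address[7-0]."""
    return (field(opcode, 4, "opcode") << 12) | (field(reg, 4, "reg") << 8) \
        | field(addr8, 8, "addr8")

def enc_matrix(opcode, mreg, addr10):
    """Load/store matrix format: OPCODE[15-12] REG[11-10] Address[9-0]."""
    return (field(opcode, 4, "opcode") << 12) | (field(mreg, 2, "mreg") << 10) \
        | field(addr10, 10, "addr10")

print(hex(enc_3op(0b1000, 2, 0, 1)))   # opcode 1000, registers 2, 0, 1 -> 0x8201
print(hex(enc_matrix(0b0100, 1, 0x040)))  # opcode 0100, register 1, address 0x040 -> 0x4440
```

Note that the load/store matrix format leaves only 2 bits for the register field, so this sketch rejects matrix register numbers above 3 in that format.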
Instructions
At the moment, we plan to implement 12 different instructions:
Instruction | OPCode | Description |
---|---|---|
nop | 0000 | does nothing |
jmp | 0001 | sets the PC to the given address |
brz | 0010 | branch if zero: if the source register is zero, execution continues with the next instruction; otherwise it jumps to the address |
sub | 0011 | subtracts one 16 bit integer from another and stores the result in the destination register |
loadm | 0100 | loads a matrix from memory, starting at the given address, and stores it in the destination register |
loadw | 0101 | loads a 16 bit integer from memory at the given address and stores it in the destination register |
storem | 0110 | stores a matrix from the source register to memory at the given address |
storew | 0111 | stores a 16 bit integer from the source register to memory at the given address |
mulm | 1000 | multiplies two matrices and stores the result in the destination register |
addm | 1001 | adds two matrices and stores the result in the destination register |
subm | 1010 | subtracts one matrix from another and stores the result in the destination register |
- | 1011 | still free |
- | 1100 | still free |
- | 1101 | still free |
mulw | 1110 | multiplies a matrix by a scalar and stores the result in the destination register |
- | 1111 | still free |
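As a reference for what the four matrix instructions compute, a minimal software model might look like the following. It is a sketch under assumptions: the page does not say how overflow is handled, so the model simply wraps every result element to a 16 bit signed integer (two's complement), and matrices are plain 4x4 nested lists. The helper name `wrap16` is hypothetical.

```python
N = 4  # matrix dimension used by MatPRO

def wrap16(x):
    """Wrap an integer to the 16 bit signed range (assumed overflow behaviour)."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def addm(a, b):
    """addm: element-wise sum of two 4x4 matrices."""
    return [[wrap16(a[r][c] + b[r][c]) for c in range(N)] for r in range(N)]

def subm(a, b):
    """subm: element-wise difference of two 4x4 matrices."""
    return [[wrap16(a[r][c] - b[r][c]) for c in range(N)] for r in range(N)]

def mulm(a, b):
    """mulm: matrix product of two 4x4 matrices."""
    return [[wrap16(sum(a[r][k] * b[k][c] for k in range(N)))
             for c in range(N)] for r in range(N)]

def mulw(a, scalar):
    """mulw: multiply every element of a 4x4 matrix by a 16 bit scalar."""
    return [[wrap16(a[r][c] * scalar) for c in range(N)] for r in range(N)]

identity = [[1 if r == c else 0 for c in range(N)] for r in range(N)]
counting = [[r * N + c for c in range(N)] for r in range(N)]
print(mulm(identity, counting) == counting)  # True
```

A model like this can serve as a golden reference when checking the hardware results, provided the overflow assumption is adjusted to whatever the implementation actually does.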
Special purpose of OPCode
Since each register field is 4 bits wide, only 16 registers can be addressed. As mentioned at the beginning, we are using 32 registers (16 matrix registers and 16 "normal" registers). So how do we know which register is meant?
This is where the opcode comes into play.
If you look at the opcode, you will notice that its first 2 bits decide which kind of operation it is:
- 00xx: operations that do not involve a matrix
- 01xx: load or store instructions
- 10xx: the source registers are matrix registers
- 11xx: the first source register is a matrix register and the second source register holds a 16 bit value
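The selection of the register bank from the top two opcode bits can be sketched as follows. The names `BANKS` and `decode_3op` are hypothetical. The sketch only covers instructions in the 3-operand format, and it assumes that the destination of an 11xx instruction is a matrix register, as in `mulw m1,m0,i0`.

```python
# Register bank of (dest, src1, src2) for each value of the top two opcode bits.
# "i" = 16 bit integer register, "m" = matrix register.
BANKS = {
    0b00: "iii",   # no matrix involved
    0b01: None,    # load/store: uses the load/store formats instead
    0b10: "mmm",   # all operands are matrix registers
    0b11: "mmi",   # matrix destination, matrix source, scalar source
}

def decode_3op(word):
    """Return the register names addressed by a 3-operand instruction word."""
    opcode = (word >> 12) & 0xF
    banks = BANKS[opcode >> 2]          # the top two opcode bits pick the banks
    if banks is None:
        raise ValueError("load/store opcodes do not use the 3-operand format")
    fields = [(word >> shift) & 0xF for shift in (8, 4, 0)]
    return tuple(f"{bank}{number:X}" for bank, number in zip(banks, fields))

print(decode_3op(0x8201))  # ('m2', 'm0', 'm1')
print(decode_3op(0x3201))  # ('i2', 'i0', 'i1')
print(decode_3op(0xE100))  # ('m1', 'm0', 'i0')
```

The same 4 bit field value therefore names either `i2` or `m2`, depending only on the opcode that precedes it.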
Assembler
Assembler | Operation |
---|---|
nop | nothing |
jmp addr12 | PC <- addr12 |
brz i0,imm8 | if i0 = 0: PC <- PC+1; otherwise: PC <- imm8 |
sub i2,i0,i1 | i2 <- i0-i1 |
loadm m0,addr10 | m0 <- (addr10) |
loadw i0,addr8 | i0 <- (addr8) |
storem m0,addr10 | (addr10) <- m0 |
storew i0,addr8 | (addr8) <- i0 |
mulm m2,m0,m1 | m2 <- m0*m1 |
addm m2,m0,m1 | m2 <- m0+m1 |
subm m2,m0,m1 | m2 <- m0-m1 |
mulw m1,m0,i0 | m1 <- m0*i0 |
Legend:
m0,m1,...,mF ... Matrix-Registers
i0,i1,...,iF ... 16 bit Integer - Registers
addr8 ... 8 bit Address
addr10 ... 10 bit Address
addr12 ... 12 bit Address
imm8 ... 8 bit signed immediate
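To illustrate how the assembler syntax maps onto the 16 bit instruction words, here is a minimal single-line assembler sketch. It is not the project's assembler. The function names are hypothetical, addresses and immediates are assumed to be written as raw unsigned field values (decimal or `0x` hexadecimal), and matrix registers in `loadm`/`storem` are limited to the 2 bit register field of that format.

```python
# Opcodes copied from the instruction table.
OPCODES = {
    "nop": 0b0000, "jmp": 0b0001, "brz": 0b0010, "sub": 0b0011,
    "loadm": 0b0100, "loadw": 0b0101, "storem": 0b0110, "storew": 0b0111,
    "mulm": 0b1000, "addm": 0b1001, "subm": 0b1010, "mulw": 0b1110,
}

def reg(token, bank, width=4):
    """Parse a register such as 'i3' or 'mf' and return its number."""
    if len(token) != 2 or token[0] != bank:
        raise ValueError(f"expected an {bank}-register, got {token!r}")
    number = int(token[1], 16)
    if number >= (1 << width):
        raise ValueError(f"register {token!r} does not fit into {width} bits")
    return number

def num(token, width):
    """Parse an address or immediate and check that it fits into the field."""
    value = int(token, 0)
    if not 0 <= value < (1 << width):
        raise ValueError(f"{token!r} does not fit into {width} bits")
    return value

def assemble(line):
    """Translate one line of assembler into a 16 bit instruction word."""
    parts = line.strip().lower().split(None, 1)
    mnemonic = parts[0]
    ops = [t.strip() for t in parts[1].split(",")] if len(parts) == 2 else []
    word = OPCODES[mnemonic] << 12
    if mnemonic == "nop":
        return word
    if mnemonic == "jmp":
        return word | num(ops[0], 12)
    if mnemonic in ("brz", "loadw", "storew"):
        return word | (reg(ops[0], "i") << 8) | num(ops[1], 8)
    if mnemonic in ("loadm", "storem"):
        return word | (reg(ops[0], "m", 2) << 10) | num(ops[1], 10)
    # Remaining mnemonics use the 3-operand format; the banks depend on the opcode.
    banks = {"sub": "iii", "mulw": "mmi"}.get(mnemonic, "mmm")
    dest, src1, src2 = (reg(t, bank) for t, bank in zip(ops, banks))
    return word | (dest << 8) | (src1 << 4) | src2

for source in ("loadm m0,0x000", "loadm m1,0x010", "mulm m2,m0,m1", "storem m2,0x020"):
    print(f"{source:18} {assemble(source):04X}")
```

The loop at the end assembles a short program that loads two matrices, multiplies them, and stores the product. The addresses in it are made up for the example.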
Assembler:
Block Diagram
The matrix processor (MatPro) is designed as a coprocessor and communicates with the main processor (JOP in this case) via the SimpCon interface. The memory is separated into a data cache and an instruction cache to keep memory access simple. The data cache is double buffered to allow a more efficient data transfer between the processors. The main processor writes all data and instructions into the caches and sets RUN; MatPro then starts its program and sets READY when program execution is done.