Computer Architecture Lab/Summer2006/PitterDeinhart/TCMP2.0

The Concept

The design is based on a 16-bit load/store architecture. Every instruction occupies two bytes. A rather unusual idea (although some existing chips already use it) is that every opcode can be flagged as conditional.

Memory

Instruction and data memory are separated. Instructions are read from ROM; data can be read from and written to RAM.

Since all instructions consist of 2 bytes and the instruction pointer counts words, the ROM can address 128 kilobytes. Constants can be loaded from ROM using the LDI* opcodes (see below).

Since RAM can be accessed byte per byte it can store only 64 kilobytes of data.
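
The two sizes follow from the 16-bit address width. A quick Python check of the arithmetic, assuming (as the text implies) that both the instruction pointer and RAM addresses are 16 bits wide; the function names are made up for this example:

```python
ADDRESS_BITS = 16        # width of the instruction pointer and of a register
INSTRUCTION_BYTES = 2    # every instruction occupies one 2-byte word


def rom_bytes() -> int:
    """ROM is word-addressed: 2**16 words of 2 bytes each."""
    return (1 << ADDRESS_BITS) * INSTRUCTION_BYTES


def ram_bytes() -> int:
    """RAM is byte-addressed: 2**16 individually addressable bytes."""
    return 1 << ADDRESS_BITS


print(rom_bytes() // 1024, "KiB ROM")  # 128 KiB ROM
print(ram_bytes() // 1024, "KiB RAM")  # 64 KiB RAM
```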

Registers

There are 16 general purpose registers. Each is 16 bits wide, and they are called ax, bx, cx, ..., px.

Additionally there is the 1-bit conditional flag (CF).

Instruction Set

Instruction Set Description

Conditionals

A conditional bit is added to each instruction. If this bit is 1, the instruction is considered conditional and is only executed if the conditional flag is set. The conditional flag can be modified by the CMP* instructions or by the clear conditional flag instruction (CCF). Those can be conditional instructions, too.
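
The predication rule is small enough to model directly. The following Python sketch is an illustration only: the State class and all helper names are made up for this example and do not come from the tcmp package.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class State:
    """Hypothetical machine state: the conditional flag plus two registers."""
    cf: bool = False
    regs: Dict[str, int] = field(default_factory=lambda: {"ax": 0, "bx": 0})


def execute(state: State, conditional: bool, action: Callable[[State], None]) -> bool:
    """Run one instruction; return True if it was actually executed."""
    if conditional and not state.cf:
        return False  # conditional bit set but CF clear: skip
    action(state)
    return True


def cmpez(reg: str) -> Callable[[State], None]:
    """CMPEZ: set CF to (reg == 0)."""
    def act(s: State) -> None:
        s.cf = s.regs[reg] == 0
    return act


def ccf(s: State) -> None:
    """CCF: clear the conditional flag."""
    s.cf = False


def inc_bx(s: State) -> None:
    """Stand-in for any ordinary instruction (here: bx := bx + 1)."""
    s.regs["bx"] = (s.regs["bx"] + 1) & 0xFFFF


state = State()
execute(state, False, cmpez("ax"))         # ax is 0, so CF becomes true
ran_first = execute(state, True, inc_bx)   # executed, CF is set
execute(state, False, ccf)                 # CF cleared
ran_second = execute(state, True, inc_bx)  # skipped, CF is clear
print(ran_first, ran_second, state.regs["bx"])  # True False 1
```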

Load & Store

MS: You will need an indirect load and store (LD (ax), ST (ax)). Without those instructions you will be VERY limited. Then you will think again about a register size which is less than the address size, one of the big issues in the 8086..80286 (the segmentation was a real pain) ;-)

The heart of our load/store architecture are these two commands. Actually one could argue that they are 32 commands, because they have the register number hardcoded into the opcode.

LD	    loads a word from RAM into a register
ST	    stores a word from a register into the RAM
LDIL,LDIH  loads an immediate byte from ROM into a register
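
Because an immediate is only one byte wide, a full 16-bit constant takes an LDIL/LDIH pair. The Python sketch below splits a constant into the two immediates. It assumes that LDIL replaces only the low byte and LDIH only the high byte of the destination register, which is how the example programs further down use them; the function names are hypothetical.

```python
from typing import List, Tuple


def split_constant(value: int) -> Tuple[int, int]:
    """Return (low byte, high byte) of a 16-bit constant."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("constant does not fit into 16 bits")
    return value & 0xFF, (value >> 8) & 0xFF


def load_sequence(reg: str, value: int) -> List[str]:
    """Assembler lines that load `value` into `reg`."""
    low, high = split_constant(value)
    return [f"ldil {reg},0x{low:02x}", f"ldih {reg},0x{high:02x}"]


def apply_ldil(old: int, imm: int) -> int:
    """Assumed LDIL semantics: replace the low byte, keep the high byte."""
    return (old & 0xFF00) | (imm & 0xFF)


def apply_ldih(old: int, imm: int) -> int:
    """Assumed LDIH semantics: replace the high byte, keep the low byte."""
    return ((imm & 0xFF) << 8) | (old & 0x00FF)


print(load_sequence("ax", 0xA120))  # ['ldil ax,0x20', 'ldih ax,0xa1']
```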

Comparison

All comparison instructions set the conditional flag as a result.

CMPEQ	  tests if two registers are equal
CMPNE	  tests if two registers are not equal
CMPGT	  tests if register 1 > register 2
CMPLT	  tests if register 1 < register 2
CMPEZ	  tests if register is equal to zero
CMPNZ	  tests if register is not equal to zero

Is there any way to set the conditional flag when register 1 <= register 2? Or do we need a CMPLE?

Is there any difference at all between CMPGT R1, R2 and CMPLT R2, R1?
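
Both questions are about plain integer ordering, so they can be checked with a few lines of Python. This sketch models only the arithmetic, not the hardware, and the function names are made up. It shows that CMPGT R1, R2 and CMPLT R2, R1 always agree, and that "less than or equal" can be built from two tests (CMPLT followed by CMPEQ, each followed by a conditional instruction) without a dedicated CMPLE.

```python
def cmpeq(a: int, b: int) -> bool:
    return a == b


def cmpgt(a: int, b: int) -> bool:
    return a > b


def cmplt(a: int, b: int) -> bool:
    return a < b


def le_via_two_tests(a: int, b: int) -> bool:
    """Models: cmplt a,b ; ?jump target ; cmpeq a,b ; ?jump target."""
    return cmplt(a, b) or cmpeq(a, b)


values = range(-4, 5)
pairs = [(a, b) for a in values for b in values]

# CMPGT R1,R2 gives the same flag as CMPLT R2,R1.
swap_equivalent = all(cmpgt(a, b) == cmplt(b, a) for a, b in pairs)
# a <= b is the complement of a > b, and equals (a < b) or (a == b).
le_is_not_gt = all((a <= b) == (not cmpgt(a, b)) for a, b in pairs)
le_from_two = all((a <= b) == le_via_two_tests(a, b) for a, b in pairs)
print(swap_equivalent, le_is_not_gt, le_from_two)  # True True True
```

Since instructions can only be predicated on a set flag, the complement form needs the taken and not-taken paths swapped, while the two-test form works as written.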

Branching

JMP	  loads a register's value into the instruction pointer

ALU operations

ADD	  adds register 1 to register 2
ADDI	  adds immediate value to register
SUB	  subtracts register 1 from register 2
SUBI	  subtracts immediate value from register
AND	  bitwise logical AND of register 1 into register 2
OR	  bitwise logical OR of register 1 into register 2
XOR	  bitwise logical XOR of register 1 into register 2
SHL	  shifts the register specified bits left
SHR	  shifts the register specified bits right
NOT	  bitwise logical NOT of register

Others

CCF	  clears the conditional flag
NOP	  null operation, just waits a cycle

Instruction Set Encoding

a .. ALU operation number
c .. conditional flag (CF)
i .. immediate value
s .. source register number
d .. destination register number
m .. memory pointer register number
j .. jump pointer register number
_ .. don't care
ALUOPS c1aaaaaa ssssdddd
LDIL   c000iiii iiiidddd
LDIH   c001iiii iiiidddd
LD     c0100000 mmmmdddd
ST     c0100001 mmmmdddd
JMP    c0100010 ____jjjj
CCF    c0100110 ________
NOP    c0100111 ________
optional and reserved for future use:
LOOP   c0101iii iiiiiiii
LDX    c011iiii mmmmdddd
ALU ops
operations that do not modify CF: 0xxxxx
000000 ADD
000001 SUB
000010 AND
000011 OR
000100 XOR
000101 NOT
000110 SHL
000111 SHR
001000 ASR
operations that do modify CF: 1xxxxx
100000 CMPEQ
100001 CMPNE
100010 CMPGT
100011 CMPLT
100100 CMPEZ (d = don't care)
100101 CMPNZ (d = don't care)
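
The bit layouts above translate into a few shifts and ORs. The following Python encoder is a sketch with made-up function names, not the code of the real assembler. Its results can be compared with the ROM dump in the simulator section below, which contains, for example, 0012 for ldil cx,1, 2700 for nop and 4120 for sub cx,ax.

```python
# Register numbers: ax=0, bx=1, ..., px=15.
REGS = {chr(ord("a") + i) + "x": i for i in range(16)}

ALU_OPS = {
    "add": 0b000000, "sub": 0b000001, "and": 0b000010, "or": 0b000011,
    "xor": 0b000100, "not": 0b000101, "shl": 0b000110, "shr": 0b000111,
    "asr": 0b001000, "cmpeq": 0b100000, "cmpne": 0b100001,
    "cmpgt": 0b100010, "cmplt": 0b100011, "cmpez": 0b100100,
    "cmpnz": 0b100101,
}

# 7-bit opcodes (bits 14..8) of the non-ALU, non-LDI instructions.
MISC_OPS = {"ld": 0b0100000, "st": 0b0100001, "jmp": 0b0100010,
            "ccf": 0b0100110, "nop": 0b0100111}


def encode_alu(op: str, src: str, dst: str, conditional: bool = False) -> int:
    """ALUOPS: c1aaaaaa ssssdddd"""
    return (int(conditional) << 15) | (1 << 14) | (ALU_OPS[op] << 8) \
        | (REGS[src] << 4) | REGS[dst]


def encode_ldi(high: bool, imm: int, dst: str, conditional: bool = False) -> int:
    """LDIL: c000iiii iiiidddd, LDIH: c001iiii iiiidddd"""
    return (int(conditional) << 15) | (int(high) << 12) \
        | ((imm & 0xFF) << 4) | REGS[dst]


def encode_misc(op: str, upper: str = "ax", lower: str = "ax",
                conditional: bool = False) -> int:
    """LD/ST: mmmmdddd, JMP: ____jjjj, CCF/NOP: don't-care bits left at 0."""
    return (int(conditional) << 15) | (MISC_OPS[op] << 8) \
        | (REGS[upper] << 4) | REGS[lower]


print(f"{encode_ldi(False, 1, 'cx'):04x}")     # 0012
print(f"{encode_misc('nop'):04x}")             # 2700
print(f"{encode_alu('sub', 'cx', 'ax'):04x}")  # 4120
```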

The Assembler and Simulator

The assembler/simulator package can be downloaded from http://www.nix.at/sw/tcmp/

Note that only versions >= 0.1 support TCMP2.0, while versions 0.0.* only support TCMP1. The version that represents the state at the end of the computer architecture course is 0.2.4.

The package consists of the assembler and 3 different simulator programs, which are all described below. Read the INSTALL file in the package for installation hints and requirements.

The package is written in Objective Caml, which is a really great computer language. The build generates byte code executables and, if supported on your architecture, native object code executables. So if you find an executable called e.g. asm.opt, you may call it instead of just asm and it will do the same, but a lot faster.

The assembler: tcmp/asm

The assembler can output text (to verify the parser and some calculations), binary output (mainly used for simulation) and VHDL code (can be used as or in a ROM implementation).

When you call it with no arguments or with -h, it prints instructions on how to use it:

asm: usage: ./asm (-b|-r|-t) <inputfile>
    -b binary output (better not to stdout..)
    -r vhdl rom code output
    -t asm text output (to verify parser)

Line Layout

The assembler line layout is similar to nasm. Each line consists of up to three parts:

    label:    instruction operands        ; comment

All three components are optional. Operands are separated by a ','.

The main difference is that there is an optional '?' that can be written directly before the instruction. That marks the instruction to be a conditional one.
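
The line layout can be captured with one regular expression. The Python sketch below is a guess at the grammar based only on the description above and the example programs; the names Line and parse_line are made up, and the real parser in the package may accept more or less than this.

```python
import re
from typing import List, NamedTuple, Optional

LINE_RE = re.compile(r"""
    ^\s*
    (?:(?P<label>[A-Za-z_]\w*):)?            # optional label
    \s*
    (?:(?P<cond>\?)?(?P<mnemonic>[A-Za-z]\w*)  # optional conditional marker
       (?:\s+(?P<operands>[^;]*?))?)?        # optional operand list
    \s*
    (?:;(?P<comment>.*))?                    # optional comment
    $""", re.VERBOSE)


class Line(NamedTuple):
    label: Optional[str]
    conditional: bool
    mnemonic: Optional[str]
    operands: List[str]
    comment: Optional[str]


def parse_line(text: str) -> Line:
    """Split one source line into label, instruction, operands and comment."""
    m = LINE_RE.match(text)
    if m is None:
        raise ValueError(f"cannot parse line: {text!r}")
    raw = m.group("operands") or ""
    operands = [part.strip() for part in raw.split(",") if part.strip()]
    comment = m.group("comment")
    return Line(m.group("label"), m.group("cond") is not None,
                m.group("mnemonic"), operands,
                comment.strip() if comment is not None else None)


print(parse_line("       ?jump (fx),smallloop"))
print(parse_line("begin:"))
print(parse_line("       ldil bx,70     ; change this to adjust frequency"))
```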

Mnemonics

The mnemonics are lower case variants of the instructions.

There are some macro-like mnemonics that expand to several instructions when used:

jump reg,label   ; this will jmp to label using reg for address generation
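
Judging from the ROM dump in the simulator section below (the words 80a5, 9005 and a205 decode as ?ldil fx,0x0a, ?ldih fx,0x00 and ?jmp (fx)), the macro appears to load the word address of the label into the register and then jump through it, with the conditional marker carried over to every generated instruction. The Python sketch below reproduces that reading; expand_jump is a made-up name, and the expansion is an inference, not taken from the assembler source.

```python
from typing import List


def expand_jump(reg: str, address: int, conditional: bool = False) -> List[str]:
    """Expand 'jump (reg),label' given the word address of the label."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address does not fit into 16 bits")
    prefix = "?" if conditional else ""
    low, high = address & 0xFF, (address >> 8) & 0xFF
    return [
        f"{prefix}ldil {reg},0x{low:02x}",   # low byte of the target address
        f"{prefix}ldih {reg},0x{high:02x}",  # high byte of the target address
        f"{prefix}jmp ({reg})",              # ip := reg
    ]


print(expand_jump("fx", 0x000A, conditional=True))
# ['?ldil fx,0x0a', '?ldih fx,0x00', '?jmp (fx)']
```

Any nops needed between the generated instructions are a separate matter and are left out here.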

Example Program 1: Blinking around

This little example program blinks the board's LED.

        ;;  blink led
;
begin:
       ;; counter delta
       ldil cx,1
       ldih cx,0
       ;; high word
       ldil bx,70     ; change this to adjust frequency
       ldih bx,0
bigloop:
       ;; low word
       ldil ax,0x20
       ldih ax,0xa1
smallloop:
       sub cx,ax
       cmpnz ax
       ?jump (fx),smallloop
;
       sub cx,bx
       cmpnz bx
       ?jump (fx),bigloop
doblink:
       not ox,ox
       jump (fx),begin

Example Program 2: Blinking around with changing frequency

This is a modification to the previous example, with variable frequency.

       ;;  blink led
;
       ;; counter delta
       ldil cx,1
       ldih cx,0
begin:
       ;; meta freq
       ldil ex,70
       ldih ex,0
metaloop:
       ;; high word
       ldil bx,0     ; change this to adjust frequency
       ldih bx,0
bigloop:
       ;; low word
       ldil ax,0x20
       ldih ax,0xa1
smallloop:
       sub cx,ax
       cmpnz ax
       ?jump (fx),smallloop
;
       add cx,bx
       cmpne bx,ex
       ?jump (fx),bigloop
doblink:
       not ox,ox
       sub cx,ex
       cmpnz ex
       ?jump (fx),metaloop
       jump (fx),begin

Example Program 3: Instruction tester

Not really useful, but it covers all instructions.

;  comment
       ldil ax,10              ;  low byte of ax := 10
       nop
       ldih ax,0               ;  high byte of ax := 0
       st ax,(bx)              ;  Memory[bx] := ax
       ld ax,(bx)              ;  ax := Memory[bx]
       ccof
       jump (fx),l1            ;  fx := AddressOf(l1); ip := fx
       jmp (ax)                ;  ip := ax
       ccof                    ;  c := 0
       nop                     ;  no operation
       add ax,bx               ;  bx := ax + bx
       sub ax,bx               ;  bx := bx - ax
       and ax,bx               ;  bx := ax and bx
       not ax,bx               ;  bx := not ax
       or ax,bx                ;  bx := ax or bx
       xor ax,bx               ;  bx := ax xor bx
       shl ax                  ;  ax := ax shl 1
       shr bx                  ;  bx := bx shr 1
       asr cx                  ;  cx := cx asr 1
       cmpeq ax,bx             ;  c := ax == bx
       cmpne ax,bx             ;  c := ax != bx
       cmpgt ax,bx             ;  c := ax > bx
       cmplt ax,bx             ;  c := ax < bx
       cmpez ax                ;  c := ax == 0
       cmpnz ax                ;  c := ax != 0
l1:
       nop
       ?not bx
       cmpez ax
l2:
       ccof
l3:
l3b:   not ax
       sub cx,ax
       jump (fx),l2
l4:
       jump (fx),l4

The simulator: tcmp/sim

This simulator takes an assembled program and simulates it instruction by instruction. You can optionally specify how many steps it will simulate. It has its own little help screen, too:

./sim: usage: ./sim <inputfile> [<max-steps>]

If you simulate the first instruction of blink.asm you will get this output:

./sim blink.bin 1 | awk '{ printf " ";print}'
registers:     ax=00 cx=00 ex=00 gx=00 ix=00 kx=00 mx=00 ox=00 ip=0
               bx=00 dx=00 fx=00 hx=00 jx=00 lx=00 nx=00 px=00 cof=false
RAM:
0000:  0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
001a:  0000 0000 0000
ROM:
0000:  0012 2700 1002 0011 2700 1001 0010 2700 1000 2700 4120 2700 6500
000d:  2700 80a5 a700 9005 a700 a205 2700 4121 2700 6510 2700 8065 a700
001a:  9005 a700 a205 2700 45ee 0005 2700 1005 2700 2205


executing instruction:  -> ldil        cx,1     (0000000000010010=0x0012)
registers:     ax=00 cx=01 ex=00 gx=00 ix=00 kx=00 mx=00 ox=00 ip=1
               bx=00 dx=00 fx=00 hx=00 jx=00 lx=00 nx=00 px=00 cof=false
RAM:
0000:  0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
001a:  0000 0000 0000
ROM:
0000:  0012 2700 1002 0011 2700 1001 0010 2700 1000 2700 4120 2700 6500
000d:  2700 80a5 a700 9005 a700 a205 2700 4121 2700 6510 2700 8065 a700
001a:  9005 a700 a205 2700 45ee 0005 2700 1005 2700 2205

Please note that you won't see any hazards here, as this simulator knows nothing about pipelining at all. This can be useful to verify the correctness of the pipeline engine (or of the built-in hazard prevention feature of the assembler).

The pipelined simulator: tcmp/psim

This is another simulator, which is completely different from the previous one. While it should give the same results, it simulates the complete TCMP pipeline. So it will corrupt registers or RAM if there are hazards. In short: it will (try to) act like the processor in hardware.

While its internals are completely different, the usage is the same as for the non-pipelined simulator:

./psim: usage: ./psim <inputfile> [<max-steps>]

If we simulate 5 clock cycles of blink.asm we get:

RST
 IF: ip=0001
 ID: cof=false rom[0000]=0012 that is ldil     cx,1
 EX: s23op=NOP s23rda=0 rd1=0000 rd2=0000 s23aluops=0 s23ldiv=00 s23ldiop=0
     reg.file: ax=0000 cx=0000 ex=0000 gx=0000 ix=0000 kx=0000 mx=0000 ox=0000
               bx=0000 dx=0000 fx=0000 hx=0000 jx=0000 lx=0000 nx=0000 px=0000
 WB: s34op=NOP s34rda=0 ram[0000]=0000 s34aluout=0000 s34ldiout=0000
CLK
 IF: ip=0002
 ID: cof=false rom[0001]=2700 that is nop
 EX: s23op=LDI s23rda=2 rd1=0000 rd2=0000 s23aluops=0 s23ldiv=01 s23ldiop=0
     reg.file: ax=0000 cx=0000 ex=0000 gx=0000 ix=0000 kx=0000 mx=0000 ox=0000
               bx=0000 dx=0000 fx=0000 hx=0000 jx=0000 lx=0000 nx=0000 px=0000
 WB: s34op=NOP s34rda=0 ram[0000]=0000 s34aluout=0000 s34ldiout=0000
CLK
 IF: ip=0003
 ID: cof=false rom[0002]=1002 that is ldih     cx,0
 EX: s23op=NOP s23rda=0 rd1=0000 rd2=0000 s23aluops=27 s23ldiv=70 s23ldiop=0
     reg.file: ax=0000 cx=0000 ex=0000 gx=0000 ix=0000 kx=0000 mx=0000 ox=0000
               bx=0000 dx=0000 fx=0000 hx=0000 jx=0000 lx=0000 nx=0000 px=0000
 WB: s34op=LDI s34rda=2 ram[0000]=0000 s34aluout=0000 s34ldiout=0001
CLK
 IF: ip=0004
 ID: cof=false rom[0003]=0011 that is ldil     bx,1
 EX: s23op=LDI s23rda=2 rd1=0000 rd2=0001 s23aluops=10 s23ldiv=00 s23ldiop=1
     reg.file: ax=0000 cx=0001 ex=0000 gx=0000 ix=0000 kx=0000 mx=0000 ox=0000
               bx=0000 dx=0000 fx=0000 hx=0000 jx=0000 lx=0000 nx=0000 px=0000
 WB: s34op=NOP s34rda=0 ram[0000]=0000 s34aluout=3039 s34ldiout=0070
CLK
 IF: ip=0005
 ID: cof=false rom[0004]=2700 that is nop
 EX: s23op=LDI s23rda=1 rd1=0000 rd2=0000 s23aluops=0 s23ldiv=01 s23ldiop=0
     reg.file: ax=0000 cx=0001 ex=0000 gx=0000 ix=0000 kx=0000 mx=0000 ox=0000
               bx=0000 dx=0000 fx=0000 hx=0000 jx=0000 lx=0000 nx=0000 px=0000
 WB: s34op=LDI s34rda=2 ram[0000]=0000 s34aluout=3039 s34ldiout=0001
CLK
 IF: ip=0006
 ID: cof=false rom[0005]=1001 that is ldih     bx,0
 EX: s23op=NOP s23rda=0 rd1=0000 rd2=0000 s23aluops=27 s23ldiv=70 s23ldiop=0
     reg.file: ax=0000 cx=0001 ex=0000 gx=0000 ix=0000 kx=0000 mx=0000 ox=0000
               bx=0000 dx=0000 fx=0000 hx=0000 jx=0000 lx=0000 nx=0000 px=0000
 WB: s34op=LDI s34rda=1 ram[0000]=0000 s34aluout=0000 s34ldiout=0001

RST and CLK are the reset and the clock signal.

The graphical pipelined simulator: tcmp/gpsim

This is a graphical frontend to the pipelined simulator (psim). It accepts no options; instead it shows a file dialog to choose the binary code to load.

The interface is quite simple and self-explanatory. Just take a look at this screenshot:

 

Pipeline Architecture

The TCMP2.0 pipeline consists of 4 stages:

  • Instruction fetch
  • Instruction decode
  • Execute
  • Write back
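
Ignoring hazards and jumps, every instruction simply moves one stage forward per clock cycle. The Python sketch below prints which instruction occupies which stage in each cycle. It is an idealised model with made-up names, not the logic of psim.

```python
from typing import Dict, List

STAGES = ["IF", "ID", "EX", "WB"]


def pipeline_table(program: List[str], cycles: int) -> List[Dict[str, str]]:
    """For each cycle, map every stage to the instruction it holds ('-' if empty)."""
    rows = []
    for cycle in range(cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth  # instruction that entered IF `depth` cycles ago
            row[stage] = program[index] if 0 <= index < len(program) else "-"
        rows.append(row)
    return rows


program = ["ldil cx,1", "nop", "ldih cx,0", "ldil bx,1"]
table = pipeline_table(program, 4)
for cycle, row in enumerate(table):
    print(cycle, " | ".join(f"{stage}: {row[stage]}" for stage in STAGES))
```

In the last printed cycle the first instruction has reached write back while the fourth is being fetched, so four instructions are in flight at once.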

If you have not done so yet, please take a look at the picture in the gpsim section to get an overview. The line colors show which stage the signals come from (cyan, blue, red, green).

At this time there is no hazard prevention built into the processor; instead, the assembler inserts nops where they are needed. Thanks to our not too unclever design, there is never a need to delay the processor by more than one nop at a time. Of course one could implement e.g. bypassing, but luckily for us that was not in the mandatory scope of the computer architecture course.

But maybe by the time you read this, it is already implemented in the simulator or even at VHDL level. So be sure to check the download packages!
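
The idea of nop insertion at assembler level can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the algorithm of the real assembler: each instruction is described by the registers it reads and writes, LDIL/LDIH are assumed to read the old register value because they replace only one byte, and control hazards after jumps are not modelled. The names Instr, ldi and insert_nops are made up.

```python
from typing import FrozenSet, List, NamedTuple


class Instr(NamedTuple):
    text: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]


NOP = Instr("nop", frozenset(), frozenset())


def ldi(mnemonic: str, reg: str, imm: int) -> Instr:
    """LDIL/LDIH: assumed to both read and write the destination register."""
    return Instr(f"{mnemonic} {reg},{imm}", frozenset({reg}), frozenset({reg}))


def insert_nops(program: List[Instr]) -> List[Instr]:
    """Insert one nop wherever an instruction reads what its predecessor wrote."""
    out: List[Instr] = []
    for instr in program:
        if out and out[-1].writes & instr.reads:
            out.append(NOP)  # one cycle of distance is assumed to be enough
        out.append(instr)
    return out


program = [ldi("ldil", "cx", 1), ldi("ldih", "cx", 0),
           ldi("ldil", "bx", 1), ldi("ldih", "bx", 0)]
result = [instr.text for instr in insert_nops(program)]
print(result)
```

For this input the result has the same shape as the first six ROM words in the simulator dump above (0012 2700 1002 0011 2700 1001): a nop between the two loads of the same register, and none between loads of different registers.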

VHDL Implementation

The VHDL implementation was created by first building the black boxes such as the ALU and the register file. Then we basically translated the code of the pipelined simulator (which can be found in pipeline.ml) into VHDL. Once that seemed to be complete, we simulated the whole thing in ModelSim.

After a few fixes, mainly dangling signals, we downloaded the processor to the FPGA. To our surprise it worked nearly instantly. Maybe it was a good idea to simulate a lot with tcmp/*sim and ModelSim. On the other hand, maybe we just had lots of luck :)

The full VHDL files can be downloaded here: vhdl.zip (http://www.nix.at/tcmp2/vhdl.zip).

Additionally, a couple of testbenches were created to simulate some building blocks of the processor in ModelSim. The following screenshot shows a simulation of the processor running the blink2.asm program.