Program, data and stack memories occupy the same memory space. The total addressable memory size is 64 KB.
Program memory: the program can be located anywhere in memory. Jump, branch and call instructions use 16-bit addresses, i.e. they can be used to jump/branch anywhere within 64 KB. All jump/branch instructions use absolute addressing.
Data memory: the processor always uses 16-bit addresses, so data can be placed anywhere.
Stack memory is limited only by the size of memory. The stack grows downward.
The processor has 5 interrupts: INTR, RST5.5, RST6.5, RST7.5 and TRAP.
256 Input ports, 256 Output ports
Accumulator or A register is an 8-bit register used for arithmetic, logic, I/O and load/store operations.
Flag register is an 8-bit register containing 5 1-bit flags:
- Sign, Zero, Auxiliary carry, Parity, and Carry (set if there was a carry during addition, or a borrow during subtraction/comparison).
General registers: 8-bit B and 8-bit C registers, 8-bit D and 8-bit E registers, 8-bit H and 8-bit L registers.
Stack pointer is a 16-bit register. This register is always incremented/decremented by 2.
Program counter is a 16-bit register.
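The downward-growing stack and the 16-bit register that always moves by 2 can be sketched together in a few lines of Python. This is only a toy model: it assumes that register is the stack pointer, that each stack entry is one 16-bit word stored high byte first at the higher address, and the names `Stack16`, `push` and `pop` are made up for the illustration.

```python
MEM_SIZE = 0x10000  # 64 KB shared address space


class Stack16:
    """Toy model of a downward-growing stack of 16-bit words (hypothetical helper)."""

    def __init__(self, sp=0x0000):
        self.mem = bytearray(MEM_SIZE)
        self.sp = sp & 0xFFFF  # the stack pointer is 16 bits wide

    def push(self, value):
        value &= 0xFFFF
        # The stack grows downward: decrement first, then store.
        self.sp = (self.sp - 1) & 0xFFFF
        self.mem[self.sp] = value >> 8        # high byte at the higher address
        self.sp = (self.sp - 1) & 0xFFFF
        self.mem[self.sp] = value & 0xFF      # low byte at the lower address

    def pop(self):
        low = self.mem[self.sp]
        self.sp = (self.sp + 1) & 0xFFFF
        high = self.mem[self.sp]
        self.sp = (self.sp + 1) & 0xFFFF
        return (high << 8) | low


stack = Stack16(sp=0x2000)   # 0x2000 is an arbitrary starting address
stack.push(0x1234)
sp_after_push = stack.sp     # two bytes below the starting address
value = stack.pop()          # the stack pointer is back where it started
```

Each push or pop changes the stack pointer by exactly 2, which is why the register never moves by a single byte in normal stack use.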
- Data moving instructions
- Arithmetic - add, subtract, increment and decrement
- Logic - AND, OR, XOR and rotate
- Control transfer - conditional, unconditional, call subroutine, return from subroutine and restarts
- Input/Output instructions
- Other - setting/clearing flag bits, enabling/disabling interrupts, stack operations, etc.
Register - references the data in a register or in a register pair.
Register indirect - the instruction specifies a register pair containing the address where the data is located.
Direct.
Immediate - 8 or 16-bit data.
Architectural Features of the R8000
The R8000 is a 64-bit RISC microprocessor with a strong emphasis on floating-point performance, designed specifically for supercomputing applications. It implements the MIPS IV instruction set architecture (ISA). Its key features include:
- Multi-component chip set consisting of an integer unit (IU), floating-point unit (FPU), tag RAMs, and 2 MB of data streaming cache
- Four-way superscalar architecture, six operations per clock cycle
- True 64-bit microprocessor with 64-bit integer and floating-point operations, registers, and virtual addresses
- 3.3-volt technology
- 16 KB of instruction cache (I-cache) in IU, 16 KB of dual-ported data cache (D-cache) in IU, 1K entries of branch prediction cache
- Memory Management Unit (MMU) in IU contains a 384-entry, dual-ported, three-way set associative Translation Lookaside Buffer (TLB)
- ANSI/IEEE-754 standard floating-point coprocessor with imprecise interrupts
- 32 doubleword (64-bit) general-purpose registers in IU and 32 floating-point registers in FPU
- 128-bit data bus and a separate 40-bit address bus that can access up to 1 TB of physical memory
- Upward compatibility with earlier 32-bit and 64-bit MIPS microprocessors
|Total Number of Instructions per cycle||4|
|Number of Integer Instructions per cycle||2|
|Number of Floating Point Instructions per cycle||2|
|Number of Multiply-Add Instructions per cycle||2|
|Number of Load/Store Instructions per cycle||2|
|Out-of-order Instruction execution||No|
Instructions are fetched from an on-chip 16 KB instruction cache (I-cache). Four instructions (128 bits) are fetched per cycle. There are three categories of new instructions: fused multiply-add, register+register addressing mode, and conditional moves. The fused multiply-add instructions take three input operands and perform a multiplication followed by an addition with no intermediate rounding. The register+register addressing mode is supported for floating-point loads and stores to enhance the performance of array accesses with arbitrary strides. Integer and memory instructions get their operands from a 13-port register file (Integer Register File).
The R8000 processor includes the following functional units:
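The phrase "no intermediate rounding" is easier to see with numbers. The sketch below does not use any R8000 instruction; it emulates a fused multiply-add in Python by computing the exact result with `fractions.Fraction` and rounding once, and compares that with the ordinary two-step `a * b + c`. The function names and the chosen operands are assumptions made for this illustration.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Emulate a*b + c with a single rounding at the very end."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # exact rational arithmetic
    return float(exact)                              # one rounding to double


def separate_multiply_add(a, b, c):
    """Multiply, round to double, then add and round again."""
    return a * b + c


# Operands chosen so that the exact product 1 - 2**-60 does not fit in a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

fused = fused_multiply_add(a, b, c)        # keeps the tiny term -2**-60
separate = separate_multiply_add(a, b, c)  # product rounds to 1.0, so the sum is 0.0
```

With separate operations the small term is lost when the product is rounded; with a single final rounding it survives, which is the accuracy benefit a fused instruction offers.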
- 2X load/store
- 2X ALU
- 1X shifter
- 1X integer multiply/divide
- 2X FPU
The FPU implements the following operations:
- MADD (1 cycle latency Multiply-Add; ex: a = a + b * c)
- Square Root
- Reciprocal (i.e. 1/x)
- Reciprocal Square Root
The characteristics of the R8000 memory subsystem - number of ports, sizes and algorithms of the caches, tag RAM, and buffering schemes - complement the high-performance computational capabilities of the R8000 and ensure that memory bandwidth demands from the floating-point and integer units are met.
|Fetch||Decode||Address||Execute||Writeback|
|Instruction fetch, alignment, predecode, branch prediction||Decode, scoreboard, register file read||Generate load/store address||ALU, data cache, TLB lookup, branch resolution, exception detection||Register file write|
The Fetch stage accesses the instruction cache and the branch prediction cache (to be explained later). The Decode stage makes dispatch decisions based on register scoreboarding and resource reservations, and also reads the register file. The Address stage computes the effective addresses of loads and stores. The Execute stage evaluates the ALU operation, accesses the data cache and TLB, resolves branches and handles all exceptions. Finally, the Writeback stage updates the register file.
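How the five stages overlap can be sketched with a small timeline model. It is deliberately idealised: it assumes one fetch group enters the pipeline every cycle and that nothing ever stalls, which a real processor cannot guarantee. The name `pipeline_timeline` is hypothetical.

```python
STAGES = ["Fetch", "Decode", "Address", "Execute", "Writeback"]


def pipeline_timeline(num_groups):
    """Return one dict per cycle mapping stage name -> index of the group in it.

    Idealised model: a new group enters Fetch each cycle and nothing stalls.
    """
    total_cycles = num_groups + len(STAGES) - 1
    timeline = []
    for cycle in range(total_cycles):
        occupancy = {}
        for stage_index, stage in enumerate(STAGES):
            group = cycle - stage_index  # group that entered Fetch stage_index cycles ago
            if 0 <= group < num_groups:
                occupancy[stage] = group
        timeline.append(occupancy)
    return timeline


timeline = pipeline_timeline(3)
for cycle, occupancy in enumerate(timeline):
    print(cycle, occupancy)
```

In this model the first group reaches Writeback in the fifth cycle, and from then on one group completes per cycle until the pipeline drains.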
EXTERNAL CACHE PIPELINE
There are five stages in the external cache pipeline. Addresses are sent from the R8000 to the tag RAM in the first stage. The tags are looked up and hit/miss information is encoded in the second stage. The third stage is used for chip crossing from the tag RAM to the data RAMs. The SSRAM is accessed internally within the chip in the fourth stage. Finally, data is sent back to the R8000 and R8010 in the fifth stage.
The x86-64 architecture was designed by Advanced Micro Devices (AMD), who have since renamed it AMD64. The AMD64 architecture is a simple yet powerful 64-bit, backward-compatible extension of the industry-standard (legacy) x86 architecture.
- 20 to 31 stages
- an instruction takes 20 (31) clock cycles to pass through
- the pipeline is hard to keep filled
- TC Nxt = trace cache next instruction pointer
- TC Fetch = trace cache fetch
- Rename = register renaming
- Que = micro-op queuing
- Sch = micro-op scheduling
- Disp = dispatch
- RF = register file
- Ex = execute
- Flgs = flags
- BrCk = branch check
IA-32 (x86) instruction set
- 32-bit instructions; 32-bit addresses
- Variable instruction length: 1-15 bytes
- Not a load/store architecture
- an operand may be in memory
- instructions can modify memory directly
- Two-address instructions
- the destination is also a source: ADD EAX, 
- Branch instructions
- check status bits in the flags register
- Different privilege levels
- Level 0: operating system
- Level 3: user program
- Data transfer instructions (MOV)
- binary arithmetic (ADD, SUB)
- logical instructions (AND, OR)
- shift and rotate (ROR, SAR)
- bit and byte instructions (BTS, SETE)
- control transfer instructions (JMP, LOOP, CALL)
- string instructions (MOVS, SCAS)
- flag control instructions (STD, STI)
- segment register instructions (LDS)
- miscellaneous instructions (LEA, NOP, CPUID)
Complicated instruction format with variable size
- Opcodes: originally 1 byte
- Prefixes are possible
- e.g. a different operand width or repetition
- Instruction decoding is difficult
- ModR/M, SIB, displacement, immediate:
- operand addressing
- 24 addressing forms
- Offset = Base + Index*Scale + Displacement
- Mixture of general-purpose and special-purpose registers
- EAX: accumulator
- EBX: base register
- ECX: count register
- special use: loops
- EDX: data register
- special use: multiplication / division
- EBP: base pointer
- points to the base of the current stack frame
- ESI, EDI: source and destination index registers
- ESP: stack pointer
- EFLAGS: flags
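The ModR/M and SIB bytes mentioned above refer to these registers by 3-bit numbers. The sketch below splits the two bytes into their bit fields, assuming the standard IA-32 layout (2 bits, 3 bits, 3 bits) and the standard register numbering; neither is spelled out in the text above, and the helper names `decode_modrm` and `decode_sib` are made up for this example.

```python
# Standard 32-bit register numbering used inside ModR/M and SIB fields.
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def decode_modrm(byte):
    """Split a ModR/M byte into its mod (2 bits), reg (3 bits) and r/m (3 bits) fields."""
    return {
        "mod": (byte >> 6) & 0b11,
        "reg": (byte >> 3) & 0b111,
        "rm": byte & 0b111,
    }


def decode_sib(byte):
    """Split a SIB byte into scale factor, index register number and base register number."""
    return {
        "scale": 1 << ((byte >> 6) & 0b11),  # 2-bit field encodes a factor of 1, 2, 4 or 8
        "index": (byte >> 3) & 0b111,
        "base": byte & 0b111,
    }


# Example bytes: 0x44 followed by 0xB3, as they would appear after an opcode.
modrm = decode_modrm(0x44)
sib = decode_sib(0xB3)
destination = REG32[modrm["reg"]]  # register selected by the reg field
index_reg = REG32[sib["index"]]
base_reg = REG32[sib["base"]]
```

Here an r/m value of 4 together with a mod value other than 3 signals that a SIB byte follows, which is one reason instruction decoding needs several dependent steps.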
Addressing: how an instruction's operands are selected
– where the operands are stored
Immediate addressing: MOV EAX, 1234
– the data itself is specified instead of an address
Direct addressing: MOV EAX, 
– a (32-bit) memory address is given
Register addressing: MOV EAX, EBX
– the register contains the operand
Register-indirect addressing: MOV EAX, [EBX]
– the register contains the address of the operand
Indexed addressing: MOV EAX, table[EBX]
– base address + offset register
Indexed register-indirect addressing with displacement
– IA-32 speciality: MOV EAX, table[EBX*4 + 1]
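The memory addressing modes above are all special cases of Offset = Base + Index*Scale + Displacement. A minimal Python sketch follows, assuming a 32-bit wrap-around and made-up values for `EBX` and for the address of the label `table`; `effective_address` is a hypothetical helper, not an existing API.

```python
MASK32 = 0xFFFFFFFF


def effective_address(base=0, index=0, scale=1, displacement=0):
    """Offset = Base + Index*Scale + Displacement, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) & MASK32


# Assumed machine state for the illustration only.
registers = {"EBX": 3}
table = 0x1000  # assumed address of the label `table`

# MOV EAX, [EBX]: the register holds the address
register_indirect = effective_address(base=registers["EBX"])

# MOV EAX, table[EBX]: base address plus offset register
indexed = effective_address(base=registers["EBX"], displacement=table)

# MOV EAX, table[EBX*4 + 1]: scaled index plus displacement
scaled = effective_address(index=registers["EBX"], scale=4, displacement=table + 1)
```

Immediate and register addressing need no address calculation at all, because the operand is in the instruction itself or in a register.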