Computer Architecture Lab/WS2007/SHWH/Processor Comparison


MOS Technology 6502


The MOS Technology 6502 is an 8-bit microprocessor that was designed by Chuck Peddle for MOS Technology in 1975. It has approximately 5000 transistors.

The internal logic runs at the same speed as the external clock rate. Despite its slow clock speeds, between 20 kHz and 2 MHz, the 6502 was competitive with other CPUs that used significantly faster clocks. This is partly due to its simple state machine, which is implemented in combinational logic to a greater extent than in many other designs; the two-phase clock can thereby control the whole machine cycle directly.

When the 6502 was introduced, it was the least expensive full-featured CPU on the market by a considerable margin, costing less than one-sixth the price of competing designs from larger companies such as Motorola and Intel.

Its main competitor was the Zilog Z80.

The 6502 was the first microprocessor with a one-step instruction pipeline: while one instruction is being executed, the next one can already be fetched.

One of the first uses for the design was the Apple I computer. The 6502 was next used in the Apple II, and the Commodore PET. It was later used in the Atari home computers, the BBC Micro family, and a huge number of other designs. Bender, a fictional android "industrial robot" and a main character in the animated TV series Futurama, was revealed to have a 6502 as his "brain".


The 6502 has very few registers. It has an 8-bit accumulator, two 8-bit index registers, one 8-bit stack pointer, an 8-bit status register, and a 16-bit program counter.

The accumulator is the main register for arithmetic and logic operations. Unlike the index registers X and Y, it has a direct connection to the Arithmetic and Logic Unit (ALU).

The stack memory ranges from 0x0100 to 0x01FF. The stack register (S) is an 8-bit offset into the stack page. In other words, whenever a byte is pushed onto the stack, it is stored at the address 0x0100+S.
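The stack addressing described above can be sketched in Python. The `Stack6502` class and its method names are hypothetical, and the sketch assumes the usual 6502 behaviour: a push stores the byte at 0x0100+S and then decrements S, while a pull increments S first and then reads. The initial value 0xFF for S is also an assumption; on real hardware the program sets it.

```python
STACK_PAGE = 0x0100  # the stack occupies 0x0100..0x01FF


class Stack6502:
    """Toy model of the 6502 stack (hypothetical helper, not a full CPU)."""

    def __init__(self):
        self.memory = bytearray(0x10000)  # 64 KByte address space
        self.s = 0xFF                     # 8-bit stack register, assumed start value

    def push(self, value):
        # Store at 0x0100 + S, then decrement S (wrapping within 8 bits).
        self.memory[STACK_PAGE + self.s] = value & 0xFF
        self.s = (self.s - 1) & 0xFF

    def pull(self):
        # Increment S first, then read from 0x0100 + S.
        self.s = (self.s + 1) & 0xFF
        return self.memory[STACK_PAGE + self.s]


stack = Stack6502()
stack.push(0x42)
print(hex(stack.memory[0x01FF]), hex(stack.s))  # 0x42 0xfe
```

Because S is only 8 bits wide, the stack can never leave its 256-byte page; pushing more than 256 bytes silently wraps around.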

The bits in the status register are called flags. From the most significant to the least significant bit, they are the Negative, Overflow, Unused, Break, Decimal mode, Interrupt disable, Zero, and Carry flags.
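As an illustration, the flag layout can be written down as bit masks, assuming the order given above runs from bit 7 down to bit 0. The `FLAGS` table and the `describe_status` helper are hypothetical names used only for this sketch.

```python
# Bit masks for the 6502 status register, bit 7 down to bit 0.
FLAGS = {
    "N": 0x80,  # Negative
    "V": 0x40,  # Overflow
    "-": 0x20,  # Unused
    "B": 0x10,  # Break
    "D": 0x08,  # Decimal mode
    "I": 0x04,  # Interrupt disable
    "Z": 0x02,  # Zero
    "C": 0x01,  # Carry
}


def describe_status(p):
    """Return the names of the flags that are set in status byte p."""
    return [name for name, mask in FLAGS.items() if p & mask]


print(describe_status(0x83))  # ['N', 'Z', 'C']
```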

Instruction Set

The 6502 microprocessor has a variable-length instruction encoding, and its byte order is little-endian. Every instruction needs between 2 and 7 clock cycles. The instruction set includes 56 instructions.
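A short sketch of what little-endian byte order means for an instruction with a 16-bit operand: the low byte of the address is stored before the high byte. The helper names are hypothetical; the example uses the absolute form of JMP, whose opcode is 0x4C.

```python
def encode_absolute(opcode, address):
    """Encode a 3-byte instruction: opcode, then the address low byte first."""
    return bytes([opcode, address & 0xFF, (address >> 8) & 0xFF])


def decode_address(low, high):
    """Rebuild a 16-bit address from its little-endian bytes."""
    return low | (high << 8)


JMP_ABSOLUTE = 0x4C  # opcode of JMP with an absolute operand
encoded = encode_absolute(JMP_ABSOLUTE, 0x1234)
print(encoded.hex())  # 4c3412
```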

Addressing Modes

The chip uses the index and stack registers effectively with several addressing modes, including a fast "direct page" or "zero page" mode that accesses memory locations from address 0 to 255 with a single 8-bit address. The 6502 has 13 addressing modes altogether.
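To make the difference between zero page and full 16-bit addressing concrete, here is a minimal sketch of how the effective address could be computed for three of the modes. The function names are hypothetical, and only these three modes are modelled.

```python
def zero_page(operand):
    """Zero page: the single operand byte is the full address (0x00..0xFF)."""
    return operand & 0xFF


def zero_page_x(operand, x):
    """Zero page indexed by X: the sum wraps around inside page zero."""
    return (operand + x) & 0xFF


def absolute_x(low, high, x):
    """Absolute indexed by X: 16-bit little-endian base address plus X."""
    return ((low | (high << 8)) + x) & 0xFFFF


print(hex(zero_page(0x80)))            # 0x80
print(hex(zero_page_x(0xF0, 0x20)))    # 0x10
print(hex(absolute_x(0x00, 0x20, 5)))  # 0x2005
```

The zero page forms need only one operand byte instead of two, which is why they are both shorter and faster than the absolute forms.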

The 6502 has a 64 KByte address space.

Arithmetic Instructions

Available instructions are addition and subtraction, bitwise logical operations (AND, OR, and exclusive OR), and rotate/shift operations.

Compare Instructions

The compare instructions set or clear three of the status flags: Carry, Zero, and Negative.

The three types of compare instructions are CMP (Compare Memory and Accumulator), CPX (Compare Memory and Index X), and CPY (Compare Memory and Index Y).
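The effect of a compare on the three flags can be sketched as follows. The `compare` function is a hypothetical helper; it assumes the usual 6502 behaviour in which the register and the memory operand are subtracted internally, the difference is discarded, and only the flags are kept.

```python
def compare(register, memory):
    """Return (carry, zero, negative) as a 6502 compare would set them."""
    result = (register - memory) & 0xFF   # 8-bit difference, discarded by the CPU
    carry = register >= memory            # set when no borrow is needed
    zero = register == memory             # set when both values are equal
    negative = bool(result & 0x80)        # copy of bit 7 of the difference
    return carry, zero, negative


print(compare(0x40, 0x40))  # (True, True, False)
print(compare(0x10, 0x20))  # (False, False, True)
```

The same logic applies to CMP, CPX, and CPY; only the register passed as the first argument differs.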

Register Instructions

These commands include load and store, transfer, flag operations and push/pull instructions.

Jump Instructions

The 6502 has conditional branches, which test a single status flag (for example BEQ and BNE), and unconditional jumps (JMP and JSR).

Interrupt Instructions

There are exactly 4 instructions available: SEI, CLI, BRK, and NOP.


The 6502 has a 16-bit address bus and an 8-bit data bus. As the memory was faster than the CPU, it made sense to optimize the CPU for memory access.

Altogether the 6502 microprocessor has 40 pins.

DLX Processor


The DLX processor is a RISC processor designed by John Hennessy and David Patterson, with the main objective of providing a fully pipelined processor for pedagogical purposes. It was first mentioned in 1995 in "Computer Architecture: A Quantitative Approach." The pipeline of the DLX processor has 5 stages, namely fetch, decode, execute, memory, and writeback.

This processor is a load/store machine. Its design emphasizes a simple, easily decoded instruction set, pipelining efficiency, and efficiency as a compiler target.
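How the five pipeline stages named above overlap can be sketched with a small scheduling function. It models an idealized pipeline without stalls or hazards, which is an assumption made for illustration; `pipeline_schedule` is a hypothetical helper name.

```python
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]


def pipeline_schedule(n_instructions):
    """Return one dict per clock cycle, mapping stage name -> instruction index."""
    total_cycles = n_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            instr = cycle - depth  # instruction that reached this stage in this cycle
            if 0 <= instr < n_instructions:
                row[stage] = instr
        schedule.append(row)
    return schedule


for cycle, row in enumerate(pipeline_schedule(3)):
    print(cycle, row)
```

In this idealized model, three instructions finish in 7 cycles instead of the 15 that strictly sequential execution of five stages each would need.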


The DLX processor has 32 general-purpose registers (GPRs), each of which is 32 bits long. Register r0 is a special register that always has the value 0.
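A minimal sketch of such a register file follows. The `RegisterFile` class is hypothetical; it assumes that writes to r0 are silently ignored, which is one common way to make a register always read as 0.

```python
class RegisterFile:
    """Toy model of 32 general-purpose 32-bit registers with r0 fixed at 0."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, index):
        return self._regs[index]

    def write(self, index, value):
        if index == 0:
            return  # assumption: writes to r0 are discarded
        self._regs[index] = value & 0xFFFFFFFF  # keep values within 32 bits


gprs = RegisterFile()
gprs.write(0, 123)         # has no effect
gprs.write(5, 0x1FFFFFFFF) # truncated to 32 bits
print(gprs.read(0), hex(gprs.read(5)))  # 0 0xffffffff
```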

Furthermore, the processor has 32 floating-point registers (FPRs), which can be used as 32 single precision 32-bit registers or as even-odd pairs holding double-precision values.

Finally, the DLX processor has a 32-bit program counter (PC) and 31 special-purpose registers.

Instruction Set

The DLX processor has a hybrid instruction encoding. Each instruction is encoded in a 32-bit word. It uses a Big Endian addressing scheme. There are 3 instruction formats, namely I-type, R-type, and J-type. The I-type format is generally used for arithmetic and logic instructions that have an immediate operand, and for branch instructions. The R-type format is used for arithmetic and logic instructions that operate entirely on data in registers. The J-type instructions are used for unconditional jump instructions.
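The three formats can be illustrated with a small decoder. The field layout used here is an assumption based on the commonly described DLX encoding (a 6-bit opcode in the top bits; 5-bit register fields; a 16-bit immediate for I-type, an 11-bit function field for R-type, and a 26-bit offset for J-type). It is not taken from the text above, and the function names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_i_type(word):
    # Assumed layout: opcode(6) | rs1(5) | rd(5) | immediate(16)
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs1": (word >> 21) & 0x1F,
        "rd": (word >> 16) & 0x1F,
        "immediate": sign_extend(word & 0xFFFF, 16),
    }


def decode_r_type(word):
    # Assumed layout: opcode(6) | rs1(5) | rs2(5) | rd(5) | func(11)
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs1": (word >> 21) & 0x1F,
        "rs2": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "func": word & 0x7FF,
    }


def decode_j_type(word):
    # Assumed layout: opcode(6) | offset(26)
    return {
        "opcode": (word >> 26) & 0x3F,
        "offset": sign_extend(word & 0x3FFFFFF, 26),
    }


# An arbitrary I-type word: opcode 8, rs1 = r1, rd = r2, immediate = -4.
word = (8 << 26) | (1 << 21) | (2 << 16) | 0xFFFC
print(decode_i_type(word))
```

Because every instruction is one 32-bit word and the opcode always sits in the same position, a decoder only has to look at the opcode to know which of the three layouts applies.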

64 basic instructions are supported by the DLX processor, though it can also support extended instructions, as long as those instructions work purely on registers.

Addressing Modes

The DLX processor has only 3 addressing modes, namely immediate, displacement, and register.

The processor has a 4 GByte address space.
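Displacement addressing can be sketched in a few lines. The sketch assumes that the effective address is the value of a base register plus a signed displacement, wrapped to the 32-bit (4 GByte) address space; `displacement_address` is a hypothetical helper name.

```python
ADDRESS_MASK = 0xFFFFFFFF  # 32-bit addresses cover 4 GByte


def displacement_address(base, displacement):
    """Effective address = base register value + signed displacement."""
    return (base + displacement) & ADDRESS_MASK


print(hex(displacement_address(0x1000, -4)))  # 0xffc
print(hex(displacement_address(0, 0x20)))     # 0x20
```

With a register that always reads 0 as the base, the displacement alone selects the address; with a displacement of 0, the mode behaves like a plain register-indirect access.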

Data Transfer Instructions

The load and store instructions are the only means of transferring data between the CPU and memory.

Arithmetic and Logic Instructions

All ALU instructions are register-register instructions. They include addition, subtraction, multiplication, division, comparison, and so on.

Control Transfer Instructions

Control transfer instructions are mainly branches and jumps. There are also instructions that deal with exceptions and interrupts.


The DLX processor has a 32-bit address bus and a 32-bit data bus. Together with clk, reset, and so on, the processor has 73 pins.



4stack Processor

The 4stack processor, designed by Bernd Paysan, is still a research project aimed at high-performance, low-cost computing. It is a four-way VLIW processor that uses stack-based instructions.

The 4stack processor has 4 arithmetic logic units (ALUs) and 4 stacks. Each stack has its own ALU. In addition to the 4 ALUs, two memory units allow parallel load and store.

Each stack stores 32-bit values, and 64-bit data is represented by two stacks. Stack instructions use either 32-bit or 64-bit signed or unsigned integers or bit patterns, or 32-bit single-precision or 64-bit double-precision floats. Memory instructions load and store bytes, half words, words, and double words.

Fewer than 500k transistors are required for the core, leaving more space for caches. Furthermore, the stack paradigm greatly increases instruction density. The 4stack processor encodes up to 8 operations in 64 bits.


Each stack has 10 registers, including the stack pointer and the status register. There are 4 additional global registers for special purposes. For memory access there are a further 32 registers.

Instruction Set

The 4stack processor is a 32-bit machine. The instruction length is 64 bits, and each instruction can consist of several operation fields for the independent execution units. All these operations are performed simultaneously.
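The idea of one instruction word driving four independent stacks can be shown with a toy model. Everything in it is hypothetical: the operation names `push`, `add`, and `nop` and the `execute_word` function are invented for illustration and are not real 4stack opcodes. Since each operation touches only its own stack, applying them one after another gives the same result as applying them simultaneously.

```python
def execute_word(stacks, operations):
    """Apply one operation to each of the four stacks (toy model)."""
    for stack, (name, *args) in zip(stacks, operations):
        if name == "push":
            stack.append(args[0])
        elif name == "add":
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & 0xFFFFFFFF)  # 32-bit stack entries
        elif name == "nop":
            pass
        else:
            raise ValueError(f"unknown operation: {name}")


stacks = [[], [], [], []]
execute_word(stacks, [("push", 1), ("push", 2), ("push", 3), ("nop",)])
execute_word(stacks, [("push", 10), ("push", 20), ("nop",), ("push", 4)])
execute_word(stacks, [("add",), ("add",), ("nop",), ("nop",)])
print(stacks)  # [[11], [22], [3], [4]]
```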

The load instruction takes 2 cycles, whereas the store instruction takes only 1 cycle. In case of a cache miss, a wait instruction has to be inserted.

The instruction encoding uses Big Endian and has 5 main instruction formats.

Normal Instructions: The normal instruction consists of four stack operations and two data move operations.
Conditional Setup Instructions: The conditional setup instruction consists of four stack operations and four corresponding conditional setup operations.
Branch Instructions: The branch instruction consists of four stack operations and one branch instruction.
Call Instructions: The call instruction consists of three stack operations and one instruction pointer-relative or absolute call or jump instruction.
Far Call Instruction: The far call instruction consists of one absolute call instruction.

Stack Operations

Stack operations are divided into ALU operations and immediate number operations. The immediate number operations are intended to push small numbers on the stack. The ALU operations are used for general purpose and for floating point calculations.

Data Move Operations

Data move operations are divided into load/store, address update and immediate offset operations. An immediate offset operation in one data move field is added to the computed address in the other data move field.

Flow Control Operations

Flow control operations are divided into conditional branches, calls/jumps, counted loops, returns, and indirect calls/jumps.


| | 6502 | DLX | 4stack |
|---|---|---|---|
| Date of release | 1975 | 1995 | Under development |
| Architecture | Accumulator | RISC | VLIW and Stack |
| Internal data bus width | 8 bit | 32 bit | |
| External data bus width | 8 bit | 8 bit | |
| # of data registers | 2 (X, Y) | 30 GP, 32 FP | 8 for each stack |
| # of other registers | 1 accumulator, 1 program counter, 1 status register | 31 special-purpose registers | SP and status register for each stack, 4 special-purpose registers, 32 memory access registers |
| Instruction lengths | 1-3 bytes | 4 bytes | 8 bytes |
| # of instructions | 56 | 64 | |
| Cycles per instruction | 1; 2 if a page boundary is crossed | 4-5 | |
| Pipeline | 2-stage pipeline | 5-stage pipeline | 3-stage pipeline |
| Address bus width | 16 bit | 32 bit | 32 or 64 bit |
| Endianness | Little-Endian | Big-Endian | Little- and Big-Endian |
| Pins | 40 | 73 | |
| Exceptions | 14 | 8 | 64 |
| Interrupts | 1 non-masked interrupt, 1 IRQ, and software interrupt | 16 | |