Computer Architecture Lab/Winter2006/PoettschacherRosenblattlWolf/ThreeMicroDiscussion
Comparing Atmel AVR, Alpha and Infineon TriCore Architecture
AVR is often said to stand for "Advanced Virtual RISC", although Atmel has stated that the name is not an acronym for anything in particular.
The AVR is an 8-bit RISC architecture designed by two Norwegian students, Alf-Egil Bogen and Vegard Wollan.
It is generally used in microcontrollers.
The AVR uses a Harvard architecture with separate memories and buses for program and data.
It implements single-level pipelining: while one instruction is being executed, the next one is fetched from program memory. Most operations are performed in a single clock cycle.
Like other RISC architectures, the AVR has a large register file: 32 8-bit general-purpose working registers (R0 – R31) with single-cycle access time.
This allows single-cycle ALU operations. In a typical ALU operation, two operands are output from the Register File,
the operation is executed, and the result is stored back in the Register File – in one clock cycle.
Six of the general-purpose working registers can be used as three 16-bit indirect address register pointers for Data Space addressing.
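How two 8-bit registers form one 16-bit pointer can be sketched as follows. This is a minimal model, not AVR code: the helper names `read_pointer` and `write_pointer` are hypothetical, and the pairing used (X = R27:R26, with the low byte in the lower-numbered register) comes from the AVR documentation rather than from the text above.

```python
def read_pointer(regs, low_index):
    """Combine two adjacent 8-bit registers into one 16-bit pointer value."""
    low = regs[low_index] & 0xFF
    high = regs[low_index + 1] & 0xFF
    return (high << 8) | low


def write_pointer(regs, low_index, value):
    """Split a 16-bit value across two adjacent 8-bit registers."""
    regs[low_index] = value & 0xFF
    regs[low_index + 1] = (value >> 8) & 0xFF


regs = [0] * 32          # R0 .. R31, all 8 bits wide
X = 26                   # assumption: X pointer = R27:R26 (low byte in R26)
write_pointer(regs, X, 0x0123)
pointer_value = read_pointer(regs, X)
```

After the call, R26 holds the low byte and R27 the high byte of the address.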
The AVR offers conditional and unconditional jump and call instructions, with which the whole address space can be addressed directly.
The AVR's instruction set is of the register-register type. It can be divided into five categories:
- ARITHMETIC AND LOGIC INSTRUCTIONS
- BRANCH INSTRUCTIONS
- DATA TRANSFER INSTRUCTIONS
- BIT AND BIT-TEST INSTRUCTIONS
- MCU CONTROL INSTRUCTIONS
This gives a total of 130 instructions.
The AVR uses two-address format, so the result of the operation performed overwrites one of the operands.
Most AVR instructions have a single 16-bit word format and most of them are executed in a single clock cycle.
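The effect of the two-address format can be illustrated with a small model of an 8-bit add. This is only a sketch: the function name `add_two_address` and the register numbers are chosen for illustration, and the status flags of a real AVR are reduced to a returned carry bit.

```python
def add_two_address(regs, rd, rr):
    """Rd <- Rd + Rr, truncated to 8 bits; returns the carry-out bit.

    The destination register rd is also the first source operand,
    so its previous value is overwritten by the result.
    """
    total = regs[rd] + regs[rr]
    regs[rd] = total & 0xFF
    return total >> 8


regs = [0] * 32
regs[16] = 200
regs[17] = 100
carry = add_two_address(regs, 16, 17)  # 200 + 100 = 300 = 0x12C
```

The old value of R16 is lost, whereas R17 is left unchanged; a program that still needs the first operand has to copy it to another register beforehand.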
General Purpose Register File
The Register File is optimized for the AVR Enhanced RISC instruction set.
To achieve the required performance and flexibility, the following input/output schemes are supported by the Register File:
- One 8-bit output operand and one 8-bit result input.
- Two 8-bit output operands and one 8-bit result input.
- Two 8-bit output operands and one 16-bit result input.
- One 16-bit output operand and one 16-bit result input.
Most of the instructions operating on the Register File have direct access to all registers.
DEC Alpha is a 64-bit RISC load-store von Neumann architecture that was used in PCs, workstations and servers until further development was cancelled in 2003. In contrast to the other two, the Alpha architecture is not designed for microcontrollers, so it has no integrated peripherals such as timers or counters.
Like other non-embedded microprocessors, Alpha is a von Neumann architecture without separation of data and program bus. It has 29 general-purpose integer registers (R0-R28), 31 general-purpose floating-point registers (F0-F30), one data frame pointer register (R29), one stack pointer register (R30) and two special registers (R31 and F31) that always read as integer and floating-point zero, respectively. All registers and buses are 64 bits wide, allowing up to 16 exabytes (2^64 bytes) of memory to be addressed. The floating-point registers follow the IEEE 754-1985 format for single and double precision.
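The behaviour of the zero register can be modelled in a few lines. The class below is a hypothetical sketch of the integer register file only; the name `AlphaIntRegs` and its methods are not taken from any Alpha documentation.

```python
class AlphaIntRegs:
    """Toy model of a 32-entry, 64-bit integer register file."""

    ZERO = 31                 # R31 always reads as zero
    MASK = (1 << 64) - 1      # registers are 64 bits wide

    def __init__(self):
        self._r = [0] * 32

    def read(self, n):
        # Reads of R31 yield zero regardless of earlier writes.
        return 0 if n == self.ZERO else self._r[n]

    def write(self, n, value):
        # Writes to R31 are discarded; other values wrap to 64 bits.
        if n != self.ZERO:
            self._r[n] = value & self.MASK


regs = AlphaIntRegs()
regs.write(31, 5)              # has no effect
regs.write(1, (1 << 64) | 7)   # the bit above the 64-bit width is dropped
addressable_bytes = 1 << 64    # size of a 64-bit address space in bytes
```

A hard-wired zero register means that a constant zero, or a throw-away destination for an unwanted result, needs no dedicated instruction.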
A feature of the Alpha architecture is the lack of a program status register. All instructions operate on registers only, which allows instructions to be executed in parallel without the bottleneck of a single flag register. Together with 7 to 13 pipeline stages (depending on the version) and out-of-order execution in version EV6 and above, this made Alpha one of the fastest systems available until further development was cancelled.
Every instruction is 32 bits wide and comes in one of four flavours, described below:
|Flavour||31 ... 26||25 ... 21||20 ... 16||15 ... 5||4 ... 0|
|PALcode||Opcode||Number (bits 25 ... 0)|
|Branch||Opcode||Ra||Displacement (bits 20 ... 0)|
|Memory||Opcode||Ra||Rb||Displacement (bits 15 ... 0)|
|Operate||Opcode||Ra||Rb||Function||Rc|
PALcode: The Privileged Architecture Library (PAL) is an operating-system-dependent set of subroutines, callable by software or hardware.
Branch: Depending on the value of the register Ra, a branch is executed according to the displacement parameter.
Memory: The register Rb holds the base address of the memory location accessed, to which the sign-extended displacement is added. The register Ra is either loaded from that memory address or stored to it.
Operate: Alpha uses a three-address format, so the three registers Ra, Rb and Rc are accessed: Ra and Rb supply the operands and Rc receives the result. All registers used in one instruction have to be either integer registers or floating-point registers.
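The bit ranges in the table header can be turned into a small field decoder. The sketch below assumes that the three-register flavour places the opcode in bits 31 ... 26, Ra in 25 ... 21, Rb in 20 ... 16, a function code in 15 ... 5 and Rc in 4 ... 0. The helper names and the example field values are hypothetical and are not meant to denote a particular Alpha instruction.

```python
def bits(word, high, low):
    """Extract the bit field word[high:low] (both ends inclusive)."""
    width = high - low + 1
    return (word >> low) & ((1 << width) - 1)


def encode_operate(opcode, ra, rb, function, rc):
    """Pack the five fields into one 32-bit instruction word."""
    return (opcode << 26) | (ra << 21) | (rb << 16) | (function << 5) | rc


def decode_operate(word):
    """Split a 32-bit instruction word into its five fields."""
    return {
        "opcode": bits(word, 31, 26),
        "ra": bits(word, 25, 21),
        "rb": bits(word, 20, 16),
        "function": bits(word, 15, 5),
        "rc": bits(word, 4, 0),
    }


# Arbitrary example values, chosen only to exercise every field.
word = encode_operate(opcode=0x10, ra=1, rb=2, function=0x20, rc=3)
fields = decode_operate(word)
```

Because every instruction has the same width and the register fields sit at fixed positions, the register numbers can be extracted before the opcode has been fully decoded.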
The Infineon TriCore architecture is complex and offers a large feature set, which includes:
- 32-bit architecture
- RISC with DSP instructions
- Little-endian byte ordering
- 4-GByte virtual or physical data, program, and input/output address spaces
- Full-featured memory management system
- Memory protection
- 16-/32-bit instructions for reduced code size
- Fast automatic context switching between two tasks
- Multiply-accumulate unit
- Saturating integer arithmetic
- Bit handling
- Byte and bit addressing
- Packed data operations
- Zero-overhead loop
- Low interrupt latency
- Flexible interrupt prioritization scheme
- Flexible power management
Currently, two versions of the TriCore architecture exist; we deal with version 1.3, which is used in current microcontrollers such as the TC1796.
All TriCore microcontrollers have large on-chip memory blocks of various types, such as RAM, ROM, DRAM, OTP and flash.
The architecture is mainly a Harvard architecture, although the buses are not strictly separated and have bridges for flexible data exchange (with a performance penalty). As an example we looked at the TC1796, the biggest and most powerful microcontroller of the TriCore family.
The CPU has a 64-bit-wide bus to the program memory interface (PMI), which has 48 KB of scratch-pad RAM and 16 KB of instruction cache running at full CPU speed (up to 150 MHz). Both memories are optionally parity protected. Over the program local memory bus, the PMI is connected to the program memory unit as well as to the data memory unit and the external bus unit. The program local memory bus also runs at full CPU speed and is 64 bits wide.
The program memory unit has 64-bit-wide access to the 2 MB program flash as well as to the 128 KB data flash, 8 KB boot ROM and 8 KB test ROM. All flash memories are ECC protected using 8 ECC bits for each block of 64 data bits, enabling correction of single-bit errors and detection of double-bit errors per block.
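The figure of 8 ECC bits per 64 data bits matches what a Hamming code extended by one overall parity bit (SECDED) would need. The text does not say which code the TC1796 actually uses, so the following calculation is only a plausibility check under that assumption; the function name is hypothetical.

```python
def secded_check_bits(data_bits):
    """Check bits needed by an extended Hamming (SECDED) code.

    A Hamming code with r check bits can point at any one of the
    data_bits + r positions or signal "no error", which requires
    2**r >= data_bits + r + 1. One additional overall parity bit
    turns single-error correction into single-error correction
    plus double-error detection.
    """
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1


ecc_bits_per_block = secded_check_bits(64)
```

Under this assumption each 64-bit block is stored as 72 bits, an overhead of 12.5 percent.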
Two 64-bit-wide buses connect the CPU with the data memory interface (DMI), which has 56 KB of local data RAM and 8 KB of dual-ported RAM; the latter is connected to the second master channel of the on-chip DMA controller. Both memories are optionally parity protected. Via the data local memory bus, the DMI can access the data memory unit, which has 64 KB of SRAM and 16 KB of stand-by RAM. Both of these are also optionally parity protected.
Internal peripherals are connected to the system peripheral bus and can be accessed either through the CPU slave interface or from the data local memory bus over the LFI bridge. Both the interface and the bridge can be master of the bus, as can the first master channel of the on-chip DMA controller. The second master channel of the DMA controller is connected to the remote peripheral bus, which connects additional on-chip peripherals and the dual-ported RAM of the DMI. The system peripheral bus and the remote peripheral bus run at the system clock, which is the same as or a fraction of the CPU clock; its maximum is 75 MHz.
The instruction set is divided into eight categories, including:
- Arithmetic (Integer, DSP and SIMD Packed Arithmetic)
- Bit Manipulation
- 16-Bit Subset
- Address Arithmetic and Address Comparison
In total there are more than 150 instructions.
Most instructions are 32 bits wide; some are 16 bits wide. Most instructions have two or three operands.
The CPU has sixteen 32-bit data registers and sixteen 32-bit address registers as well as three status and program counter registers. Together, these registers form the context of the running task and can be saved to and loaded from the local data RAM for context switching. Shadow registers enable fast context switching.
Most instructions are executed within one CPU cycle, some within 2 or 3 cycles. Branches are executed within 1, 2 or 3 cycles, depending on the branch prediction.
Instructions are fetched by the instruction fetch unit which directs the instruction to the appropriate pipeline. The three pipelines are:
- Integer pipeline
- Load/store pipeline
- Loop pipeline
The integer and the load/store pipeline have four stages (fetch, decode, execute and write-back); the loop pipeline has two stages (decode and write-back). All pipelines work in parallel, enabling up to three instructions to be executed within one clock cycle.
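Why three instructions per cycle is an upper bound rather than a typical value can be seen in a toy issue model. The sketch below assumes in-order issue with at most one instruction per pipeline and cycle, and it ignores data dependencies, stalls and the actual issue rules of the TriCore; the function name and the pipeline labels are hypothetical.

```python
def count_issue_cycles(pipes):
    """Count issue cycles for instructions given as pipeline names.

    pipes lists, in program order, the pipeline each instruction
    needs. Within one cycle, consecutive instructions are issued
    together as long as each one targets a pipeline that is not
    yet occupied in that cycle.
    """
    cycles = 0
    i = 0
    while i < len(pipes):
        used = set()
        while i < len(pipes) and pipes[i] not in used:
            used.add(pipes[i])
            i += 1
        cycles += 1
    return cycles


best_case = count_issue_cycles(["integer", "load_store", "loop"])
mixed_case = count_issue_cycles(["integer", "integer", "load_store"])
worst_case = count_issue_cycles(["integer", "integer", "integer"])
```

Only an instruction mix that spreads across all three pipelines reaches the peak rate; three integer instructions in a row still take three cycles in this model.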