This page is a forum for student discussion on the Limit of ILP.

Paper

Base the discussion on the following paper (up to page 35): Limits of instruction-level parallelism by David Wall, Nov 1993

Discussion

You can sign your contributions to the discussion. Schoeberl 12:50, 9 November 2006 (UTC)

What is the maximum ILP seen? Under which assumptions?

The maximum parallelism quoted in the paper is 1000, for highly numerical programs on a machine with unlimited parallelism and an omniscient scheduler. In the author's own study the highest value seen is 500, for swm256 under the perfect model with unlimited window size and cycle width [WRL-93-6-1, p27]. Under a realistic model this drops to around 50 at peak and about 10 on average [WRL-93-6-1, p27]. CWalter

What is the most important technique for ILP?

Or in other words: which feature would really hurt ILP if it were implemented poorly?

Jump prediction, and especially return prediction, has a great impact, yielding a mean parallelism of 10. It is also easy to implement (at least in the author's opinion). Branch prediction is also very promising, yielding a gain of a factor of four with only a small 4-bit table.
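To make the two predictors concrete, here is a minimal sketch in Python. It assumes that "a small 4-bit table" means a table indexed by 4 bits of the branch address (16 entries) holding 2-bit saturating counters; that reading, the class names `CounterPredictor` and `ReturnStack`, and the stack depth are hypothetical choices for illustration, not the exact predictors Wall simulated. The return stack shows why return prediction is cheap: a call pushes its return address and a return simply pops it.

```python
from typing import List, Optional


class CounterPredictor:
    """Hypothetical branch predictor: a table of 2-bit saturating counters.

    The table is indexed by the low `index_bits` bits of the branch address;
    index_bits=4 (an assumption) gives a 16-entry table.
    """

    def __init__(self, index_bits: int = 4) -> None:
        self.mask = (1 << index_bits) - 1
        # Counter values 0 and 1 predict "not taken", 2 and 3 predict "taken".
        self.table = [1] * (1 << index_bits)

    def predict(self, pc: int) -> bool:
        return self.table[pc & self.mask] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc & self.mask
        if taken:
            self.table[i] = min(3, self.table[i] + 1)  # saturate at 3
        else:
            self.table[i] = max(0, self.table[i] - 1)  # saturate at 0


class ReturnStack:
    """Hypothetical return-address stack used to predict return targets."""

    def __init__(self, depth: int = 16) -> None:
        self.depth = depth
        self.stack: List[int] = []

    def on_call(self, return_addr: int) -> None:
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # on overflow, drop the oldest entry
        self.stack.append(return_addr)

    def predict_return(self) -> Optional[int]:
        # The most recent call is the one being returned from.
        return self.stack.pop() if self.stack else None
```

For a loop branch that is taken nine times and then falls through, the counter starting in the weakly-not-taken state mispredicts only the first and the last outcome, so 2 of 10 predictions are wrong. The 2-bit hysteresis is what keeps a single loop exit from flipping the prediction for the next run of the loop.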

What is the difference between window size and cycle width?

The window size is the number of instructions that are examined for simultaneous execution. The cycle width limits the number of instructions that can be scheduled, i.e. issued for execution, in a single cycle. For example, with a window size of 2K we look at 2048 instructions. If we find 111 instructions that could be executed in parallel, a cycle width of 64 limits the actual parallelism to 64. CWalter
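The interplay of the two limits can be sketched with a small greedy scheduler. This is a simplified model under stated assumptions, not Wall's simulator: every instruction takes one cycle, dependencies are given explicitly as sets of earlier instruction indices, and each cycle only the first `window` not-yet-issued instructions (in program order) are examined, of which at most `width` may issue. The functions `schedule` and `parallelism` are hypothetical names.

```python
from typing import List, Optional, Set


def schedule(deps: List[Set[int]], window: int, width: int) -> List[List[int]]:
    """Greedy cycle-by-cycle scheduler over a simplified unit-latency model.

    deps[i] holds the indices of earlier instructions that instruction i reads from.
    Returns the list of instruction indices issued in each cycle.
    """
    if window < 1 or width < 1:
        raise ValueError("window and width must be at least 1")
    issued_in: List[Optional[int]] = [None] * len(deps)  # cycle each instruction issued
    pending = list(range(len(deps)))  # not yet issued, in program order
    cycles: List[List[int]] = []
    while pending:
        now = len(cycles)
        group: List[int] = []
        for i in pending[:window]:  # only the window is examined
            if len(group) == width:  # the cycle width caps how many can issue
                break
            # Ready if every producer issued in an earlier cycle (unit latency).
            if all(issued_in[d] is not None and issued_in[d] < now for d in deps[i]):
                group.append(i)
        for i in group:
            issued_in[i] = now
        chosen = set(group)
        pending = [i for i in pending if i not in chosen]
        cycles.append(group)
    return cycles


def parallelism(cycles: List[List[int]]) -> float:
    """Average number of instructions issued per cycle."""
    return sum(len(c) for c in cycles) / len(cycles)
```

With 111 independent instructions, a window of 2048 and a cycle width of 64, the scheduler issues 64 instructions in the first cycle and the remaining 47 in the second, for an average parallelism of 55.5. Shrinking the window to 8 instead makes the window the bottleneck: no cycle can issue more than 8 instructions, however wide the machine is.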

Did loop unrolling help?

Isn't ILP the dynamic version of loop unrolling in hardware?

Loop unrolling is not the best technique. The results shown in WRL-93-6 show that only a few programs performed better with it [WRL-93-6, p24], and in some cases loop unrolling even hurt parallelism. In fact, good branch prediction could replace loop unrolling. The only case where loop unrolling performs better is when the compiler knows the exact loop bounds, because then it can generate perfect code. Another problem with loop unrolling is that registers are limited: depending on the unrolling depth, some registers might have to be saved before entering the loop, which undoes some of the performance gain. CWalter
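The following sketch shows the source-level transformation a compiler would perform when unrolling a summation loop by four. It is written in Python only to illustrate the structure (an interpreter will not expose any ILP gain), and the function names are hypothetical. The rolled loop forms one serial chain of additions; the unrolled version uses four independent accumulators, each of which would occupy its own register, and because the trip count is only known at run time it needs a cleanup loop for the leftover elements. Summing into separate accumulators reassociates the additions, which is safe for integers but can change results for floating-point data.

```python
from typing import Sequence


def sum_rolled(xs: Sequence[int]) -> int:
    total = 0
    for x in xs:
        total += x  # each add depends on the previous one: a serial chain
    return total


def sum_unrolled4(xs: Sequence[int]) -> int:
    # Four independent accumulators: the four adds in one iteration do not depend
    # on each other, so a wide machine could issue them in the same cycle.
    # Each accumulator needs its own register, so deeper unrolling raises register pressure.
    a0 = a1 = a2 = a3 = 0
    n = len(xs)
    main = n - n % 4  # largest multiple of 4 not exceeding n
    for i in range(0, main, 4):
        a0 += xs[i]
        a1 += xs[i + 1]
        a2 += xs[i + 2]
        a3 += xs[i + 3]
    for i in range(main, n):  # cleanup loop for the 0 to 3 leftover elements
        a0 += xs[i]
    # Combine the partial sums pairwise (two independent adds, then one more).
    return (a0 + a1) + (a2 + a3)
```

If the compiler knew that the trip count was always a multiple of four, the cleanup loop could be dropped entirely, which is one way to read the remark above about exact bounds.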

What is available today?

The paper was written in 1993, 22 years ago. How far have we come with ILP since then?

Are there limitations in the paper?

How does ILP affect determinism and asynchronous event performance?

If multiple instructions are issued, does the processor need to finish all of them before handling an external interrupt? And if not, where is the flow interrupted? Does this mean that a processor with an average parallelism of 10 and a cycle time of 10 ns has the same interrupt performance as a processor with a parallelism of 1 (none) and a cycle time of 100 ns? CWalter