IA-32 Intel® Architecture Optimization
Spill Scheduling
The spill scheduling algorithm used by a code generator is affected by
the Pentium 4 processor memory subsystem. A spill scheduling algorithm
selects which values to spill to memory when there are too many live
values to fit in registers. Consider the code in Example 2-26, where it
is necessary to spill either A, B, or C.
For the Pentium 4 processor, using dependence depth information in
spill scheduling is even more important than on previous processors. The
loop-carried dependence in A makes it especially important that A not be
spilled. Not only would a store/load pair be placed in the dependence
chain, but the load would also incur a data-not-ready stall, costing
further cycles.
Assembly/Compiler Coding Rule 62. (H impact, MH generality) For small
loops, placing loop invariants in memory is better than spilling loop-carried
dependencies.
This leads to a possibly counter-intuitive result: in such a situation it
is better to keep loop invariants in memory than in registers, since a
load of a loop invariant is never blocked by store data that is not ready.
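The selection policy described above can be sketched as a small ranking
function. This is a minimal illustration, not the algorithm of any
particular code generator: the LiveValue record, its fields, the
spill_cost ordering, the depth values, and the extra invariant K are all
assumptions introduced here. The sketch prefers loop invariants as spill
candidates, then values recomputed each iteration, and treats
loop-carried values such as A as the last resort.

```python
from dataclasses import dataclass


@dataclass
class LiveValue:
    # Hypothetical record describing one live value in the loop.
    name: str
    loop_carried: bool      # value feeds its own next-iteration definition
    loop_invariant: bool    # value is never written inside the loop
    dependence_depth: int   # assumed length of the dependence chain through it


def spill_cost(value):
    """Return a sort key; a lower key means a better spill candidate."""
    if value.loop_invariant:
        tier = 0   # reload is never blocked by a pending store
    elif value.loop_carried:
        tier = 2   # store/load would sit in the loop-carried chain
    else:
        tier = 1   # recomputed each iteration, not on the carried chain
    # Within a tier, assume shallower dependence chains are cheaper to spill.
    return (tier, value.dependence_depth)


def choose_spill(values):
    """Pick the value to spill when registers run out."""
    return min(values, key=spill_cost)


# Values modeled on Example 2-26; the depths are made-up illustration data.
A = LiveValue("A", loop_carried=True, loop_invariant=False, dependence_depth=3)
B = LiveValue("B", loop_carried=False, loop_invariant=False, dependence_depth=1)
C = LiveValue("C", loop_carried=False, loop_invariant=False, dependence_depth=2)
K = LiveValue("K", loop_carried=False, loop_invariant=True, dependence_depth=1)
```

Under these assumptions, choose_spill([A, B, C]) selects B, and adding
the hypothetical invariant K to the list makes K the preferred candidate,
in line with Coding Rule 62.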
Scheduling Rules for the Pentium 4 Processor Decoder
The Pentium 4 and Intel Xeon processors have a single decoder that can
decode instructions at the maximum rate of one instruction per clock.
Complex instructions must enlist the help of the microcode ROM; see
Chapter 1, “IA-32 Intel® Architecture Processor Family Overview” for
details.
Example 2-26 Spill Scheduling Example Code
LOOP
    C := ...
    B := ...
    A := A + ...