General Optimization Guidelines 2
If a variable is known not to change between the time it is stored and the
time it is used again, the register that held the stored value can be
copied or used directly instead of reloading the variable from memory. If
register pressure is too high, or if a function the compiler cannot see is
called between the store and the second load, it may not be possible to
eliminate the second load.
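The following C sketch contrasts the two cases described above. The names log_event, next_after_reload and next_from_register are hypothetical and used only for illustration. log_event stands in for an "unseen" function: it is defined here only so the sketch compiles and links, but in real code it would live in another translation unit, where the compiler could not prove that it leaves *slot untouched.

```c
#include <assert.h>

/* Hypothetical stand-in for an unseen function. Defined here only so the
 * sketch links; assume the real one lives in another translation unit. */
static int events;
void log_event(void) { events++; }

/* The call sits between the store and the second load. Unless the compiler
 * can prove that log_event() never writes *slot, it must reload *slot from
 * memory after the call. */
int next_after_reload(int *slot, int x)
{
    *slot = x;          /* store */
    log_event();        /* opaque call between store and load */
    return *slot + 1;   /* second load of the same location */
}

/* When the programmer knows *slot cannot change across the call, the value
 * that was stored (x) can be used directly, so the second load disappears.
 * This is only correct under that assumption. */
int next_from_register(int *slot, int x)
{
    *slot = x;          /* store is kept in case other code reads *slot */
    log_event();
    assert(*slot == x); /* documents the assumption; removed with NDEBUG */
    return x + 1;       /* no reload of *slot in release builds */
}
```

Whether x actually stays in a register across the call depends on register pressure; if the compiler has to spill it, a reload from the stack reappears, which is the limitation the text mentions.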
Assembly/Compiler Coding Rule 17. (H impact, M generality) Pass
parameters in registers instead of on the stack where possible. Passing
arguments on the stack is a case of a store followed by a reload. IA-32
processors optimize this sequence by forwarding the value to the load
directly from the memory order buffer, without accessing the data cache.
However, floating-point values incur significant latency in forwarding.
Passing floating-point arguments in registers (preferably XMM registers)
avoids this long-latency operation.
Parameter-passing conventions may limit the choice of which parameters
are passed in registers and which on the stack. However, these
limitations can be overcome if the compiler controls compilation of the
whole binary (using whole-program optimization).
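As a hedged illustration in C, assuming GCC: on 32-bit x86 the default convention passes double arguments on the stack, and GCC's x86-32 sseregparm function attribute requests that up to three floating-point arguments be passed in SSE (XMM) registers instead. The macro FP_IN_XMM and the functions scale_add, square and hypot_sq are hypothetical names chosen for this sketch. The macro expands to nothing on other targets, including x86-64, whose standard calling conventions already pass the first few double arguments in XMM registers.

```c
/* Request XMM-register passing of floating-point arguments on 32-bit x86.
 * Restricted to GCC (sseregparm is a GCC x86-32 attribute) with SSE2
 * enabled; elsewhere the macro is empty and the platform default applies. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__i386__) && defined(__SSE2__)
#define FP_IN_XMM __attribute__((sseregparm))
#else
#define FP_IN_XMM
#endif

/* Every caller must see this same declaration, because the attribute
 * changes the calling convention. */
FP_IN_XMM double scale_add(double x, double k, double b)
{
    return x * k + b;
}

/* A static function is not visible outside this file, so the compiler
 * knows all of its call sites and may inline it or pick a register-based
 * convention without breaking any external caller. */
static double square(double x)
{
    return x * x;
}

double hypot_sq(double a, double b)
{
    return square(a) + square(b);
}
```

The static helper shows the same idea as whole-program optimization on a small scale. Compiling with link-time optimization (for example, -flto in GCC or Clang) extends that freedom across translation units.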
Store-to-Load-Forwarding Restriction on Size and Alignment
Data size and alignment restrictions for store forwarding apply to
Pentium 4, Intel Xeon and Pentium M processors. The performance
penalty for violating store-forwarding restrictions is smaller on
Pentium M processors than on Pentium 4 processors.
This section describes these restrictions for all cases and gives
recommendations for avoiding the penalty incurred when a store cannot
be forwarded to a load. Code that avoids this penalty on Pentium 4 and
Intel Xeon processors also avoids it on Pentium M processors.
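A common size violation is a wide load that reads bytes written by several narrower stores: the load's data is not contained within any single store, so it cannot be forwarded and must wait until the stores complete. The C sketch below (function names are hypothetical) shows that pattern and the usual fix of combining the pieces in a register. It is only a sketch: an optimizing compiler may merge or remove these memory accesses, so the pattern matters most in hand-written assembly or in code the compiler cannot simplify.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The 32-bit load below must cover exactly the two 16-bit stores. */
static_assert(sizeof(uint16_t[2]) == sizeof(uint32_t), "layout mismatch");

/* Slow pattern: two 16-bit stores followed by one 32-bit load of the same
 * four bytes. No single store contains all the bytes being loaded, so the
 * load cannot be forwarded from either store. */
uint32_t pack_via_memory(uint16_t lo, uint16_t hi)
{
    uint16_t halves[2];
    uint32_t word;
    halves[0] = lo;                      /* 16-bit store */
    halves[1] = hi;                      /* 16-bit store */
    memcpy(&word, halves, sizeof word);  /* 32-bit load spanning both */
    return word;                         /* equals hi:lo on little-endian x86 */
}

/* Preferred pattern: build the 32-bit value in a register, so no
 * store-to-load forwarding is needed. */
uint32_t pack_in_register(uint16_t lo, uint16_t hi)
{
    return ((uint32_t)hi << 16) | lo;
}
```

On a little-endian machine such as IA-32, both functions return the same value; only the second avoids the non-forwarding stall.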