Intel IA-32 Computer Accessories User Manual


 
IA-32 Intel® Architecture Optimization
order engine. When tuning, note that all IA-32-based processors have very
high branch prediction rates. Consistently mispredicted branches are rare. Use
these instructions only if the increase in computation time is less than the
expected cost of a mispredicted branch. 2-16
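
The trade-off above can be illustrated with a minimal C sketch. It assumes the instructions in question are of the conditional-set or conditional-move kind (setcc, cmov); the function names max_branch and max_branchless are hypothetical and not from the manual. The branch-free form always performs the extra mask arithmetic, so it only pays off when the branch it replaces is genuinely unpredictable.

```c
#include <stdint.h>

/* Branching form: a compiler may emit a conditional jump for the test. */
int32_t max_branch(int32_t a, int32_t b)
{
    if (a > b)
        return a;
    return b;
}

/* Branch-free form: turn the comparison result (0 or 1) into a mask of
   all zeros or all ones, then blend the two inputs with it. Control
   dependence is traded for data dependence on 'mask'. */
int32_t max_branchless(int32_t a, int32_t b)
{
    int32_t mask = -(int32_t)(a > b);   /* -1 if a > b, otherwise 0 */
    return (a & mask) | (b & ~mask);
}
```

Whether a compiler actually emits setcc or cmov for either form depends on the compiler and its options, so inspect the generated assembly and measure before keeping the branch-free version.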
Assembly/Compiler Coding Rule 3. (M impact, H generality) Arrange
code to be consistent with the static branch prediction algorithm: make
the fall-through code following a conditional branch be the likely target
for a branch with a forward target, and make the fall-through code
following a conditional branch be the unlikely target for a branch with a
backward target. 2-19
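
As a rough illustration of Rule 3, the sketch below keeps the common case on the fall-through path and pushes the rare case behind a forward branch. The UNLIKELY macro and the function sum_checked are hypothetical names; __builtin_expect is a GCC/Clang extension and only a hint, so the final code layout is still up to the compiler.

```c
#include <stddef.h>

/* Layout hint: GCC/Clang extension, plain expression on other compilers. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Sums the array; returns -1 at the first negative entry, which is
   assumed here to be the rare case. */
long sum_checked(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {   /* loop back-edge: backward branch, usually taken */
        if (UNLIKELY(v[i] < 0))        /* rare case: forward branch, usually not taken */
            return -1;
        total += v[i];                 /* common case stays on the fall-through path */
    }
    return total;
}
```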
Assembly/Compiler Coding Rule 4. (MH impact, MH generality)
Near calls must be matched with near returns, and far calls must be
matched with far returns. Pushing the return address on the stack and
jumping to the routine to be called is not recommended since it creates a
mismatch in calls and returns. 2-21
Assembly/Compiler Coding Rule 5. (MH impact, MH generality)
Selectively inline a function where doing so decreases code size, or if the
function is small and the call site is frequently executed. 2-22
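
A small helper called from a hot loop is the typical candidate described by Rule 5. The sketch below uses hypothetical names (clamp_u8, clamp_all); static inline is only a request, and the compiler still decides whether to inline.

```c
#include <stddef.h>
#include <stdint.h>

/* Small function: a few instructions, cheaper to inline than to call. */
static inline uint8_t clamp_u8(int32_t x)
{
    if (x < 0)
        return 0;
    if (x > 255)
        return 255;
    return (uint8_t)x;
}

/* Frequently executed call site: one call per element. */
void clamp_all(const int32_t *in, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = clamp_u8(in[i]);
}
```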
Assembly/Compiler Coding Rule 6. (H impact, M generality) Do not
inline a function if doing so increases the working set size beyond what
will fit in the trace cache. 2-22
Assembly/Compiler Coding Rule 7. (ML impact, ML generality) If
there are more than 16 nested calls and returns in rapid succession,
consider transforming the program, for example, by inlining, to reduce the
call depth. 2-22
Assembly/Compiler Coding Rule 8. (ML impact, ML generality)
Favor inlining small functions that contain branches with poor prediction
rates. If a branch misprediction results in a RETURN being prematurely
predicted as taken, a performance penalty may be incurred. 2-22
Assembly/Compiler Coding Rule 9. (L impact, L generality) If the last
statement in a function is a call to another function, consider converting
the call to a jump. This will save the call/return overhead as well as an
entry in the return stack buffer. 2-22
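
In C terms, Rule 9 applies when a call is in tail position, that is, when nothing remains to be done after it returns. The following sketch uses hypothetical function names to contrast the two cases; an optimizing compiler may turn the first into a jump (or inline it altogether), but that is not guaranteed.

```c
/* Callee used by both examples. */
int scale(int x)
{
    return 3 * x;
}

/* Tail position: the result of scale() is returned unchanged, so the
   call can be replaced by a jump and scale() returns directly to our
   caller. */
int scale_after_offset(int x)
{
    int y = x + 1;
    return scale(y);
}

/* Not tail position: the addition happens after scale() returns, so a
   real call and return are still needed. */
int scale_then_offset(int x)
{
    return scale(x) + 1;
}
```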