Intel IA-32 Computer Accessories User Manual


 
General Optimization Guidelines 2
2-5
Optimize Branch Predictability
Improve branch predictability and optimize instruction prefetching
by arranging code to be consistent with the static branch prediction
assumption: backward taken and forward not taken.
Avoid mixing near calls, far calls and returns.
Avoid implementing a call by pushing the return address and
jumping to the target. The hardware can pair up call and return
instructions to enhance predictability.
Use the pause instruction in spin-wait loops.
Inline functions according to coding recommendations.
Whenever possible, eliminate branches.
Avoid indirect calls.
Optimize Memory Access
Observe store-forwarding constraints.
Ensure proper data alignment to prevent data split across cache line.
boundary. This includes stack and passing parameters.
Avoid mixing code and data (self-modifying code).
Choose data types carefully (see next bullet below) and avoid type
casting.
Employ data structure layout optimization to ensure efficient use of
64-byte cache line size.
Favor parallel data access to mask latency over data accesses with
dependency that expose latency.
For cache-miss data traffic, favor smaller cache-miss strides to
avoid frequent DTLB misses.
Use prefetching appropriately.
Use the following techniques to enhance locality: blocking,
hardware-friendly tiling, loop interchange, loop skewing.