General Optimization Guidelines
Optimize Branch Predictability
• Improve branch predictability and optimize instruction prefetching
by arranging code to be consistent with the static branch prediction
assumption: backward taken and forward not taken.
• Avoid mixing near calls, far calls and returns.
• Avoid implementing a call by pushing the return address and
jumping to the target. The hardware pairs call and return
instructions to predict return addresses; a manual push-and-jump
defeats this pairing and hurts predictability.
• Use the pause instruction in spin-wait loops.
• Inline functions according to coding recommendations.
• Whenever possible, eliminate branches.
• Avoid indirect calls.
Optimize Memory Access
• Observe store-forwarding constraints.
• Ensure proper data alignment to prevent data from being split
across a cache line boundary. This applies to the stack and to
passed parameters as well.
• Avoid mixing code and data (self-modifying code).
• Choose data types carefully (see next bullet below) and avoid type
casting.
• Employ data structure layout optimization to ensure efficient use of
64-byte cache line size.
• Favor parallel, independent data accesses, which mask latency,
over dependent access chains, which expose it.
• For cache-miss data traffic, favor smaller strides between
successive misses to avoid frequent DTLB misses.
• Use prefetching appropriately.
• Use the following techniques to enhance locality: blocking,
hardware-friendly tiling, loop interchange, loop skewing.