IA-32 Intel® Architecture Optimization
Overview
The current generation of the IA-32 family of processors uses out-of-order
execution with dynamic scheduling and buffering to tolerate poor
instruction selection and scheduling that may occur in legacy code. The
processor can reorder μops to cover latency delays and to avoid resource
conflicts. In some cases, the microarchitecture’s ability to avoid such
delays can be enhanced by arranging IA-32 instructions appropriately.
While reordering IA-32 instructions may help, the execution core
determines the final schedule of μops.
This appendix provides information to help assembly language programmers
and compiler writers select the sequence of instructions that minimizes
dependency chain latency, and arrange instructions in an order that
helps the hardware process them efficiently while avoiding resource
conflicts. The performance impact of applying the information presented
in this appendix has been shown to be on the order of several percent
for applications that are not completely dominated by other performance
factors, such as:
• cache miss latencies
• bus bandwidth
• I/O bandwidth
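As an illustration of what minimizing dependency chain latency can mean
in practice, the following C sketch sums an array two ways. It is not
taken from this appendix: the function names are hypothetical, and
whether the second form is actually faster depends on the processor, the
latency of the operation involved, and what the compiler already does.
The first loop forms a single chain in which each addition waits for the
previous one. The second keeps two independent partial sums, so the
out-of-order core has two shorter chains it can overlap.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: each addition depends on the result of the
   previous one, so the loop is a single long dependency chain. */
long sum_single_chain(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Two accumulators: s0 and s1 do not depend on each other, so the
   additions into them form two independent, shorter chains. */
long sum_two_chains(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)          /* odd element count: fold in the last one */
        s0 += a[i];
    return s0 + s1;
}
```

Both functions return the same value for integer data. The same
transformation applied to floating-point data changes the order of
additions and can therefore change the rounded result.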
Instruction selection and scheduling matter when the compiler or
assembly programmer has already addressed the performance issues
discussed in Chapter 2:
• observe store forwarding restrictions
• avoid cache line and memory order buffer splits
• do not inhibit branch prediction
• minimize the use of xchg instructions on memory locations
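To make the cache line split item above concrete, the following C sketch
tests whether a memory access of a given size straddles a cache line
boundary. It is only an illustration: the helper name is hypothetical,
and the 64-byte line size is an assumption that should be checked
against the target processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes; verify for the target processor. */
#define ASSUMED_LINE_SIZE 64u

/* Returns 1 if an access of `size` bytes starting at address `addr`
   touches more than one cache line, and 0 otherwise. */
int crosses_cache_line(uintptr_t addr, size_t size)
{
    assert(size > 0);
    uintptr_t first_line = addr / ASSUMED_LINE_SIZE;
    uintptr_t last_line  = (addr + size - 1) / ASSUMED_LINE_SIZE;
    return first_line != last_line;
}
```

Under this assumption, an 8-byte access at address 60 spans bytes 60
through 67 and therefore splits across two lines, while the same access
at address 56 stays within one line.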