General Optimization Guidelines 2
• On the Pentium 4 and Intel Xeon processors, the primary code size
limit of interest is imposed by the trace cache. On Pentium M
processors, the code size limit is governed by the instruction cache.
• There may be a penalty when instructions with immediates
requiring more than 16-bit signed representation are placed next to
other instructions that use immediates.
Note that memory-related optimization techniques, such as aligning data,
complying with store-to-load-forwarding restrictions, and avoiding data
splits, help Pentium 4 processors as well as Pentium M processors.
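As a rough illustration of what avoiding a data split means, the sketch below checks whether an access of a given size at a given address would span two cache lines, and rounds an address up to an alignment boundary. The 64-byte line size and the helper names access_splits_line and align_up are assumptions made for this example; they are not taken from this guide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration: a 64-byte cache line. */
#define ASSUMED_LINE_BYTES 64u

/* Returns 1 if an access of `size` bytes starting at `addr` touches
 * two cache lines (a data split), 0 if it stays within one line. */
int access_splits_line(uintptr_t addr, size_t size)
{
    assert(size > 0 && size <= ASSUMED_LINE_BYTES);
    uintptr_t first_line = addr / ASSUMED_LINE_BYTES;
    uintptr_t last_line  = (addr + size - 1) / ASSUMED_LINE_BYTES;
    return first_line != last_line;
}

/* Rounds `addr` up to the next multiple of `align`.
 * `align` must be a nonzero power of two. */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

For example, an 8-byte access at offset 60 covers bytes 60 through 67 and so crosses the boundary at byte 64, while the same access at offset 56, or at the address returned by align_up(60, 8), stays within one line.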
CPUID Dispatch Strategy and Compatible Code Strategy
Where optimum performance on all processor generations is desired,
applications can take advantage of cpuid to identify the processor
generation and integrate processor-specific instructions (such as SSE2
instructions) into the source code. The Intel C++ Compiler supports the
integration of different versions of the code for different target
processors. The selection of which code to execute at runtime is made
based on the CPU identifier that is read with cpuid. Binary code
targeted for different processor generations can be generated under the
control of the programmer or by the compiler.
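A programmer-controlled dispatch of the kind described above might look like the following minimal sketch. It assumes a GCC- or Clang-compatible compiler, whose <cpuid.h> header provides __get_cpuid, and it tests the SSE2 feature flag (CPUID leaf 1, EDX bit 26) rather than decoding the processor generation. The function names cpu_has_sse2, sum_generic, sum_sse2, and sum_dispatch are hypothetical, and this is not the mechanism the Intel C++ Compiler uses internally.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>      /* GCC/Clang wrapper around the cpuid instruction */
#include <emmintrin.h>  /* SSE2 intrinsics */
#define HAVE_X86_CPUID 1
#endif

/* Returns 1 if CPUID leaf 1 reports SSE2 (EDX bit 26), otherwise 0. */
int cpu_has_sse2(void)
{
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return (int)((edx >> 26) & 1u);
#endif
    return 0;
}

/* Generic code path: runs on any processor. */
static uint32_t sum_generic(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

#ifdef HAVE_X86_CPUID
/* SSE2 code path: adds four 32-bit lanes per iteration. */
__attribute__((target("sse2")))
static uint32_t sum_sse2(const uint32_t *v, size_t n)
{
    __m128i acc = _mm_setzero_si128();
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_epi32(acc, _mm_loadu_si128((const __m128i *)(v + i)));

    uint32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    uint32_t s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; i++)  /* remaining elements */
        s += v[i];
    return s;
}
#endif

typedef uint32_t (*sum_fn)(const uint32_t *, size_t);

/* Picks the code path once, based on what cpuid reports. */
static sum_fn select_sum(void)
{
#ifdef HAVE_X86_CPUID
    if (cpu_has_sse2())
        return sum_sse2;
#endif
    return sum_generic;
}

/* Public entry point. The cached pointer is not synchronized, so a
 * multithreaded program would resolve it during startup instead. */
uint32_t sum_dispatch(const uint32_t *v, size_t n)
{
    static sum_fn impl = NULL;
    if (impl == NULL)
        impl = select_sum();
    return impl(v, n);
}
```

Callers use only sum_dispatch, so both code paths ship in one binary and the choice is made at run time. The cost is the larger code size that the compatible code strategy tries to avoid.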
For applications that run on both the Intel Pentium 4 and Pentium M
processors, and where minimum binary code size and a single code path
are important, a compatible code strategy is best. Optimizing
applications for the Intel NetBurst microarchitecture is likely to improve
code efficiency and scalability when running on current and future
generations of IA-32 processors. This approach to
optimization is also likely to deliver high performance on Pentium M
processors.