IA-32 Intel® Architecture Optimization
The following sections describe practices, tools, coding rules, and
recommendations associated with these factors that help optimize
performance on IA-32 processors.
Tuning to Prevent Known Coding Pitfalls
To produce program code that takes advantage of the Intel NetBurst
microarchitecture and the Pentium M processor microarchitecture, you
must avoid the coding pitfalls that limit the performance of the target
processor family. This section lists several known pitfalls that can limit
performance of Pentium 4 and Intel Xeon processor implementations.
Some of these pitfalls, such as store-to-load-forwarding restrictions
and cache-line splits, also degrade Pentium M processor performance,
though to a lesser degree.
Table 2-1 lists coding pitfalls that cause performance degradation in
some Pentium 4 and Intel Xeon processor implementations. For every
issue, Table 2-1 references a section in this document. The section
describes in detail the causes of the penalty and presents a
recommended solution. Note that “aligned” here means that the address
of the load is aligned with respect to the address of the store.
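
As a rough illustration of the store-forwarding pitfalls that Table 2-1 names, the C sketch below contrasts two memory access patterns with register-only alternatives that compute the same values. The function names are hypothetical and do not come from this manual. The sketch assumes little-endian byte order, as on IA-32, and uses volatile only to ask the compiler to keep the stores and loads in memory. Whether a given compiler emits exactly these accesses, and how large the penalty is on a particular processor, is not something this sketch establishes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: little-endian byte order, as on IA-32.
 * All function names here are hypothetical, chosen for illustration. */

/* Pitfall pattern: four byte stores followed by one dword load of the
 * same location. No single store supplies all four loaded bytes. */
uint32_t pack_bytewise(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    /* volatile asks the compiler to perform each access in memory */
    volatile union { uint8_t bytes[4]; uint32_t dword; } u;

    u.bytes[0] = b0;
    u.bytes[1] = b1;
    u.bytes[2] = b2;
    u.bytes[3] = b3;
    return u.dword;            /* large load after small stores */
}

/* Alternative: assemble the dword with shifts and ORs, so the value
 * never has to be re-read from memory. */
uint32_t pack_in_register(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    return (uint32_t)b0
         | ((uint32_t)b1 << 8)
         | ((uint32_t)b2 << 16)
         | ((uint32_t)b3 << 24);
}

/* Pitfall pattern: one dword store followed by a byte load that starts
 * one byte past the store address (a small, unaligned load). */
uint8_t second_byte_via_memory(uint32_t value)
{
    volatile union { uint8_t bytes[4]; uint32_t dword; } u;

    u.dword = value;
    return u.bytes[1];         /* small load, offset from the store */
}

/* Alternative: extract the byte with a shift, without touching memory. */
uint8_t second_byte_in_register(uint32_t value)
{
    return (uint8_t)(value >> 8);
}

/* Both variants of each pattern must agree on the result. */
void self_check(void)
{
    assert(pack_bytewise(0x11, 0x22, 0x33, 0x44) ==
           pack_in_register(0x11, 0x22, 0x33, 0x44));
    assert(second_byte_via_memory(0x44332211u) ==
           second_byte_in_register(0x44332211u));
}
```

Calling self_check() confirms only that the two variants of each pattern return the same value. Any performance difference has to be measured on the target processor.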
Table 2-1 Coding Pitfalls Affecting Performance
Factors Affecting           Symptom            Example           Section Reference
Performance                                    (if applicable)
--------------------------  -----------------  ----------------  --------------------------
Small, unaligned load       Store-forwarding   Example 2-12      Store Forwarding,
after large store           blocked                              Store-to-Load-Forwarding
                                                                 Restriction on Size and
                                                                 Alignment

Large load after small      Store-forwarding   Example 2-13,     Store Forwarding,
store;                      blocked            Example 2-14      Store-to-Load-Forwarding
Load dword after store                                           Restriction on Size and
dword, store byte;                                               Alignment
Load dword, AND with 0xff
after store byte
continued