General Optimization Guidelines 2
* Streaming SIMD Extensions (SSE)
** Streaming SIMD Extensions 2 (SSE2)
General Practices and Coding Guidelines
This section discusses guidelines derived from the performance factors
listed in the “Tuning to Achieve Optimum Performance” section. It also
highlights practices that use performance tools.
The majority of these guidelines benefit processors based on the Intel
NetBurst microarchitecture and the Pentium M processor
microarchitecture. Some guidelines benefit one microarchitecture more
than the other. As a whole, these coding rules enable software to be
optimized for the common performance features of the Intel NetBurst
microarchitecture and the Pentium M processor microarchitecture.
Under each heading, the recommended coding practices and their bullets
are listed in order of importance.
Table 2-1 Coding Pitfalls Affecting Performance (continued)

Factors Affecting     Symptom                Example          Section Reference
Performance                                  (if applicable)
--------------------  ---------------------  ---------------  ---------------------------
Cache line splits     Access across cache    Example 2-11     Align data on natural
                      line boundary                           operand size address
                                                              boundaries. If the data
                                                              will be accessed with
                                                              vector instruction loads
                                                              and stores, align the data
                                                              on 16-byte boundaries.

Denormal inputs and   Slows x87, SSE*,                        Floating-point Exceptions
outputs               SSE2** floating-point
                      operations

Cycling more than 2   fldcw not optimized                     Floating-point Modes
values of
Floating-point
Control Word
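
The cache line split entry in Table 2-1 can be illustrated with a short C sketch. It is not taken from this manual: the helper names (crosses_cache_line, is_aligned, alloc_vector_floats) are hypothetical, the 64-byte cache line size is an assumption that should be confirmed for the target processor, and the allocation relies on the C11 aligned_alloc function.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache line size, for illustration only; the actual value
   depends on the processor and cache level. */
#define CACHE_LINE_BYTES 64u

/* Returns 1 if an access of `size` bytes (size > 0) starting at `p`
   touches two different cache lines, 0 otherwise. */
int crosses_cache_line(const void *p, size_t size)
{
    uintptr_t first = (uintptr_t)p / CACHE_LINE_BYTES;
    uintptr_t last  = ((uintptr_t)p + size - 1) / CACHE_LINE_BYTES;
    return first != last;
}

/* Returns 1 if `p` lies on an `alignment`-byte boundary.
   `alignment` must be a nonzero power of two. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Allocates room for `count` floats on a 16-byte boundary, as suggested
   for data accessed with vector loads and stores. The byte count is
   rounded up to a multiple of 16 because C11 aligned_alloc expects the
   size to be a multiple of the alignment. Release with free(). */
float *alloc_vector_floats(size_t count)
{
    size_t bytes = count * sizeof(float);
    bytes = (bytes + 15u) & ~(size_t)15u;
    if (bytes == 0) {
        bytes = 16;
    }
    return aligned_alloc(16, bytes);
}
```

Under the 64-byte assumption, a 16-byte access that starts on a 16-byte boundary always stays within one cache line, whereas an 8-byte access that starts 60 bytes into a line spills into the next one. A check such as crosses_cache_line is intended for debugging and auditing of data layout, not for use in inner loops.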