Intel IA-32 Computer Accessories User Manual


 
IA-32 Intel® Architecture Optimization
Intel Core Solo and Intel Core Duo processors have an enhanced front
end that is less sensitive to the 4-1-1 template. The practice has no
real impact on processors based on the Intel NetBurst
microarchitecture.
Dependencies for partial register writes incur large penalties when
using the Pentium M processor (this applies to processors with
CPUID signature family 6, model 9). On Pentium 4 and Intel Xeon
processors, the Pentium M processor (with CPUID signature family 6,
model 13), and Intel Core Solo and Intel Core Duo processors, such
penalties are resolved by artificial dependencies between each
partial register write. To avoid false dependencies from partial
register updates, use full register updates and extended moves
(movzx, movsx).
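As a rough illustration of "full register updates and extended moves", the C sketch below widens each byte into a 32-bit variable before using it. The function name sum_bytes is hypothetical, and the mapping to movzx is an assumption: instruction selection is left to the compiler, so this only expresses the intent at the source level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums the bytes of a buffer.
 * Each byte is widened into a full 32-bit variable before it is used,
 * which maps naturally onto a zero-extending load rather than a write
 * to an 8-bit partial register. The compiler makes the final choice. */
uint32_t sum_bytes(const uint8_t *buf, size_t n)
{
    uint32_t total = 0;

    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t widened = buf[i];  /* full-width update of the value */
        total += widened;
    }
    return total;
}
```

Inspecting the generated assembly is the only way to confirm which form of the load a given compiler actually emits.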
On Pentium 4 and Intel Xeon processors, some latencies have
increased: shifts, rotates, integer multiplies, and moves from
memory with sign extension are longer than before. Use care when
using the lea instruction. See the section “Use of the lea
Instruction” for recommendations.
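The inc and dec instructions should always be avoided. They update
only a subset of the flags, which creates a dependence on earlier
writes of the flags register. Using add and sub instructions instead
avoids this data dependence and improves performance.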
Dependence-breaking support is added for the pxor instruction.
Floating-point register stack exchange instructions were previously
free; now they are slightly more expensive due to issue restrictions.
Writes and reads to the same location should now be spaced apart.
This is especially true for writes that depend on long-latency
instructions.
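The advice on spacing writes and reads to the same location can be sketched in C as follows. Both function names are hypothetical, and the sketch assumes that out does not alias src. The first version stores to *out and reloads it on the very next iteration; the second keeps the running total in a local variable and stores once at the end.

```c
#include <assert.h>
#include <stddef.h>

/* Store-then-reload pattern: every iteration writes *out, and the next
 * iteration reads the same location back immediately. */
void accumulate_through_memory(const int *src, size_t n, int *out)
{
    assert(out != NULL);
    *out = 0;
    for (size_t i = 0; i < n; i++) {
        *out += src[i];
    }
}

/* Same result (assuming out does not alias src), but the running total
 * lives in a local variable, so memory is written only once. */
void accumulate_in_register(const int *src, size_t n, int *out)
{
    int total = 0;

    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        total += src[i];
    }
    *out = total;
}
```

An optimizing compiler may perform this transformation itself when it can prove the pointers do not alias; writing it explicitly removes that uncertainty.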
Hardware prefetching may shorten the effective memory latency for
data and instruction accesses.
Cacheability instructions are available to streamline stores and
manage cache utilization.
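One possible use of the cacheability instructions is a fill routine built on non-temporal (streaming) stores. The sketch below uses the SSE2 intrinsics _mm_stream_si128 and _mm_sfence; the helper name fill_streaming and the requirement that dst be 16-byte aligned are assumptions of this example. When SSE2 is not available at compile time, it falls back to ordinary stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_set1_epi32, _mm_stream_si128, _mm_sfence */
#endif

/* Hypothetical helper: sets count 32-bit elements of dst to value.
 * Assumes dst is 16-byte aligned, as _mm_stream_si128 requires. */
void fill_streaming(int32_t *dst, size_t count, int32_t value)
{
    size_t i = 0;

    assert(((uintptr_t)dst % 16) == 0);
#if defined(__SSE2__)
    const __m128i v = _mm_set1_epi32(value);
    /* Write 16 bytes at a time with non-temporal stores. */
    for (; i + 4 <= count; i += 4) {
        _mm_stream_si128((__m128i *)(dst + i), v);
    }
    /* Order the streaming stores before any later stores. */
    _mm_sfence();
#endif
    /* Remaining elements (or everything, without SSE2): plain stores. */
    for (; i < count; i++) {
        dst[i] = value;
    }
}
```

Streaming stores tend to pay off only for data that will not be read again soon; for small or frequently reused buffers, ordinary stores are usually the better choice.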
Cache lines are 64 bytes (see Table 1-1 and Table 1-3). Because of
this, software prefetching should be done less often. False sharing,
however, can be an issue.
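A common way to limit false sharing is to give each thread's frequently written data its own cache line. The sketch below assumes the 64-byte line size stated above and a C11 compiler; the names CACHE_LINE_BYTES, padded_counter, and bump are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64  /* line size taken from the text above */

/* Hypothetical per-thread counter. Aligning the member to the line size
 * also rounds the struct size up to 64 bytes, so adjacent array
 * elements never share a cache line. */
struct padded_counter {
    _Alignas(CACHE_LINE_BYTES) unsigned long value;
};

static_assert(sizeof(struct padded_counter) == CACHE_LINE_BYTES,
              "each counter should occupy exactly one cache line");

/* Increments one counter; each thread would be handed its own element. */
void bump(struct padded_counter *c)
{
    c->value++;
}
```

The padding trades memory for isolation: an array of such counters uses 64 bytes per element instead of the size of one unsigned long.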