AMD 250 Computer Hardware User Manual


 
Chapter 9 Optimizing with SIMD Instructions
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
9.1 Ensure All Packed Floating-Point Data are Aligned
Optimization
Align all packed floating-point data on 16-byte boundaries.
Application
This optimization applies to 32-bit software and 64-bit software.
Rationale
Misaligned memory accesses reduce the available memory bandwidth. In addition, SSE and SSE2
instructions have shorter latencies when operating on aligned memory operands.
Aligning data on 16-byte boundaries allows you to use the aligned load instructions (MOVAPS,
MOVAPD, and MOVDQA), which move through the floating-point unit with shorter latencies and
reduce the possibility of stalling addition or multiplication instructions that are dependent on the load
data.
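The following C fragment is a minimal sketch of this recommendation, not code taken from the guide. It assumes a C11 compiler that provides aligned_alloc and the SSE intrinsics header xmmintrin.h on an x86 or x86-64 target. The helper names is_aligned16, alloc_floats16, and add_packed_aligned are hypothetical and exist only for illustration. The intrinsic _mm_load_ps requires a 16-byte-aligned address, and compilers typically translate it to MOVAPS.

```c
#include <assert.h>     /* assert */
#include <stdalign.h>   /* alignas (C11) */
#include <stddef.h>     /* size_t */
#include <stdint.h>     /* uintptr_t */
#include <stdlib.h>     /* aligned_alloc, free */
#include <xmmintrin.h>  /* SSE: _mm_load_ps, _mm_add_ps, _mm_store_ps */

/* Hypothetical helper: returns 1 if p lies on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: allocates room for n floats on a 16-byte boundary.
   The size is rounded up because aligned_alloc expects a multiple of the
   alignment. Release the result with free(). */
static float *alloc_floats16(size_t n)
{
    size_t bytes = n * sizeof(float);
    bytes = (bytes + 15u) & ~(size_t)15u;
    return aligned_alloc(16, bytes);
}

/* Adds a[i] + b[i] into out[i], four floats per iteration.
   Assumes all three pointers are 16-byte aligned and n is a multiple of 4. */
static void add_packed_aligned(const float *a, const float *b,
                               float *out, size_t n)
{
    assert(is_aligned16(a) && is_aligned16(b) && is_aligned16(out));
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);  /* aligned load (typically MOVAPS) */
        __m128 vb = _mm_load_ps(b + i);  /* aligned load (typically MOVAPS) */
        _mm_store_ps(out + i, _mm_add_ps(va, vb));  /* aligned store */
    }
}
```

For statically or automatically allocated data, a declaration such as alignas(16) float v[4]; requests the same 16-byte alignment without a heap allocation.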