Chapter 9 Optimizing with SIMD Instructions 199
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
9.4 Use MOVAPD and MOVAPS Instead of MOVUPD and MOVUPS
Optimization
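For best performance, use the aligned versions of these instructions (MOVAPD and MOVAPS) when one of the operands is a memory location. The memory operand must then be aligned on a 16-byte boundary; the aligned forms raise a general-protection exception if it is not.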
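The following is a minimal sketch in C, assuming a compiler that provides the SSE2 intrinsics in <emmintrin.h>. `_mm_load_pd` and `_mm_store_pd` are the aligned double-precision load and store intrinsics, which compilers typically emit as MOVAPD (the unaligned counterparts `_mm_loadu_pd`/`_mm_storeu_pd` map to MOVUPD). The compiler may still fold a load into the arithmetic instruction or emit VEX-encoded forms when targeting AVX. The function name `scale_pd_aligned` is hypothetical, and a scalar fallback keeps the sketch buildable where SSE2 is unavailable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: multiply n doubles by k in place.
   Precondition: p is 16-byte aligned and n is even, so every
   vector access can use the aligned (MOVAPD) form. */
void scale_pd_aligned(double *p, size_t n, double k)
{
    assert(((uintptr_t)p & 15u) == 0); /* aligned forms fault on misaligned memory */
    assert(n % 2 == 0);
#if defined(__SSE2__)
    __m128d vk = _mm_set1_pd(k);              /* broadcast k into both lanes */
    for (size_t i = 0; i < n; i += 2) {
        __m128d v = _mm_load_pd(p + i);       /* aligned load, typically MOVAPD */
        v = _mm_mul_pd(v, vk);
        _mm_store_pd(p + i, v);               /* aligned store, typically MOVAPD */
    }
#else
    for (size_t i = 0; i < n; i++)            /* scalar fallback without SSE2 */
        p[i] *= k;
#endif
}
```

A caller can guarantee the precondition by declaring the buffer with `_Alignas(16)` (C11) or by allocating it with an aligned allocator.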
Application
This optimization applies to:
• 32-bit software
• 64-bit software
Rationale
When one of the operands is a memory location, both MOVUPS and MOVUPD are VectorPath instructions.
MOVAPS and MOVAPD are better choices because both are DirectPath Double decode types. Misaligned
memory accesses also reduce the available memory bandwidth, and SSE and SSE2 instructions have
shorter latencies when operating on aligned memory operands. Aligning data on 16-byte boundaries
allows you to use the aligned load instructions (MOVAPS, MOVAPD, and MOVDQA), which move through
the floating-point unit with shorter latencies and reduce the possibility of stalling addition or
multiplication instructions that depend on the loaded data.
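When the start address of the data cannot be controlled, one common approach (an assumption here, not a technique stated in this section) is to process a few leading elements with scalar code until the pointer reaches a 16-byte boundary, so the vector loop issues only aligned loads. The sketch below assumes the SSE intrinsics in <xmmintrin.h>; `_mm_load_ps` is the aligned single-precision load that compilers typically emit as MOVAPS. The function name `sum_ps` is hypothetical, and without SSE the whole sum falls through to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: sum n floats starting at any float-aligned address.
   Scalar code handles the misaligned head and the tail, so the vector
   loop only issues aligned 16-byte loads (typically MOVAPS). */
float sum_ps(const float *p, size_t n)
{
    float total = 0.0f;
    size_t i = 0;
#if defined(__SSE__)
    /* Peel elements until p + i sits on a 16-byte boundary. */
    while (i < n && ((uintptr_t)(p + i) & 15u) != 0)
        total += p[i++];
    /* Either everything was consumed or the vector loop starts aligned. */
    assert(i == n || ((uintptr_t)(p + i) & 15u) == 0);

    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_load_ps(p + i)); /* aligned load */

    float lanes[4];
    _mm_storeu_ps(lanes, acc);                     /* spill the four partial sums */
    total += (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif
    for (; i < n; i++)                             /* tail, or all elements without SSE */
        total += p[i];
    return total;
}
```

Because the vector loop adds elements in a different order than a plain scalar loop, floating-point results can differ in the last bits for general inputs.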