Chapter 9 Optimizing with SIMD Instructions 197
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
The statement movlps xmm1, mem64 marks the lower half of XMM1 as FPS (floating-point
single-precision) but leaves the upper half of XMM1 unchanged. If XMM1 is later used in any
instruction that uses the full 128 bits of XMM1, there can be a performance penalty if the top half
is not also in FPS format. Examples of instructions that expect the full 128 bits of XMM1 to be in
FPS format are MOVAPS, ANDPS, ANDNPS, and ORPS. For more information on XMM-register
data types, see “Half-Register Operations” on page 356.
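The data movement described above can be sketched with SSE intrinsics in C. This is only an illustration: the helper names are hypothetical, and although _mm_loadl_pi has MOVLPS semantics, the compiler is free to emit a different instruction sequence. The FPS type marking and any resulting penalty are microarchitectural and cannot be observed through register values; the sketch shows only that the upper half is left unchanged.

```c
#include <xmmintrin.h>  /* SSE intrinsics (__m128, _mm_loadl_pi, _mm_storeu_ps) */

/* Hypothetical helper: replace the low two floats of `old` with src[0..1].
 * The high two floats of `old` are left as they were, which matches the
 * behavior of "movlps xmm, mem64". */
static __m128 load_low_two_floats(__m128 old, const float *src)
{
    return _mm_loadl_pi(old, (const __m64 *)src);
}

/* Hypothetical helper: copy all four lanes to memory for inspection. */
static void lanes_of(__m128 v, float out[4])
{
    _mm_storeu_ps(out, v);
}
```

For example, if old holds the lanes (1, 2, 3, 4) from low to high and src points to {10, 20}, the result holds (10, 20, 3, 4).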
Rationale—Double Precision
Unlike the MOVSD and MOVQ instructions, the MOVLPD instruction does not clear the upper 64 bits
of an XMM register when it loads 64 bits of floating-point data into the lower 64 bits of the
register. Using the MOVLPD instruction can significantly increase performance in
processor-limited, scalar floating-point-intensive SSE2 code.
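The difference in upper-half behavior can be illustrated with SSE2 intrinsics in C. This is a minimal sketch under stated assumptions: the helper names are hypothetical, _mm_loadl_pd has MOVLPD semantics and _mm_load_sd has MOVSD load semantics, and the compiler may choose other instructions when it generates code. The sketch shows the architectural result only and does not measure the performance difference.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (__m128d, _mm_loadl_pd, _mm_load_sd) */

/* Hypothetical helper: load *p into the low 64 bits and keep the upper
 * 64 bits of `old` (MOVLPD semantics). */
static __m128d load_low_keep_upper(__m128d old, const double *p)
{
    return _mm_loadl_pd(old, p);
}

/* Hypothetical helper: load *p into the low 64 bits and zero the upper
 * 64 bits (MOVSD load semantics). */
static __m128d load_low_zero_upper(const double *p)
{
    return _mm_load_sd(p);
}

/* Extract the low and high doubles for inspection. */
static double low_of(__m128d v)  { return _mm_cvtsd_f64(v); }
static double high_of(__m128d v) { return _mm_cvtsd_f64(_mm_unpackhi_pd(v, v)); }
```

With old holding 7.0 in the upper half and 1.0 in the lower half, and *p equal to 3.5, load_low_keep_upper returns (upper 7.0, lower 3.5), while load_low_zero_upper returns (upper 0.0, lower 3.5).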
Consider the following caveat when using the MOVLPD instruction:
The statement movlpd xmm1, mem64 marks the lower half of XMM1 as FPD (floating-point
double-precision) but leaves the upper half of XMM1 unchanged. If XMM1 is later used in any
instruction that uses the full 128 bits of XMM1, there can be a performance penalty if the top half
is not also in FPD format. Examples of instructions that expect the full 128 bits of XMM1 to be in
FPD format are ANDPD, ANDNPD, and ORPD. For more information on XMM-register data
types, see “Half-Register Operations” on page 356.
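One possible way to have both halves of a register written by double-precision loads before a full-width operation is sketched below in C with SSE2 intrinsics. It assumes that MOVHPD marks the upper half as FPD in the same way that MOVLPD marks the lower half; that assumption is not stated in this passage and should be checked against the half-register discussion referenced above. The helper names are hypothetical, and the compiler may emit instructions other than MOVLPD, MOVHPD, and ANDNPD.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: fill both halves with 64-bit double-precision loads
 * (MOVLPD semantics for the low half, MOVHPD semantics for the high half). */
static __m128d load_both_halves(const double *lo, const double *hi)
{
    __m128d v = _mm_setzero_pd();
    v = _mm_loadl_pd(v, lo);   /* low 64 bits  <- *lo, high half kept   */
    v = _mm_loadh_pd(v, hi);   /* high 64 bits <- *hi, low half kept    */
    return v;
}

/* Full-width operation with ANDNPD semantics: clear the sign bit of both
 * lanes, which yields the absolute value of each double. */
static __m128d abs_pd(__m128d v)
{
    const __m128d sign_mask = _mm_set1_pd(-0.0);  /* only the sign bit set */
    return _mm_andnot_pd(sign_mask, v);           /* (~sign_mask) & v      */
}
```

For inputs *lo = -1.5 and *hi = 2.25, abs_pd(load_both_halves(lo, hi)) holds 1.5 in the lower half and 2.25 in the upper half.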