AMD 250 Computer Hardware User Manual

Chapter 9 Optimizing with SIMD Instructions 197

Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

• The statement movlps xmm1, mem64 marks the lower half of XMM1 as FPS (floating-point single-precision) but leaves the upper half of XMM1 unchanged. If XMM1 is later used in any instruction that uses the full 128 bits of XMM1, there can be a performance penalty if the top half is not also in FPS format. Examples of instructions that expect the full 128 bits of XMM1 to be in FPS format are MOVAPS, ANDPS, ANDNPS, and ORPS. For more information on XMM-register data types, see “Half-Register Operations” on page 356.
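The half-register load described in this caveat can be sketched in C with the SSE intrinsic _mm_loadl_pi, which compilers typically (though not necessarily) translate to MOVLPS. The helper names load_low_two_floats and store_lanes are hypothetical and exist only for illustration. The sketch shows the architectural result only (upper two lanes unchanged); the internal FPS marking and any resulting penalty cannot be observed from C.

```c
#include <emmintrin.h>  /* SSE2 header; also provides the SSE intrinsics */

/* Hypothetical helper: load two floats from src into the low 64 bits of
   reg and leave the high 64 bits of reg unchanged. Compilers usually
   emit MOVLPS for _mm_loadl_pi, but that choice is up to the compiler. */
static __m128 load_low_two_floats(__m128 reg, const float *src)
{
    return _mm_loadl_pi(reg, (const __m64 *)src);
}

/* Hypothetical helper: copy the four single-precision lanes of v to out. */
static void store_lanes(__m128 v, float out[4])
{
    _mm_storeu_ps(out, v);
}
```

After load_low_two_floats, lanes 0 and 1 hold the loaded values and lanes 2 and 3 still hold whatever the register contained before, which is why a later full-width instruction such as ANDPS depends on how that upper half was produced.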

Rationale—Double Precision

Unlike the MOVSD and MOVQ instructions, the MOVLPD instruction does not clear the upper 64 bits of an XMM register when it loads 64 bits of floating-point data into the lower 64 bits of the register. Using the MOVLPD instruction can significantly increase performance on processor-limited SSE2 scalar floating-point-intensive code.
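The difference between the two kinds of load can be sketched in C with SSE2 intrinsics. The sketch assumes that the compiler maps _mm_loadl_pd to MOVLPD and _mm_load_sd to a MOVSD load, which is typical but not guaranteed. All helper names are hypothetical. The code illustrates only the register contents after each load, not the performance difference.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: MOVLPD-style load. The low lane is replaced by
   *src and the high lane of reg is preserved. */
static __m128d load_low_keep_high(__m128d reg, const double *src)
{
    return _mm_loadl_pd(reg, src);
}

/* Hypothetical helper: MOVSD-style load. The low lane is set to *src
   and the high lane is cleared to zero. */
static __m128d load_low_zero_high(const double *src)
{
    return _mm_load_sd(src);
}

/* Hypothetical helpers: read back the individual lanes. */
static double low_lane(__m128d v)
{
    return _mm_cvtsd_f64(v);
}

static double high_lane(__m128d v)
{
    double out[2];
    _mm_storeu_pd(out, v);
    return out[1];
}
```

For scalar code that only ever reads the low lane, both loads give the same arithmetic result, so the MOVLPD-style form can be substituted where the contents of the upper lane do not matter.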

Consider the following caveat when using the MOVLPD instruction:

• The statement movlpd xmm1, mem64 marks the lower half of XMM1 as FPD (floating-point double-precision) but leaves the upper half of XMM1 unchanged. If XMM1 is later used in any instruction that uses the full 128 bits of XMM1, there can be a performance penalty if the top half is not also in FPD format. Examples of instructions that expect the full 128 bits of XMM1 to be in FPD format are ANDPD, ANDNPD, and ORPD. For more information on XMM-register data types, see “Half-Register Operations” on page 356.
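One possible way to respect this caveat is to fill both halves of the register with double-precision loads before applying a full-width instruction. The C sketch below does this with _mm_loadl_pd and _mm_loadh_pd, then applies _mm_andnot_pd (ANDNPD). It rests on two assumptions that this section does not confirm: that the compiler emits MOVLPD and MOVHPD for these intrinsics, and that the high-half load marks the upper half as FPD. The helper names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: build a full 128-bit value from two separate
   doubles using a low-half load followed by a high-half load, so that
   both halves are written by double-precision loads. */
static __m128d load_both_halves(const double *lo, const double *hi)
{
    __m128d v = _mm_setzero_pd();
    v = _mm_loadl_pd(v, lo);   /* low 64 bits  = *lo */
    v = _mm_loadh_pd(v, hi);   /* high 64 bits = *hi */
    return v;
}

/* Hypothetical helper: full-width operation on both lanes. The mask has
   only the sign bit set in each lane, and _mm_andnot_pd computes
   (~mask) & v, which clears the sign bits (absolute value). */
static __m128d clear_sign_bits(__m128d v)
{
    const __m128d sign_mask = _mm_set1_pd(-0.0);
    return _mm_andnot_pd(sign_mask, v);
}
```

Whether this ordering avoids the penalty on a given processor should be confirmed by measurement; the sketch only shows the instruction pattern.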