Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Appendix E SSE and SSE2 Optimizations
This appendix describes optimizations that improve the performance of code that uses SSE and
SSE2 instructions on AMD Athlon™ 64 and AMD Opteron™ processors.
Types of XMM-Register Data
The XMM registers (used by the SSE and SSE2 instructions) can hold the following three types of
data:
• Floating-point single-precision (FPS)
• Floating-point double-precision (FPD)
• Integer (INT)
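The three data types listed above can be sketched in C with compiler intrinsics. This is only an illustration under stated assumptions: the intrinsic types __m128, __m128d, and __m128i are the compiler's view of an XMM register and are not named in this appendix, and the function names sum_fps, sum_fpd, and sum_int are hypothetical. The integer case uses four 32-bit elements, which is one of several packed-integer layouts.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE types */

/* FPS: one XMM register viewed as four 32-bit single-precision values. */
float sum_fps(void)
{
    __m128 v = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    float out[4];
    _mm_storeu_ps(out, v);              /* copy the register to memory */
    return out[0] + out[1] + out[2] + out[3];
}

/* FPD: the same 128 bits viewed as two 64-bit double-precision values. */
double sum_fpd(void)
{
    __m128d v = _mm_set_pd(2.5, 1.5);
    double out[2];
    _mm_storeu_pd(out, v);
    return out[0] + out[1];
}

/* INT: the same 128 bits viewed as packed integers (here, four 32-bit). */
int sum_int(void)
{
    __m128i v = _mm_set_epi32(40, 30, 20, 10);
    int out[4];
    _mm_storeu_si128((__m128i *)out, v);
    return out[0] + out[1] + out[2] + out[3];
}
```

Each function fills a register with one type of data, stores it to memory, and sums the elements, so the only difference between the three is how the 128 bits are interpreted.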
Types of SSE and SSE2 Instructions
Most SSE and SSE2 instructions can be divided into five types according to the type of data they
produce and therefore expect to consume:
• Floating-point single-precision (FPS)
• Floating-point double-precision (FPD)
• Integer (INT)
• Load (produces data of type FPS, FPD, or INT)
• Store (can consume a register with data of any type)
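The five instruction types above can be illustrated with a short C sketch in which each routine loads data of one type, operates on it with an instruction of the same type, and stores the result. The function names are hypothetical, and the mnemonics in the comments are the instructions a compiler would typically emit for each intrinsic, which is an assumption rather than a guarantee. Unaligned loads and stores are used only so that the caller's arrays need not be 16-byte aligned.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE intrinsics */

/* FPS chain: load producing FPS data, FPS arithmetic, store. */
void add4_fps(float *dst, const float *a, const float *b)
{
    __m128 x = _mm_loadu_ps(a);          /* load (typically MOVUPS)  */
    __m128 y = _mm_loadu_ps(b);
    __m128 s = _mm_add_ps(x, y);         /* FPS (typically ADDPS)    */
    _mm_storeu_ps(dst, s);               /* store                    */
}

/* FPD chain: load producing FPD data, FPD arithmetic, store. */
void add2_fpd(double *dst, const double *a, const double *b)
{
    __m128d x = _mm_loadu_pd(a);         /* load (typically MOVUPD)  */
    __m128d y = _mm_loadu_pd(b);
    __m128d s = _mm_add_pd(x, y);        /* FPD (typically ADDPD)    */
    _mm_storeu_pd(dst, s);               /* store                    */
}

/* INT chain: load producing INT data, integer arithmetic, store. */
void add4_int(int *dst, const int *a, const int *b)
{
    __m128i x = _mm_loadu_si128((const __m128i *)a);  /* load (typically MOVDQU) */
    __m128i y = _mm_loadu_si128((const __m128i *)b);
    __m128i s = _mm_add_epi32(x, y);                  /* INT (typically PADDD)   */
    _mm_storeu_si128((__m128i *)dst, s);              /* store                   */
}
```

In each routine the load, the arithmetic instruction, and the data all share one type, which matches the "produce and therefore expect to consume" pairing described above.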
This appendix covers the following topics:
• Half-Register Operations
• Zeroing Out an XMM Register
• Reuse of Dead Registers
• Moving Data Between XMM Registers and GPRs
• Saving and Restoring Registers of Unknown Format
• SSE and SSE2 Copy Loops
• Data Conversion