Optimizing for SIMD Integer Applications 4
• Don’t empty when already empty: If the next instruction uses an
MMX register, _mm_empty() incurs a cost with no benefit.
• Group instructions: Try to partition regions that use x87 FP
instructions from those that use 64-bit SIMD integer instructions.
This eliminates the need for an emms instruction within the body of
a critical loop.
• Runtime initialization: Use _mm_empty() during runtime
initialization of __m64 and x87 FP data types. This ensures that the
register state is reset between data type transitions. See
Example 4-1 for coding usage.
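The grouping guideline above can be sketched as follows. This is a minimal illustration, not code from the manual: the function name average_pairs and its parameters are hypothetical, and it assumes a compiler that provides the MMX intrinsics in <mmintrin.h>. All 64-bit SIMD integer work is kept in one region, _mm_empty() is issued once at the transition, and floating-point work follows.

```c
#include <mmintrin.h>  /* MMX intrinsics: __m64, _mm_add_pi32, _mm_empty */
#include <stddef.h>

/* Hypothetical helper: sums n_pairs pairs of 32-bit integers with MMX,
   then computes their floating-point average. */
double average_pairs(const int *data, size_t n_pairs)
{
    __m64 acc = _mm_setzero_si64();

    /* Region 1: 64-bit SIMD integer work only; no emms inside the loop. */
    for (size_t i = 0; i < n_pairs; ++i) {
        __m64 v = _mm_set_pi32(data[2 * i + 1], data[2 * i]);
        acc = _mm_add_pi32(acc, v);
    }

    /* Extract both 32-bit lanes while still in the MMX region. */
    int lo = _mm_cvtsi64_si32(acc);
    int hi = _mm_cvtsi64_si32(_mm_srli_si64(acc, 32));

    /* Single transition point: empty the MMX state once. */
    _mm_empty();

    /* Region 2: floating-point work only. */
    if (n_pairs == 0)
        return 0.0;
    return ((double)lo + (double)hi) / (double)(2 * n_pairs);
}
```

Placing the single _mm_empty() call after the loop, rather than inside it, keeps the cost of the transition out of the critical path.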
Further, be aware that, with the Intel C++ Compiler, your code
generates an MMX instruction, which uses the MMX registers, in the
following situations:
• when using a 64-bit SIMD integer intrinsic from MMX technology,
SSE, or SSE2
• when using a 64-bit SIMD integer instruction from MMX
technology, SSE, or SSE2 through inline assembly
• when referencing an __m64 data type variable
Additional information on the x87 floating-point programming model
can be found in the IA-32 Intel® Architecture Software Developer’s
Manual, Volume 1. For more documentation on emms, visit
http://developer.intel.com.
Example 4-1 Resetting the Register between __m64 and FP Data Types
Incorrect Usage               Correct Usage
__m64 x = _m_paddd(y, z);     __m64 x = _m_paddd(y, z);
float f = init();             float f = (_mm_empty(), init());
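For readers who want to compile the correct usage from Example 4-1, the following is a self-contained sketch under stated assumptions. The body of init(), the wrapper add_then_init, and its parameters are hypothetical stand-ins, since the example does not define y, z, or init().

```c
#include <mmintrin.h>  /* __m64, _m_paddd, _mm_empty */

/* Hypothetical stand-in for the init() used in Example 4-1. */
static float init(void)
{
    return 1.5f;
}

/* Hypothetical wrapper: performs an MMX add, then initializes a float
   using the comma-operator pattern from the "Correct Usage" column. */
float add_then_init(int a, int b, int *sum_out)
{
    __m64 y = _mm_cvtsi32_si64(a);
    __m64 z = _mm_cvtsi32_si64(b);
    __m64 x = _m_paddd(y, z);

    /* Read the integer result before the MMX state is emptied. */
    *sum_out = _mm_cvtsi64_si32(x);

    /* The comma operator evaluates _mm_empty() first, then init(),
       so the register state is reset before any FP code runs. */
    float f = (_mm_empty(), init());
    return f;
}
```

The comma operator guarantees left-to-right evaluation, which is why the reset can be folded into the initializer of f.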