IA-32 Intel® Architecture Optimization
SIMD Extensions 2 integer SIMD and __m128d is used for double
precision floating-point SIMD. These types enable the programmer to
choose the implementation of an algorithm directly, while allowing the
compiler to perform register allocation and instruction scheduling where
possible. These intrinsics are portable among all Intel architecture-based
processors supported by a compiler. Intrinsics let you obtain
performance close to the levels achievable with assembly, while the
cost of writing and maintaining programs with intrinsics is considerably
lower than with assembly. For a detailed description of the intrinsics and
their use, refer to the Intel® C++ Compiler User’s Guide.
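As a rough illustration of the __m128d type mentioned above, the sketch below adds two pairs of double-precision values. It assumes a compiler that provides the SSE2 header emmintrin.h. The function name add_pd2 is hypothetical and does not come from this manual. Unaligned loads and stores are used so that the pointers need not be 16-byte aligned.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_add_pd, ... */

/* Hypothetical helper (not from the manual): c[i] = a[i] + b[i] for i = 0, 1. */
void add_pd2(const double *a, const double *b, double *c)
{
    __m128d t0 = _mm_loadu_pd(a);   /* load a[0], a[1]; no alignment required */
    __m128d t1 = _mm_loadu_pd(b);   /* load b[0], b[1] */
    t0 = _mm_add_pd(t0, t1);        /* two double-precision adds at once */
    _mm_storeu_pd(c, t0);           /* store c[0], c[1] */
}
```

The compiler remains free to pick the registers and to schedule the loads, which is the division of labor the paragraph above describes.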
Example 3-10 shows the loop from Example 3-8 using intrinsics.
The intrinsics map one-to-one with actual Streaming SIMD Extensions
assembly code. The xmmintrin.h header file, in which the prototypes
for the intrinsics are defined, is part of the Intel C++ Compiler included
with the VTune Performance Enhancement Environment CD.
Intrinsics are also defined for the MMX technology ISA. These are
based on the __m64 data type to represent the contents of an mm register.
You can specify values in bytes, short integers, 32-bit values, or as a
64-bit object.
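To make the __m64 type concrete, here is a minimal sketch that treats a 64-bit MMX value as four short integers. It assumes a compiler and target that still provide mmintrin.h (some 64-bit toolchains do not). The helper name add_pi16x4 is hypothetical. The byte and 32-bit forms would use _mm_set_pi8 and _mm_set_pi32 in the same way.

```c
#include <mmintrin.h>  /* MMX intrinsics: __m64, _mm_add_pi16, ... */
#include <string.h>    /* memcpy */

/* Hypothetical helper (not from the manual): out[i] = x[i] + y[i] for i = 0..3. */
void add_pi16x4(const short x[4], const short y[4], short out[4])
{
    /* _mm_set_pi16 takes the highest 16-bit lane first. */
    __m64 a = _mm_set_pi16(x[3], x[2], x[1], x[0]);
    __m64 b = _mm_set_pi16(y[3], y[2], y[1], y[0]);
    __m64 sum = _mm_add_pi16(a, b);   /* wraparound add of four 16-bit values */
    memcpy(out, &sum, sizeof sum);    /* copy the 64-bit result to memory */
    _mm_empty();                      /* emms: clear MMX state before any x87 code */
}
```

The call to _mm_empty matters because the mm registers alias the x87 floating-point stack. Omitting it can corrupt later x87 floating-point code.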
Example 3-10 Simple Four-Iteration Loop Coded with Intrinsics
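#include <xmmintrin.h>

/* Adds four floats: c[i] = a[i] + b[i] for i = 0..3.
   a, b, and c must be 16-byte aligned for _mm_load_ps/_mm_store_ps. */
void add(float *a, float *b, float *c)
{
    __m128 t0, t1;
    t0 = _mm_load_ps(a);      /* movaps: load a[0..3] */
    t1 = _mm_load_ps(b);      /* movaps: load b[0..3] */
    t0 = _mm_add_ps(t0, t1);  /* addps: four adds at once */
    _mm_store_ps(c, t0);      /* movaps: store c[0..3] */
}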
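Example 3-10 handles exactly four elements, and _mm_load_ps and _mm_store_ps require 16-byte-aligned pointers. The following sketch shows one way the same pattern might be extended to an array of arbitrary length. The function add_n and its parameter n are assumptions for illustration and are not part of the manual. It uses the unaligned forms _mm_loadu_ps and _mm_storeu_ps so it works on any float pointer, which can be slower than aligned access.

```c
#include <stddef.h>     /* size_t */
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Hypothetical generalization (not from the manual):
   c[i] = a[i] + b[i] for i = 0..n-1. */
void add_n(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;

    /* Vector part: four floats per iteration, no alignment required. */
    for (; i + 4 <= n; i += 4) {
        __m128 t0 = _mm_loadu_ps(a + i);
        __m128 t1 = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(t0, t1));
    }

    /* Scalar tail: the remaining 0 to 3 elements. */
    for (; i < n; ++i)
        c[i] = a[i] + b[i];
}
```

If the arrays are known to be 16-byte aligned, the aligned load and store intrinsics from Example 3-10 could be substituted in the vector loop.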