Intel IA-32 Computer Accessories User Manual


 
Coding for SIMD Architectures 3
Assembly
Key loops can be coded directly in assembly language using an
assembler, or with inlined assembly (C-asm) in C/C++ code. The
Intel compiler or assembler recognizes the new instructions and registers
and directly generates the corresponding code. This model offers the
opportunity for attaining the greatest performance, but that performance
is not portable across different processor architectures.
Example 3-9 shows the Streaming SIMD Extensions inlined assembly
encoding.
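The portability caveat can be made concrete with a minimal sketch. Example 3-9 uses the `__asm { ... }` block syntax, which GCC and Clang do not accept by default; those compilers use an extended-asm syntax instead. The helper name `add4_asm`, the preprocessor guard, and the scalar fallback below are assumptions made for illustration and are not part of the manual. The sketch uses unaligned moves (`movups`) so that it does not depend on 16-byte alignment of the arguments.

```c
#include <stddef.h>

/* Hypothetical helper (not from the manual): c[0..3] = a[0..3] + b[0..3]. */
static void add4_asm(const float *a, const float *b, float *c)
{
#if defined(__GNUC__) && (defined(__x86_64__) || (defined(__i386__) && defined(__SSE__)))
    /* GCC/Clang extended asm, AT&T operand order (source, destination). */
    __asm__ volatile(
        "movups (%0), %%xmm0\n\t"      /* load four floats from a      */
        "movups (%1), %%xmm1\n\t"      /* load four floats from b      */
        "addps  %%xmm1, %%xmm0\n\t"    /* xmm0 = a + b, element-wise   */
        "movups %%xmm0, (%2)\n\t"      /* store the four sums to c     */
        :                              /* no register outputs          */
        : "r"(a), "r"(b), "r"(c)       /* pointer inputs               */
        : "xmm0", "xmm1", "memory");   /* registers and memory touched */
#else
    /* Scalar fallback for compilers or targets without this asm path. */
    for (size_t i = 0; i < 4; ++i)
        c[i] = a[i] + b[i];
#endif
}
```

The fallback branch is the cost of the assembly model: every additional compiler or architecture needs its own hand-written path or a plain C substitute.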
Intrinsics
Intrinsics provide access to the ISA functionality using C/C++ style
coding instead of assembly language. Intel has defined three sets of
intrinsic functions, implemented in the Intel® C++ Compiler, to
support the MMX technology, Streaming SIMD Extensions, and
Streaming SIMD Extensions 2. Four new C data types, representing
64-bit and 128-bit objects, are used as the operands of these intrinsic
functions.
__m64 is used for MMX integer SIMD, __m128 is used for
single-precision floating-point SIMD,
__m128i is used for Streaming
Example 3-9 Streaming SIMD Extensions Using Inlined Assembly Encoding
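void add(float *a, float *b, float *c)
{
    /* a, b, and c must each point to four floats aligned on a
       16-byte boundary, as movaps and addps require. */
    __asm {
        mov eax, a                        ; eax = address of a
        mov edx, b                        ; edx = address of b
        mov ecx, c                        ; ecx = address of c
        movaps xmm0, XMMWORD PTR [eax]    ; load four floats from a
        addps xmm0, XMMWORD PTR [edx]     ; add four floats from b
        movaps XMMWORD PTR [ecx], xmm0    ; store the four sums to c
    }
}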
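For comparison with the inlined assembly of Example 3-9, the same four-element addition can be sketched with the Streaming SIMD Extensions intrinsics and the `__m128` data type described under Intrinsics. This is a minimal sketch, not the manual's own listing: the helper name `add4_intrin`, the `ADD4_HAVE_SSE` guard macro, and the scalar fallback are assumptions added for illustration. It uses the unaligned load and store intrinsics so that callers need not guarantee 16-byte alignment.

```c
#include <stddef.h>

/* Assumed guard: use SSE intrinsics only where the compiler reports SSE support. */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   /* SSE intrinsics and the __m128 type */
#define ADD4_HAVE_SSE 1
#else
#define ADD4_HAVE_SSE 0
#endif

/* Hypothetical helper (not from the manual): c[0..3] = a[0..3] + b[0..3]. */
static void add4_intrin(const float *a, const float *b, float *c)
{
#if ADD4_HAVE_SSE
    __m128 va = _mm_loadu_ps(a);            /* load four floats from a */
    __m128 vb = _mm_loadu_ps(b);            /* load four floats from b */
    _mm_storeu_ps(c, _mm_add_ps(va, vb));   /* store the four sums     */
#else
    /* Scalar fallback for targets without SSE. */
    for (size_t i = 0; i < 4; ++i)
        c[i] = a[i] + b[i];
#endif
}
```

With intrinsics the compiler chooses the registers and schedules the instructions, so no register names appear in the source.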