Intel 253666-024US Manual

INSTRUCTION SET REFERENCE, A-M

• The __m128i data type can hold sixteen byte, eight word, four doubleword, or

two quadword integer values.

The compiler aligns __m128, __m128d, and __m128i local and global data to

16-byte boundaries; local data is aligned on the stack. To align integer, float, or double arrays, use the

declspec statement as described in Intel C/C++ compiler documentation. See

http://www.intel.com/support/performancetools/.
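
As a hedged illustration of aligning a float array, the sketch below uses the C11 `_Alignas(16)` specifier. This is an assumption made for portability: Intel and Microsoft compilers spell the same request `__declspec(align(16))`. The names `samples`, `is_aligned16`, and `first_sample` are hypothetical.

```c
#include <stdint.h>
#include <xmmintrin.h>

/* Assumption: C11 _Alignas(16) stands in for the compiler-specific
   __declspec(align(16)) spelling mentioned in the text. */
static _Alignas(16) float samples[4] = { 1.0f, 2.0f, 3.0f, 4.0f };

/* Returns 1 when p lies on a 16-byte boundary, 0 otherwise. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16u) == 0;
}

/* Loads the aligned array with _mm_load_ps (which requires 16-byte
   alignment) and returns its lowest element. */
float first_sample(void)
{
    __m128 v = _mm_load_ps(samples);
    return _mm_cvtss_f32(v);
}
```

Without the alignment specifier, the array might not sit on a 16-byte boundary, and an aligned load from it could fault.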

The __m128, __m128d, and __m128i data types are not basic ANSI C data types,

and therefore some restrictions are placed on their usage:

• Use __m128, __m128d, and __m128i only on the left-hand side of an

assignment, as a return value, or as a parameter. Do not use them in arithmetic

expressions with operators such as “+” and “>>”.

• Do not initialize __m128, __m128d, and __m128i with literals; there is no way to

express 128-bit constants.

• Use __m128, __m128d, and __m128i objects in aggregates, such as unions (for

example, to access the float elements) and structures. The address of these

objects may be taken.

• Use __m128, __m128d, and __m128i data only with the intrinsics described in

this user’s guide. See Appendix C, “Intel® C/C++ Compiler Intrinsics and

Functional Equivalents,” in the Intel® 64 and IA-32 Architectures Software

Developer’s Manual, Volume 2B, for more information on using intrinsics.
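
The rules in the list above can be sketched as follows: arithmetic goes through an intrinsic rather than an operator, and a union gives access to the individual float elements. The type name `vec4f` and the helpers `add4` and `element_at` are hypothetical names chosen for this illustration.

```c
#include <xmmintrin.h>

/* Hypothetical union that overlays four floats on one __m128 value. */
typedef union {
    __m128 v;
    float  f[4];
} vec4f;

/* Adds two packed vectors with an intrinsic instead of the "+" operator. */
__m128 add4(__m128 a, __m128 b)
{
    return _mm_add_ps(a, b);
}

/* Reads element i (0 is the lowest element) through the union. */
float element_at(__m128 value, int i)
{
    vec4f u;
    u.v = value;
    return u.f[i];
}
```

Here the __m128 objects appear only as parameters, return values, assignment targets, and union members, which matches the permitted uses.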

The compiler aligns __m128, __m128d, and __m128i local data to 16-byte boundaries

on the stack. Global __m128 data is also aligned on 16-byte boundaries. (To

align float arrays, you can use the alignment declspec described in the following

section.) Because the new instruction set treats the SIMD floating-point registers in

the same way whether you are using packed or scalar data, there is no __m32 data

type to represent scalar data as you might expect. For scalar operations, you should

use the __m128 objects and the “scalar” forms of the intrinsics; the compiler and the

processor implement these operations with 32-bit memory references.
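
A minimal sketch of scalar work done with __m128 objects and the “scalar” (ss) forms of the intrinsics follows. The function names `scalar_add` and `scalar_scale` are hypothetical; the intrinsics are the standard SSE ones declared in `xmmintrin.h`.

```c
#include <xmmintrin.h>

/* Adds two floats using only the lowest element of each __m128. */
float scalar_add(float x, float y)
{
    __m128 a = _mm_set_ss(x);      /* [0, 0, 0, x] */
    __m128 b = _mm_set_ss(y);      /* [0, 0, 0, y] */
    __m128 r = _mm_add_ss(a, b);   /* only the lowest element is summed */
    return _mm_cvtss_f32(r);
}

/* Loads one float from memory, scales it, and stores one float back.
   _mm_load_ss and _mm_store_ss each touch a single 32-bit value. */
float scalar_scale(const float *p, float k)
{
    __m128 a = _mm_load_ss(p);
    __m128 r = _mm_mul_ss(a, _mm_set_ss(k));
    float out;
    _mm_store_ss(&out, r);
    return out;
}
```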

The suffixes ps and ss denote “packed single” and “scalar single” precision

operations. The packed floats are represented in right-to-left order, with the

lowest (right-most) element being used for scalar operations: [z, y, x, w]. To explain

how memory storage reflects this, consider the following example.

The operation:

float a[4] = { 1.0, 2.0, 3.0, 4.0 };

__m128 t = _mm_load_ps(a);

produces the same result as:

__m128 t = _mm_set_ps(4.0, 3.0, 2.0, 1.0);

In other words:

t = [ 4.0, 3.0, 2.0, 1.0 ]

where the “scalar” element is 1.0.
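
The equivalence can be checked with a short sketch. The function names `load_matches_set` and `scalar_element` are hypothetical. The array is declared with C11 `_Alignas(16)`, an assumption added here because `_mm_load_ps` requires a 16-byte-aligned address.

```c
#include <xmmintrin.h>

/* Returns 1 when loading {1, 2, 3, 4} from memory yields the same
   register contents as _mm_set_ps(4, 3, 2, 1). */
int load_matches_set(void)
{
    /* Assumption: _Alignas(16) guarantees the alignment that
       _mm_load_ps needs. */
    _Alignas(16) float a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    __m128 t = _mm_load_ps(a);
    __m128 u = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);

    /* _mm_cmpeq_ps sets every bit of each equal element;
       _mm_movemask_ps gathers the four sign bits into bits 0..3. */
    return _mm_movemask_ps(_mm_cmpeq_ps(t, u)) == 0xF;
}

/* Returns the lowest ("scalar") element of the vector built above. */
float scalar_element(void)
{
    return _mm_cvtss_f32(_mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f));
}
```

The first array element in memory, a[0], lands in the lowest (right-most) position, which is why the arguments to _mm_set_ps appear in reverse order.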