Vol. 2A 3-657
INSTRUCTION SET REFERENCE, A-M
MOVNTPS—Store Packed Single-Precision Floating-Point Values Using Non-Temporal
Hint
Description
Moves the double quadword in the source operand (second operand) to the
destination operand (first operand) using a non-temporal hint to minimize cache
pollution during the write to memory. The source operand is an XMM register,
which is assumed to contain four packed single-precision floating-point values.
The destination operand is a 128-bit memory location.
The non-temporal hint is implemented by using a write combining (WC) memory
type protocol when writing the data to memory. Using this protocol, the processor
does not write the data into the cache hierarchy, nor does it fetch the corresponding
cache line from memory into the cache hierarchy. The memory type of the region
being written to can override the non-temporal hint, if the memory address specified
for the non-temporal store is in an uncacheable (UC) or write protected (WP)
memory region. For more information on non-temporal stores, see “Caching of
Temporal vs. Non-Temporal Data” in Chapter 10 in the Intel® 64 and IA-32
Architectures Software Developer’s Manual, Volume 1.
Because the WC protocol uses a weakly-ordered memory consistency model, a
fencing operation implemented with the SFENCE or MFENCE instruction should be
used in conjunction with MOVNTPS instructions if multiple processors might use
different memory types to read/write the destination memory locations.
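The following is a minimal sketch, not a definitive implementation, of how a streaming store loop followed by a fence might look in C. The helper name stream_fill_ps and its parameters are hypothetical and chosen only for illustration; the sketch assumes the destination is 16-byte aligned and that the element count is a multiple of four. The intrinsics _mm_set1_ps, _mm_stream_ps, and _mm_sfence are declared in xmmintrin.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: _mm_set1_ps, _mm_stream_ps, _mm_sfence */

/*
 * Hypothetical helper (not part of any library): fills n floats at dst
 * with value using non-temporal stores.
 * Assumptions: dst is 16-byte aligned and n is a multiple of 4.
 */
static void stream_fill_ps(float *dst, size_t n, float value)
{
    assert(((uintptr_t)dst & 15u) == 0);  /* 128-bit destination alignment */
    assert(n % 4 == 0);                   /* whole 128-bit chunks only */

    __m128 v = _mm_set1_ps(value);        /* four copies of value */

    for (size_t i = 0; i < n; i += 4) {
        /* Expected to map to MOVNTPS (or its VEX-encoded form). */
        _mm_stream_ps(dst + i, v);
    }

    /* SFENCE: the stores above become globally visible before any
       store issued after this point. */
    _mm_sfence();
}
```

Placing the fence once after the loop, rather than after every store, leaves the processor free to combine the streamed writes while still ordering them ahead of any later stores.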
In 64-bit mode, use of the REX.R prefix permits this instruction to access additional
registers (XMM8-XMM15).
Operation
DEST ← SRC;
Intel C/C++ Compiler Intrinsic Equivalent
MOVNTPS void _mm_stream_ps(float * p, __m128 a)
SIMD Floating-Point Exceptions
None.
Opcode     Instruction         64-Bit Mode   Compat/Leg Mode   Description
0F 2B /r   MOVNTPS m128, xmm   Valid         Valid             Move packed single-precision floating-point values from xmm to m128 using non-temporal hint.