Intel 253666-024US Computer Hardware User Manual


 
Vol. 2A 3-565
INSTRUCTION SET REFERENCE, A-M
MASKMOVDQU—Store Selected Bytes of Double Quadword
Description
Stores selected bytes from the source operand (first operand) into a 128-bit
memory location. The mask operand (second operand) selects which bytes from the
source operand are written to memory. The source and mask operands are XMM
registers. The location of the first byte of the memory location is specified by the
DI/EDI and DS registers. The memory location does not need to be aligned on a natural
boundary. (The size of the store address depends on the address-size attribute.)
The most significant bit in each byte of the mask operand determines whether the
corresponding byte in the source operand is written to the corresponding byte
location in memory: 0 indicates no write and 1 indicates write.
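The byte-select rule can be illustrated with a portable scalar model. This is only a sketch of the architectural result: the helper name maskmov_model is hypothetical (not an Intel API), and the model ignores the non-temporal hint, memory ordering, and fault behavior of the real instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the byte-select rule (hypothetical helper, not an Intel API):
   byte i of src is stored to dst[i] only when bit 7 of mask[i] is set.
   All other destination bytes are left untouched. */
static void maskmov_model(uint8_t *dst, const uint8_t src[16],
                          const uint8_t mask[16])
{
    assert(dst != NULL);
    for (size_t i = 0; i < 16; ++i) {
        if (mask[i] & 0x80u) {      /* only the most significant bit matters */
            dst[i] = src[i];
        }
    }
}
```

Note that a mask byte of 7FH selects nothing, because bits 6:0 of each mask byte are ignored.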
The MASKMOVDQU instruction generates a non-temporal hint to the processor to
minimize cache pollution. The non-temporal hint is implemented by using a
write-combining (WC) memory type protocol (see “Caching of Temporal vs. Non-Temporal
Data” in Chapter 10 of the Intel® 64 and IA-32 Architectures Software Developer’s
Manual, Volume 1). Because the WC protocol uses a weakly-ordered memory
consistency model, a fencing operation implemented with the SFENCE or MFENCE
instruction should be used in conjunction with MASKMOVDQU instructions if multiple
processors might use different memory types to read/write the destination memory
locations.
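From C, the instruction is usually reached through the SSE2 intrinsic _mm_maskmoveu_si128, with _mm_sfence supplying the fence. The following is a minimal sketch, assuming an x86 compiler that provides <emmintrin.h>; the helper name masked_store_fenced is hypothetical, and whether a fence is needed after every store or only once after a batch depends on the surrounding code.

```c
#include <emmintrin.h>   /* SSE2: _mm_maskmoveu_si128, _mm_loadu_si128 */
#include <xmmintrin.h>   /* SSE:  _mm_sfence */
#include <stdint.h>

/* Hypothetical helper: store the mask-selected bytes of src to dst, then
   fence so the weakly-ordered store is ordered before later stores. */
static void masked_store_fenced(uint8_t *dst, const uint8_t src[16],
                                const uint8_t mask[16])
{
    __m128i d = _mm_loadu_si128((const __m128i *)src);
    __m128i m = _mm_loadu_si128((const __m128i *)mask);

    /* Expected to compile to MASKMOVDQU (or its VEX form when AVX is enabled);
       dst does not need to be 16-byte aligned. */
    _mm_maskmoveu_si128(d, m, (char *)dst);

    _mm_sfence();
}
```

When many masked stores are issued in a loop, a single fence after the loop is a common choice, since the fence only needs to precede the point where other processors may observe the data.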
Behavior with a mask of all 0s is as follows:
• No data will be written to memory.
• Signaling of breakpoints (code or data) is not guaranteed; different processor
  implementations may signal or not signal these breakpoints.
• Exceptions associated with addressing memory and page faults may still be
  signaled (implementation dependent).
• If the destination memory region is mapped as UC or WP, enforcement of
  associated semantics for these memory types is not guaranteed (that is, is
  reserved) and is implementation-specific.
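Because the behavior with an all-0s mask is partly implementation dependent, software that cannot tolerate a spurious fault may prefer to skip the instruction in that case. The following is one possible guard, sketched with SSE2 intrinsics; the helper name masked_store_if_any is hypothetical.

```c
#include <emmintrin.h>   /* SSE2: _mm_movemask_epi8, _mm_maskmoveu_si128 */
#include <stdint.h>

/* Hypothetical helper: issue the masked store only when at least one byte
   is selected. Returns 1 if a store was issued, 0 if it was skipped. */
static int masked_store_if_any(uint8_t *dst, __m128i data, __m128i mask)
{
    /* _mm_movemask_epi8 collects the most significant bit of each of the
       16 mask bytes into a 16-bit integer; 0 means no byte is selected. */
    if (_mm_movemask_epi8(mask) == 0) {
        return 0;
    }
    _mm_maskmoveu_si128(data, mask, (char *)dst);
    return 1;
}
```

The guard costs one extra instruction and a branch, so it is a trade-off rather than a requirement.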
The MASKMOVDQU instruction can be used to improve performance of algorithms
that need to merge data on a byte-by-byte basis. MASKMOVDQU should not cause a
read for ownership; doing so would consume bandwidth unnecessarily, since the data
is written directly under the byte mask without first allocating the old data.
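As an illustration of a byte-by-byte merge, the sketch below writes only the nonzero bytes of a 16-byte source over a destination, treating zero as "keep the old byte". The helper name merge_nonzero_bytes and the zero-means-transparent convention are assumptions made for this example, not part of the instruction definition.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <xmmintrin.h>   /* _mm_sfence */
#include <stdint.h>

/* Hypothetical helper: overwrite dst[i] with src[i] wherever src[i] != 0,
   leaving the other destination bytes unchanged. */
static void merge_nonzero_bytes(uint8_t *dst, const uint8_t src[16])
{
    __m128i s = _mm_loadu_si128((const __m128i *)src);

    /* 0xFF in every byte position where the source byte equals zero. */
    __m128i is_zero = _mm_cmpeq_epi8(s, _mm_setzero_si128());

    /* Invert so that the MSB is set exactly where the source is nonzero. */
    __m128i mask = _mm_xor_si128(is_zero, _mm_set1_epi8((char)0xFF));

    _mm_maskmoveu_si128(s, mask, (char *)dst);  /* no load of dst is needed */
    _mm_sfence();
}
```

The merge never reads the destination in software, which is the property the paragraph above relies on.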
Opcode:           66 0F F7 /r
Instruction:      MASKMOVDQU xmm1, xmm2
64-Bit Mode:      Valid
Compat/Leg Mode:  Valid
Description:      Selectively write bytes from xmm1 to memory location using the
                  byte mask in xmm2. The default memory location is specified by
                  DS:EDI.