Intel IA-32 Computer Accessories User Manual


 
Optimizing Cache Usage 6
The maskmovq/maskmovdqu (non-temporal byte mask store of packed
integer in an MMX technology or Streaming SIMD Extensions register)
instructions store data from a register to the location specified by the
edi register. The most significant bit in each byte of the second (mask)
register selects, on a per-byte basis, which bytes of the first register
are written. These instructions are implicitly weakly-ordered (that is,
successive stores may not write memory in original program order) and
do not write-allocate, and thus minimize cache pollution.
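The byte-mask behavior can be illustrated with compiler intrinsics. The following is a minimal sketch, assuming a compiler that provides the SSE2 intrinsic _mm_maskmoveu_si128 (which maps to maskmovdqu); the helper name masked_store16 and its parameters are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_maskmoveu_si128 */
#include <xmmintrin.h>  /* SSE: _mm_sfence */

/* Hypothetical helper: write to dst only those bytes of src whose
   corresponding mask byte has its most significant bit set. Bytes of
   dst whose mask bit is clear are left unchanged. All three buffers
   are assumed to hold 16 bytes. */
static void masked_store16(unsigned char *dst,
                           const unsigned char *src,
                           const unsigned char *mask)
{
    assert(dst && src && mask);

    __m128i data = _mm_loadu_si128((const __m128i *)src);
    __m128i sel  = _mm_loadu_si128((const __m128i *)mask);

    /* Non-temporal, byte-masked store of data to dst. */
    _mm_maskmoveu_si128(data, sel, (char *)dst);

    /* The store is weakly ordered; fence before later stores that
       other agents may use to decide the data is ready. */
    _mm_sfence();
}
```

A mask byte of 0x80 selects the corresponding data byte for writing, while a mask byte of 0x7F (most significant bit clear) leaves the destination byte untouched.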
The fence Instructions
The following fence instructions are available: sfence, lfence, and
mfence.
The sfence Instruction
The sfence (store fence) instruction ensures that every store
instruction that precedes the sfence instruction in program order is
globally visible before any store instruction that follows the sfence.
The sfence instruction provides an efficient way of ensuring
ordering between routines that produce weakly-ordered results.
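As an illustration of the ordering described above, the following sketch issues a run of non-temporal stores and then a single sfence. It assumes the SSE2 intrinsic _mm_stream_si32 (movnti) and the SSE intrinsic _mm_sfence; the routine name stream_fill is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2: _mm_stream_si32 */
#include <xmmintrin.h>  /* SSE: _mm_sfence */

/* Hypothetical routine: fill buf[0..n) with value using weakly-ordered,
   non-temporal stores, then fence once at the end. */
static void stream_fill(int *buf, size_t n, int value)
{
    assert(buf != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* Each store may become globally visible out of program order. */
        _mm_stream_si32(&buf[i], value);
    }

    /* All stores above are globally visible before any store issued
       after this point. */
    _mm_sfence();
}
```

Placing one sfence at the end of the routine, rather than after each store, keeps the cost of the fence low while still giving callers an ordered result.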
The use of weakly-ordered memory types can be important under
certain data sharing relationships, such as a producer-consumer
relationship. Using weakly-ordered memory can make assembling the
data more efficient, but care must be taken to ensure that the consumer
obtains the data that the producer intended it to see. Some common usage
models may be affected in this way by weakly-ordered stores. Examples
are:
• library functions, which use weakly-ordered memory to write
  results
• compiler-generated code, which also benefits from writing
  weakly-ordered results
• hand-crafted code
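The producer-consumer relationship mentioned above can be sketched as follows. This is an illustrative example only, assuming a C11 compiler with <stdatomic.h> and the SSE/SSE2 intrinsics; the mailbox structure, PAYLOAD_WORDS, and both function names are hypothetical. The producer writes the payload with weakly-ordered stores, executes sfence, and only then sets a ready flag; the consumer reads the payload only after it observes the flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <emmintrin.h>  /* SSE2: _mm_stream_si32 */
#include <xmmintrin.h>  /* SSE: _mm_sfence */

#define PAYLOAD_WORDS 16  /* hypothetical payload size */

/* Hypothetical shared structure between one producer and one consumer. */
struct mailbox {
    int payload[PAYLOAD_WORDS];
    atomic_int ready;
};

/* Producer: assemble the data with non-temporal stores, then publish. */
static void mailbox_publish(struct mailbox *mb, const int *src)
{
    assert(mb && src);

    for (int i = 0; i < PAYLOAD_WORDS; i++) {
        _mm_stream_si32(&mb->payload[i], src[i]);
    }

    /* An ordinary release store does not by itself order earlier
       non-temporal stores, so fence explicitly before setting the flag. */
    _mm_sfence();
    atomic_store_explicit(&mb->ready, 1, memory_order_release);
}

/* Consumer: returns 1 and copies the payload if it has been published,
   otherwise returns 0 and leaves dst unchanged. */
static int mailbox_try_consume(struct mailbox *mb, int *dst)
{
    assert(mb && dst);

    if (!atomic_load_explicit(&mb->ready, memory_order_acquire)) {
        return 0;
    }
    for (int i = 0; i < PAYLOAD_WORDS; i++) {
        dst[i] = mb->payload[i];
    }
    return 1;
}
```

Without the sfence, the flag store could become globally visible before some of the payload stores, and the consumer could read data the producer did not intend it to see.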