Intel IA-32 Computer Accessories User Manual


 
IA-32 Intel® Architecture Optimization
The Non-temporal Store Instructions
This section describes the behavior of streaming stores and reiterates
some of the information presented in the previous section. In the
Streaming SIMD Extensions (SSE and SSE2), the movntps, movntpd, movntq,
movntdq, movnti, maskmovq, and maskmovdqu instructions are streaming,
non-temporal stores. With regard to memory characteristics and ordering,
they most closely resemble the Write-Combining (WC) memory type:
Write combining – successive writes to the same cache line are combined
Write collapsing – successive writes to the same byte(s) result in only the last write being visible
Weakly ordered – no ordering is preserved between WC stores, or between WC stores and other loads or stores
Uncacheable and not write-allocating – stored data is written around the cache and will not generate a read-for-ownership bus request for the corresponding cache line
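The characteristics listed above can be exercised from C through compiler intrinsics. The following is a minimal sketch, not code from this manual: the helper name fill_lines_nt is hypothetical, and it assumes a 64-byte cache line and a destination buffer aligned to 64 bytes (movntdq itself requires 16-byte alignment). It issues four consecutive 16-byte movntdq stores per line so that the writes to one line have the chance to be combined. It ends with an sfence because the stores are weakly ordered.

```c
#include <emmintrin.h>  /* SSE2: _mm_set1_epi32, _mm_stream_si128; also pulls in _mm_sfence */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the manual): fill `lines` 64-byte cache
 * lines starting at `dst` with the 32-bit pattern `value`, using movntdq.
 * Assumption: `dst` is 64-byte aligned. */
static void fill_lines_nt(void *dst, size_t lines, int32_t value)
{
    __m128i v = _mm_set1_epi32(value);   /* broadcast value into 4 lanes */
    __m128i *p = (__m128i *)dst;

    for (size_t i = 0; i < lines; ++i) {
        /* Four consecutive 16-byte non-temporal stores cover one
         * 64-byte line; they bypass the cache and can be combined. */
        _mm_stream_si128(p + 0, v);
        _mm_stream_si128(p + 1, v);
        _mm_stream_si128(p + 2, v);
        _mm_stream_si128(p + 3, v);
        p += 4;
    }

    /* The stores above are weakly ordered; fence before returning so
     * later stores cannot become visible ahead of them. */
    _mm_sfence();
}
```

A caller would typically allocate the buffer with a 64-byte alignment guarantee and pass the number of whole lines to fill. Partial lines at either end would need ordinary stores, which this sketch does not handle.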
Fencing
Because streaming stores are weakly ordered, a fencing operation is
required to ensure that the stored data is flushed from the processor to
memory. Without an appropriate fence, data may remain “trapped” within
the processor, where it is not visible to other processors or system
agents. Software that uses WC stores must ensure coherence of data by
performing the fencing operation; see “The fence Instructions” section
for more information.
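The fencing requirement described above can be illustrated with a short sketch. It is an assumption-laden example rather than the manual's own code: publish_nt, payload, and ready are hypothetical names, and the plain volatile flag stands in for whatever signalling mechanism a real program would use (production multi-threaded C code would normally use an atomic flag). The payload is written with movnti, an sfence follows, and only then is the flag set.

```c
#include <emmintrin.h>  /* SSE2: _mm_stream_si32 (movnti); also pulls in _mm_sfence */

/* Hypothetical producer (not from the manual): write n integers with
 * streaming stores, then signal a consumer through `ready`. */
static void publish_nt(int *payload, int n, volatile int *ready)
{
    for (int i = 0; i < n; ++i)
        _mm_stream_si32(&payload[i], i * 2);  /* weakly ordered, non-temporal */

    /* Without this fence the flag store below could become visible
     * before the streaming stores above. */
    _mm_sfence();

    *ready = 1;  /* ordinary store, issued only after the fence */
}
```

A consumer that observes ready equal to 1 can then read payload and expect the values written before the fence. Omitting the sfence removes that guarantee.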
Streaming Non-temporal Stores
Streaming stores can improve performance in the following ways:
Increase store bandwidth if the 64 bytes that make up a cache line are
written consecutively, since such writes do not require
read-for-ownership bus requests and the 64 bytes are combined into a
single bus write transaction.