IA-32 Intel® Architecture Optimization
evicting data from all processor caches). The Pentium M
processor implements a combination of both approaches.
If the streaming store hits a line that is present in the first-level
cache, the store data is combined in place within the first-level
cache. If the streaming store hits a line present in the
second-level cache, the line and stored data are flushed from the
second-level cache to system memory.
2. If the data is not present in the cache hierarchy and the destination
region is mapped as WB or WT, the transaction will be weakly ordered
and is subject to all WC memory semantics. This non-temporal store
will not write-allocate. Different implementations may choose to
collapse and combine such stores.
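As an illustration of issuing streaming stores from C, the following is a
minimal sketch that assumes an SSE2-capable processor and a compiler that
provides the standard intrinsics headers. The helper name stream_fill_u32
and its alignment and length requirements are assumptions made for this
example, not part of the original text. Whether a given store is combined
in the cache or written to memory depends on the implementation, as
described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_sfence */
#include <emmintrin.h>  /* _mm_stream_si128, _mm_set1_epi32 */

/* Hypothetical helper: fill `count` 32-bit words at `dst` with `value`
 * using non-temporal (streaming) stores.
 * Assumptions: dst is 16-byte aligned and count is a multiple of 4. */
static void stream_fill_u32(uint32_t *dst, size_t count, uint32_t value)
{
    assert(((uintptr_t)dst % 16) == 0);
    assert(count % 4 == 0);

    __m128i v = _mm_set1_epi32((int)value);   /* four copies of value */

    for (size_t i = 0; i < count; i += 4) {
        /* MOVNTDQ: 16-byte store with a non-temporal hint. */
        _mm_stream_si128((__m128i *)(dst + i), v);
    }

    /* Streaming stores are weakly ordered; fence before returning so the
     * data is globally visible ahead of any later stores. */
    _mm_sfence();
}
```

Filling in 16-byte units keeps consecutive stores within the same line, which
gives an implementation the opportunity to combine them.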
Write-Combining
Generally, WC semantics require software to ensure coherence with
respect to other processors and other system agents (such as graphics
cards). Appropriate use of synchronization and a fencing operation (see
“The fence Instructions” later in this chapter) must be performed for
producer-consumer usage models. Fencing ensures that all system
agents have global visibility of the stored data; for instance, failure to
fence may result in a written cache line staying within a processor, and
the line would not be visible to other agents.
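The producer-consumer requirement above can be sketched as follows. This is
a minimal example under stated assumptions, not a definitive implementation:
the mailbox structure and the produce and consume functions are hypothetical
names, the consumer is assumed to be another thread on a cache-coherent
processor, and C11 atomics are used only for the ready flag. The key step is
the fence between the streaming stores and the store that publishes the flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <xmmintrin.h>  /* _mm_sfence */
#include <emmintrin.h>  /* _mm_stream_si32 */

#define PAYLOAD_WORDS 16

/* Hypothetical shared structure: a payload plus a "ready" flag. */
struct mailbox {
    int        payload[PAYLOAD_WORDS];
    atomic_int ready;
};

/* Producer: write the payload with streaming stores, fence, then publish. */
static void produce(struct mailbox *mb, int base)
{
    for (int i = 0; i < PAYLOAD_WORDS; ++i) {
        /* MOVNTI: 32-bit store with a non-temporal hint. */
        _mm_stream_si32(&mb->payload[i], base + i);
    }

    /* Without this fence the weakly ordered streaming stores could become
     * visible after the flag store below. */
    _mm_sfence();

    atomic_store_explicit(&mb->ready, 1, memory_order_release);
}

/* Consumer: returns 1 and copies the payload if it has been published,
 * otherwise returns 0 and leaves `out` untouched. */
static int consume(struct mailbox *mb, int *out)
{
    if (atomic_load_explicit(&mb->ready, memory_order_acquire) == 0) {
        return 0;
    }
    for (int i = 0; i < PAYLOAD_WORDS; ++i) {
        out[i] = mb->payload[i];
    }
    return 1;
}
```

The release store on the flag does not by itself order earlier streaming
stores on this architecture, which is why the explicit fence is placed
before it.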
For processors that implement non-temporal stores by updating in place
data that already resides in the cache hierarchy, the destination
region should also be mapped as WC. Otherwise, if it is mapped as WB or WT,
there is a potential for speculative processor reads to bring the data into
the caches; in this case, non-temporal stores would then update in place,
and data would not be flushed from the processor by a subsequent
fencing operation.
The memory type visible on the bus in the presence of memory type
aliasing is implementation-specific. As one possible example, the
memory type written to the bus may reflect the memory type for the first
store to this line, as seen in program order; other alternatives are