IA-32 Intel® Architecture Processor Family Overview
that the cache line that contains the memory location is owned by the
first-level data cache of the initiating core (that is, the line is in
exclusive or modified state). Then the processor looks for the cache line
in the cache and memory sub-systems. The look-ups that determine the
locality of a load or store operation proceed in the following order:
1. First-level cache of the initiating core
2. Second-level cache and the first-level cache of the other core
3. Memory
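
The look-up order above can be sketched in a few lines of Python. This is only an illustrative model, not a description of the hardware: each cache is represented as a plain set of line addresses, and the names find_locality, l1_self, l2 and l1_other are hypothetical.

```python
from enum import Enum


class Locality(Enum):
    L1_SELF = "first-level cache of the initiating core"
    L2_OR_OTHER_L1 = "second-level cache or first-level cache of the other core"
    MEMORY = "memory"


def find_locality(line_addr, l1_self, l2, l1_other):
    """Return where a cache line is found, probing in the listed order.

    Assumption for illustration: each cache is a set of line addresses.
    Real caches also track a coherence state per line.
    """
    # Step 1: first-level cache of the initiating core.
    if line_addr in l1_self:
        return Locality.L1_SELF
    # Step 2: second-level cache and the other core's first-level cache.
    if line_addr in l2 or line_addr in l1_other:
        return Locality.L2_OR_OTHER_L1
    # Step 3: nothing on chip holds the line, so it comes from memory.
    return Locality.MEMORY
```

For example, find_locality(0x40, set(), set(), {0x40}) reports the second step, because the line is held only by the other core's first-level cache.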
Table 1-5 lists the performance characteristics of generic load and store
operations in an Intel Core Duo processor.
Numeric values in Table 1-5 are given in processor core cycles.
Throughput is expressed as the number of cycles to wait before the
same operation can start again. The latency of a bus transaction is
exposed in some of these operations, as indicated by entries
containing “+ bus transaction”. On Intel Core Duo processors, a
typical bus transaction may take 5.5 bus cycles. For a 667 MHz bus
and a core frequency of 2.167 GHz, the total is 14 + 5.5 * 2167 /
(667/4) ~ 86 core cycles, where 667/4 is the bus clock in MHz.
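
The arithmetic in this example can be reproduced with a short Python sketch. The function and parameter names are hypothetical, and reading the divisor 4 as the number of bus transfers per bus clock is an assumption made here to explain the formula.

```python
def bus_latency_core_cycles(core_mhz, bus_mhz, bus_cycles=5.5,
                            base_cycles=14, transfers_per_clock=4):
    """Estimate the latency, in core cycles, of an access that
    exposes one bus transaction.

    base_cycles is the fixed 14-cycle part of the latency;
    bus_cycles is the typical length of a bus transaction in bus clocks.
    """
    # Assumption: the quoted bus rate counts transfers, so the bus clock
    # is the quoted rate divided by transfers_per_clock.
    bus_clock_mhz = bus_mhz / transfers_per_clock
    # Convert bus clocks to core cycles using the frequency ratio.
    return base_cycles + bus_cycles * core_mhz / bus_clock_mhz


latency = bus_latency_core_cycles(core_mhz=2167, bus_mhz=667)
print(f"{latency:.1f} core cycles")
```

With the values from the text, the result is roughly 85.5 core cycles, which the text rounds to 86. Changing core_mhz or bus_mhz shows how the same bus transaction costs more core cycles as the core-to-bus frequency ratio grows.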
Table 1-5 Characteristics of Load and Store Operations in Intel Core Duo Processors

                                          Load                                        Store
Data Locality                             Latency               Throughput            Latency               Throughput
1st-level cache (L1)                      3                     1                     2                     1
L1 of the other core in “Modified” state  14 + bus transaction  14 + bus transaction  14 + bus transaction  ~10
2nd-level cache                           14                    <6                    14                    <6
Memory                                    14 + bus transaction  Bus read protocol     14 + bus transaction  Bus write protocol

Sometimes a modified cache line has to be evicted to make room for a
new cache line. The modified cache line is evicted in parallel to
bringing in new data and does not require additional latency. However,