IA-32 Intel® Architecture Optimization
No Preloading or Prefetch
The traditional programming approach does not perform data
preloading or prefetch. It is sequential in nature and will experience
stalls because the memory is unable to provide the data immediately
when the execution pipeline requires it. Examine Figure E-2.
As you can see from Figure E-2, the execution pipeline is stalled while
waiting for data to be returned from memory. On the other hand, the
front side bus is idle during the computation portion of the loop. The
memory access latencies could be hidden behind execution if data could
be fetched earlier during the bus idle time.
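As a minimal sketch of the traditional approach described above, the loop below issues no prefetch or preload, so every load is a demand load. The function name `dot_no_prefetch` and the dot-product kernel are hypothetical and chosen only for illustration; they do not come from this manual.

```c
#include <stddef.h>

/* Hypothetical kernel (illustration only): dot product of two arrays.
 * No prefetch or preload is issued, so each load is requested only at
 * the moment the computation needs the data. */
float dot_no_prefetch(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* Demand loads of a[i] and b[i]: on a cache miss the pipeline
         * waits for memory, and the bus sits idle during the multiply-add. */
        sum += a[i] * b[i];
    }
    return sum;
}
```

When the arrays do not fit in the cache, each new cache line touched by this loop is likely to incur the full memory latency before the dependent arithmetic can proceed.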
Further analyzing Figure E-2,
• assume execution cannot continue until the last chunk is returned, and
• $\delta_f$ indicates a flow data dependency that stalls the execution pipelines.
With these two points in mind, the iteration latency (il) is computed as
follows:
Figure E-2 Execution Pipeline, No Preloading or Prefetch
[Figure: timing diagram over execution cycles for the i-th and (i+1)-th iterations, with one row for the execution pipeline and one for the front-side bus. Segments are labeled $T_c - T_\Delta$, $T_\Delta$, $T_l$, $T_b$, and $\delta_f$; annotations mark "issue loads", "Execution units idle", and "FSB idle".]
$il \cong T_c + T_l + T_b$
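The relation il ≅ Tc + Tl + Tb can be sketched as a small helper. This assumes that Tc is the computation latency per iteration, Tl the memory leadoff latency, and Tb the data transfer latency, all measured in clocks. Those readings of the symbols, the function name, and the sample values below are assumptions made for illustration, not measurements.

```c
/* Approximate iteration latency with no preloading or prefetch:
 *   il ~= Tc + Tl + Tb
 * Assumed meanings (in clocks): tc = computation latency per iteration,
 * tl = memory leadoff latency, tb = data transfer latency.
 * Nothing overlaps, so the three terms simply add. */
unsigned iteration_latency_no_prefetch(unsigned tc, unsigned tl, unsigned tb)
{
    return tc + tl + tb;
}
```

For example, with hypothetical values Tc = 50, Tl = 60, and Tb = 30 clocks, `iteration_latency_no_prefetch(50, 60, 30)` gives 140 clocks per iteration, because the memory latency is fully exposed rather than hidden behind computation.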