Optimizing Cache Usage 6
Figure 6-5 Memory Access Latency and Execution With Prefetch
[Figure: two panels, each plotting effective loop latency (left axis, up to 350) and % bus utilization (right axis, 0% to 100%) against computations per loop.]
Panel 1: 2 load streams, 1 store stream. Computations per loop: 54, 108, 144, 192, 240, 336, 390. Legend: 16, 32, 64, 128, none; % Bus Utilization.
Panel 2: One load and one store stream. Computations per loop: 48, 108, 144, 192, 240, 336, 408. Legend: 16_por, 32_por, 64_por, 128_por, None_por; % Bus Utilization.