Support User Manuals

Intel IA-32 Computer Accessories User Manual

Open as PDF

of 568

IA-32 Intel® Architecture Optimization

6-22

Example of Latency Hiding with S/W Prefetch Instruction

Achieving the highest level of memory optimization using prefetch

instructions requires an understanding of the microarchitecture and

system architecture of a given machine. This section translates the key

architectural implications into several simple guidelines for

programmers to use.

Figure 6-2 and Figure 6-3 show two scenarios of a simplified 3D

geometry pipeline as an example. A 3D-geometry pipeline typically

fetches one vertex record at a time and then performs transformation

and lighting functions on it. Both figures show two separate pipelines,

an execution pipeline, and a memory pipeline (front-side bus).

Since the Pentium 4 processor, similarly to the Pentium II and

Pentium III processors, completely decouples the functionality of

execution and memory access, these two pipelines can function

concurrently. Figure 6-2 shows “bubbles” in both the execution and

memory pipelines. When loads are issued for accessing vertex data, the

Figure 6-1 Effective Latency Reduction as a Function of Access Stride

Upperbound of Pointer-Chasing Latency Reduction

0%

20%

40%

60%

80%

100%

120%

64

8

0

96

1

12

128

1

44

1

60

176

1

92

2

08

2

24

240

Stride (Bytes)

Effective Latency Reduction

Fam.15; Model 3, 4

Fam.15; Model 0,1,2

Fam. 6; Model 13

Fam. 6; Model 14

Fam. 15; Model 6

previous next