Intel Processor Computer Hardware User Manual


Developer’s Manual March, 2003 B-27

Intel 80200 Processor based on Intel XScale™ Microarchitecture

Optimization Guide

B.4.4.2. Prefetch Loop Scheduling

When adding prefetch to a loop that operates on arrays, it may be advantageous to prefetch ahead one, two, or more iterations. The data for future iterations is located in memory at a fixed offset from the data for the current iteration, which makes it easy to predict where to fetch the data. The number of iterations to prefetch ahead is referred to as the prefetch scheduling distance, or psd. For the Intel 80200 processor this can be calculated as:

Where:

N_pref Is the number of cache lines to be prefetched for both reading and writing.

N_evict Is the number of cache half line evictions caused by the loop.

N_inst Is the number of instructions executed in one iteration of the loop.

N_hwlinexfer This is the number of core clocks required to write half a cache line, as would happen if only one of the cache line dirty bits were set when a line eviction occurred. For the Intel 80200 processor this takes 2 bus clocks or 12 core clocks.

CPI This is the average number of core clocks per instruction.
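
As an illustration, the calculation can be sketched in C, assuming the formula is psd = floor((N_lookup + N_linexfer × N_pref + N_hwlinexfer × N_evict) / (CPI × N_inst)). The function names `prefetch_distance` and `sum_with_prefetch` are hypothetical, and `__builtin_prefetch` is a GCC/Clang builtin rather than anything defined by this manual. N_lookup and N_linexfer are not defined in this section, so any values passed for them are placeholders.

```c
#include <assert.h>
#include <stddef.h>

/* Prefetch scheduling distance in loop iterations (assumed formula):
 *   psd = floor((N_lookup + N_linexfer*N_pref + N_hwlinexfer*N_evict)
 *               / (CPI * N_inst))
 * All clock counts are in core clocks. */
unsigned prefetch_distance(double n_lookup, double n_linexfer,
                           double n_pref, double n_hwlinexfer,
                           double n_evict, double cpi, double n_inst)
{
    assert(cpi > 0.0 && n_inst > 0.0);
    double clocks = n_lookup + n_linexfer * n_pref + n_hwlinexfer * n_evict;
    /* The quotient is non-negative, so truncation equals floor(). */
    return (unsigned)(clocks / (cpi * n_inst));
}

/* Sum an array while prefetching the element psd iterations ahead.
 * The bounds check keeps the address expression inside the array. */
long sum_with_prefetch(const int *a, size_t n, unsigned psd)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + psd < n)
            __builtin_prefetch(&a[i + psd]); /* hint only, never faults */
        total += a[i];
    }
    return total;
}
```

With the placeholder inputs N_lookup = 20, N_linexfer = 24, N_pref = 2, N_hwlinexfer = 12, N_evict = 1, CPI = 1.5 and N_inst = 10, `prefetch_distance` returns 5. Only the value 12 for N_hwlinexfer comes from the text; the rest should be replaced with measured figures.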

The psd number provided by the above equation is a good starting point, but it may not be the ideal value. Estimating N_evict is very difficult from static code. However, if the operational data uses the mini-data cache and the loop operations overflow the mini-data cache, then a first-order estimate of N_evict is the number of bytes written per loop iteration divided by the half cache line size of 16 bytes. Cache overflow can be estimated from the number of cache lines transferred each iteration and the number of expected loop iterations. N_evict and CPI can be estimated by profiling the code using the performance monitor “cache write-back” event count.
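
The first-order estimate described above could be sketched as follows. The function names and the `cache_lines` parameter are hypothetical; the capacity of the mini-data cache is not stated in this section, so it is left as an input rather than hard-coded.

```c
#include <assert.h>
#include <stddef.h>

#define HALF_LINE_BYTES 16u /* half cache line size given in the text */

/* First-order N_evict: bytes written per iteration divided by the
 * half cache line size. Only meaningful when the loop overflows
 * the mini-data cache. */
double estimate_n_evict(size_t bytes_written_per_iter)
{
    return (double)bytes_written_per_iter / HALF_LINE_BYTES;
}

/* Rough overflow check: does the total number of lines touched by the
 * loop exceed the cache capacity, expressed in lines? */
int overflows_cache(size_t lines_per_iter, size_t iterations,
                    size_t cache_lines)
{
    assert(cache_lines > 0);
    return lines_per_iter * iterations > cache_lines;
}
```

A loop that writes 32 bytes per iteration gives an estimate of 2 half line evictions per iteration. Profiling with the “cache write-back” event count remains the more reliable source for this number.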

B.4.4.3. Prefetch Loop Limitations

It is not always advantageous to add prefetch to a loop. Loop characteristics that limit the value of prefetch are discussed below.

B.4.4.4. Compute vs. Data Bus Bound

At one extreme, a loop that is data bus bound does not benefit from prefetch, because all the system resources for transferring data are quickly allocated and there are no instructions that can profitably be executed. At the other extreme, compute bound loops allow complete hiding of all data transfer latencies.

B.4.4.5. Low Number of Iterations

Loops with very low iteration counts may have the advantages of prefetch completely negated. A loop with a small fixed number of iterations may be faster if it is completely unrolled rather than scheduled with prefetch instructions.
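
As a small illustration of the unrolling alternative, the sketch below computes a four-element dot product with no loop and no prefetch instructions. The function `dot4` and the choice of four elements are hypothetical; whether unrolling actually wins should be confirmed by measurement on the target.

```c
#include <assert.h>

/* Dot product of two 4-element vectors, fully unrolled. With only
 * four iterations there is no room to schedule a prefetch usefully,
 * so the loop overhead is removed instead. */
int dot4(const int a[4], const int b[4])
{
    assert(a != 0 && b != 0);
    return a[0] * b[0]
         + a[1] * b[1]
         + a[2] * b[2]
         + a[3] * b[3];
}
```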

Prefetch scheduling distance equation (B.4.4.2):

$$\mathrm{psd} = \left\lfloor \frac{N_{lookup} + N_{linexfer} \times N_{pref} + N_{hwlinexfer} \times N_{evict}}{\mathrm{CPI} \times N_{inst}} \right\rfloor$$




