AMD 250 Computer Hardware User Manual


Chapter 5 Cache and Memory Optimizations 105

Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

instructions can improve performance. Prefetch instructions only update the L1 data cache and do not update an architectural register, so a prefetch uses one fewer register than a load instruction.

Unit-Stride Access

Large data sets typically require unit-stride access to ensure that all data pulled in by a prefetch instruction is actually used. With unit-stride access, the code uses all of the data that is read from memory, rather than only a sparse subset of it. If necessary, reorganize algorithms or data structures to allow unit-stride access. For a definition of unit-stride access, see “Definitions” on page 110.
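As an illustration of reorganizing a data structure for unit-stride access, the following C sketch contrasts an array-of-structures walk with a structure-of-arrays walk. The `particle` record and both function names are hypothetical and do not come from this guide; the sketch only assumes a loop that reads a single field of each record.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record; only the x field is read by the hot loop. */
struct particle {
    float x, y, z, mass;
};

/* Array-of-structures walk: consecutive x values are
 * sizeof(struct particle) bytes apart, so most of each
 * fetched cache line goes unused. */
float sum_x_aos(const struct particle *p, size_t n)
{
    assert(p != NULL || n == 0);
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += p[i].x;
    return sum;
}

/* Structure-of-arrays walk: the x values are contiguous
 * (unit stride), so every byte of each fetched cache line
 * is consumed by the loop. */
float sum_x_soa(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum;
}
```

Both functions compute the same result; only the memory layout they read from differs. Whether the second layout is faster on a given workload should be confirmed by measurement.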

Hardware Prefetching

The AMD Athlon 64 and AMD Opteron processors implement a hardware prefetching mechanism. The prefetched data is loaded into the L2 cache. The hardware prefetcher works most efficiently when data is accessed on a cache-line-by-cache-line basis (that is, without skipping cache lines). Cache lines on current AMD Athlon 64 and AMD Opteron processors are 64 bytes, but cache-line size is implementation dependent.

The hardware prefetcher prefetches data that is accessed in an ascending or descending order on a cache-line-by-cache-line basis. For example, when the hardware prefetcher detects an access to cache line l followed by an access to cache line l + 1, it initiates a prefetch of cache line l + 2. Accessing data in increments larger than 64 bytes may fail to trigger the hardware prefetcher because cache lines are skipped. In these cases, software-prefetch instructions should be employed. Note that in some earlier revisions of the AMD Athlon 64 and AMD Opteron processors the hardware prefetcher would only detect ascending accesses.
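The case of strides larger than 64 bytes can be sketched in C as a column walk over a row-major matrix, where each access lands on a different cache line. This is a minimal sketch under stated assumptions: `__builtin_prefetch` is a GCC/Clang builtin rather than anything defined by this guide, and `sum_column` and `PREFETCH_ROWS_AHEAD` are hypothetical names with an untuned, illustrative prefetch distance.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in rows; tune by measurement. */
#define PREFETCH_ROWS_AHEAD 8

/* Sum one column of a row-major matrix. When cols * sizeof(double)
 * exceeds 64 bytes, successive accesses skip cache lines, so the
 * loop issues a software prefetch for a row it will reach later. */
double sum_column(const double *m, size_t rows, size_t cols, size_t col)
{
    assert(m != NULL && col < cols);
    double sum = 0.0;
    for (size_t r = 0; r < rows; r++) {
        if (r + PREFETCH_ROWS_AHEAD < rows) {
            /* Arguments: address, 0 = read access, 3 = keep in cache. */
            __builtin_prefetch(&m[(r + PREFETCH_ROWS_AHEAD) * cols + col], 0, 3);
        }
        sum += m[r * cols + col];
    }
    return sum;
}
```

The prefetch is only a hint and does not change the result of the loop, so the same function can be timed with and without it to check whether it helps on a particular processor.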

In some cases, using prefetch instructions on processors with hardware prefetching may slightly reduce performance. In these cases, it may be necessary to remove the prefetch instructions. All current AMD Athlon 64 and AMD Opteron processors have hardware prefetching mechanisms.

PREFETCH/W versus PREFETCHNTA/T0/T1/T2

PREFETCHNTA, PREFETCHT0, PREFETCHT1, and PREFETCHT2 are SSE instructions, and their behavior is processor-implementation dependent. For the AMD Athlon 64 and AMD Opteron processors, data that is prefetched with the PREFETCHNTA instruction is not placed into the L2 cache when it is evicted unless it was originally in L2 when prefetched.

PREFETCHNTA is intended for non-temporal data that will not be needed again soon. PREFETCHNTA should also be used when reading arrays that are larger than the L2 cache. Because of their size, such arrays will not be available in L2 even if they are needed again, and feeding them through the L2 cache would evict other, possibly useful data from L2.
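A single pass over a large buffer with a non-temporal prefetch might look like the following C sketch. It assumes the `_mm_prefetch` intrinsic and `_MM_HINT_NTA` hint from `<xmmintrin.h>`, which compilers for SSE-capable x86 targets provide; the `PREFETCH_NTA` macro, the `checksum_once` function, and the prefetch distance are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__)
#include <xmmintrin.h>
/* Issue a non-temporal prefetch hint for the line containing p. */
#define PREFETCH_NTA(p) _mm_prefetch((const char *)(p), _MM_HINT_NTA)
#else
/* Fallback for targets without SSE: no prefetch is issued. */
#define PREFETCH_NTA(p) ((void)(p))
#endif

/* 64 bytes matches the line size stated in this section, but
 * cache-line size is implementation dependent. */
#define CACHE_LINE_BYTES 64
/* Hypothetical prefetch distance; tune by measurement. */
#define PREFETCH_DISTANCE (4 * CACHE_LINE_BYTES)

/* Read every byte of buf exactly once, hinting that the data
 * will not be needed again soon. */
uint64_t checksum_once(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        /* One hint per cache line, a fixed distance ahead. */
        if (i % CACHE_LINE_BYTES == 0 && i + PREFETCH_DISTANCE < len)
            PREFETCH_NTA(buf + i + PREFETCH_DISTANCE);
        sum += buf[i];
    }
    return sum;
}
```

The bounds check keeps every prefetched address inside the buffer. The hint does not affect the computed value, only where the fetched lines are placed in the cache hierarchy.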

Note: The L2 cache size of the processor can be determined by using the CPUID instruction.

Chapters 5 and 9 show examples of how to use the PREFETCHNTA instruction.
