Appendix D AGP Considerations 353

Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

frequencies increase, so will the ratio of operating frequencies between processor caches and DDR memory. The processor-to-write-back cache bandwidth is also higher than processor-to-AGP-aperture bandwidth (write-combining memory type), since the DDR writes are avoided (as well as GART translation latencies).

It may be possible to prevent pollution of the L1-data and L2 caches from DMA data by using the nontemporal PREFETCHNTA instruction on the DMA buffer and limiting prefetching of the DMA buffer to less than 32 Kbytes (PREFETCHNTA uses only one way of the L1 data cache).

Use PREFETCHNTA on the linear address to the DMA buffer, and not the AGP aperture address, before reading or writing the DMA buffer.
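
The prefetch guidance above can be sketched in C as follows. This is a minimal illustration, not code from the guide. It assumes a GCC- or Clang-compatible compiler, where `__builtin_prefetch(addr, 0, 0)` requests a read prefetch with no temporal locality (typically emitted as PREFETCHNTA on x86-64). The 64-byte line size, the 16-Kbyte window, and the function names `prefetch_nta_window` and `checksum_dma_buffer` are assumptions chosen for the example; `buf` stands for the ordinary linear (virtual) address of the DMA buffer, not an AGP aperture address.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64u            /* assumed cache-line size */
#define NTA_WINDOW_BYTES (16u * 1024u)  /* illustrative window, kept below 32 Kbytes */

/* Issue nontemporal prefetches for at most NTA_WINDOW_BYTES of buf.
 * Returns the number of bytes covered by the prefetches. */
static size_t prefetch_nta_window(const void *buf, size_t len)
{
    const unsigned char *p = (const unsigned char *)buf;
    size_t n = len < NTA_WINDOW_BYTES ? len : NTA_WINDOW_BYTES;

    for (size_t off = 0; off < n; off += CACHE_LINE_BYTES) {
        /* rw = 0 (read), locality = 0 (nontemporal hint) */
        __builtin_prefetch(p + off, 0, 0);
    }
    return n;
}

/* Hypothetical consumer: sums the bytes of a DMA buffer, prefetching
 * one bounded window at a time before touching it. */
uint32_t checksum_dma_buffer(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t done = 0;

    while (done < len) {
        size_t n = prefetch_nta_window(buf + done, len - done);
        for (size_t i = 0; i < n; i++) {
            sum += buf[done + i];
        }
        done += n;
    }
    return sum;
}
```

Processing the buffer window by window keeps the amount of prefetched data in flight under the 32-Kbyte limit mentioned above. Whether this actually avoids cache pollution depends on the processor, so it is worth measuring.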

Another key optimization for the DMA model on AMD Athlon 64 and AMD Opteron systems is that coherency is maintained between processor caches and an AGP master making accesses outside of the AGP aperture.

This is a key AGP enhancement that is required of AGP 3.0 target (host platform) systems.

In effect, this means that the AGP card's device driver can create a DMA buffer in normal write-back memory and then pass the physical DRAM page address to the AGP master; in other words, the AGP virtual address and GART translation are not used.

Use PREFETCHNTA on the linear address to the DMA buffer before reading or writing the DMA buffer.

If the AGP card hardware is capable of buffering the physical DRAM page addresses sent to the AGP card in a FIFO, then in effect the AGP card’s device driver is getting AGP scatter-gather capabilities, with cache coherency provided by the processor.

D.6 Optimizations for Texture-Map Copies to AGP Memory

To avoid cache pollution, use the same technique described in “Fast-Write Optimizations for Video-Memory Copies” on page 349 to copy texture data into AGP memory, since this data tends to be nontemporal.
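
The section referenced on page 349 is not reproduced here, so the following is only a generic sketch of a copy that avoids filling the caches with destination data. It is not the guide's routine. It assumes an x86-64 compiler that provides the SSE2 intrinsics in `<emmintrin.h>` and falls back to `memcpy` elsewhere. The function name `copy_nontemporal` is hypothetical, and `dst` stands for whatever mapping of the destination memory the driver provides.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy len bytes from src to dst using nontemporal (streaming) stores
 * where SSE2 is available, so the written data bypasses the caches. */
void copy_nontemporal(void *dst, const void *src, size_t len)
{
#if defined(__SSE2__)
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

    /* Streaming stores need a 16-byte-aligned destination:
     * copy leading bytes one at a time until dst is aligned. */
    while (len > 0 && ((uintptr_t)d & 15u) != 0) {
        *d++ = *s++;
        len--;
    }

    /* Main loop: unaligned 16-byte load, nontemporal 16-byte store. */
    while (len >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);
        d += 16;
        s += 16;
        len -= 16;
    }

    /* Make the streaming stores globally visible before returning. */
    _mm_sfence();

    /* Copy any remaining tail bytes. */
    memcpy(d, s, len);
#else
    /* Fallback when SSE2 intrinsics are not available. */
    memcpy(dst, src, len);
#endif
}
```

A texture or vertex upload would call this once per block of source data. The fence at the end matters when another agent, such as the graphics card, reads the destination immediately afterward.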

D.7 Optimizations for Vertex-Geometry Copies to AGP Memory

To avoid cache pollution, use the same technique described in “Fast-Write Optimizations for Video-Memory Copies” on page 349 to copy vertex data into AGP memory, since this data tends to be nontemporal.
