Appendix D AGP Considerations 353
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
frequencies increase, so will the ratio of operating frequencies between processor caches and DDR
memory. The processor-to-write-back cache bandwidth is also higher than processor-to-AGP-aperture
bandwidth (write-combining memory type), since the DDR writes are avoided (as well as GART
translation latencies).
It may be possible to prevent pollution of the L1 data and L2 caches from DMA data by using the
nontemporal PREFETCHNTA instruction on the DMA buffer and limiting prefetching of the DMA
buffer to less than 32 Kbytes (PREFETCHNTA uses only one way of the two-way, 64-Kbyte L1 data
cache, and one way holds 32 Kbytes).
Use PREFETCHNTA on the linear address of the DMA buffer, not on the AGP aperture address,
before reading or writing the DMA buffer.
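The prefetch guidance above can be sketched in C as follows. This is a minimal illustration, not code from this guide: the helper names, the 16-Kbyte chunk size (chosen only to stay under the 32-Kbyte limit), and the checksum consumer are assumptions. It uses the `_mm_prefetch` intrinsic with `_MM_HINT_NTA`, which compilers normally emit as PREFETCHNTA on x86 targets, and falls back to a no-op elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */
#define PREFETCH_NTA(p) _mm_prefetch((const char *)(p), _MM_HINT_NTA)
#else
#define PREFETCH_NTA(p) ((void)(p))  /* no-op on non-x86 targets */
#endif

enum {
    CACHE_LINE = 64,         /* cache-line size on AMD64 processors */
    NTA_CHUNK  = 16 * 1024   /* assumption: any value below 32 Kbytes */
};

/* Issue one nontemporal prefetch per cache line for the first chunk of
 * buf (a linear address, not an aperture address). Returns the number
 * of bytes the chunk covers. */
static size_t prefetch_chunk_nta(const uint8_t *buf, size_t len)
{
    size_t n = len < (size_t)NTA_CHUNK ? len : (size_t)NTA_CHUNK;

    assert(buf != NULL);
    for (size_t off = 0; off < n; off += CACHE_LINE)
        PREFETCH_NTA(buf + off);
    return n;
}

/* Hypothetical consumer: prefetch a chunk, read it, then move on, so
 * that no more than NTA_CHUNK bytes are prefetched ahead of the reads. */
static uint32_t dma_buffer_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    while (len > 0) {
        size_t n = prefetch_chunk_nta(buf, len);
        for (size_t i = 0; i < n; i++)
            sum += buf[i];
        buf += n;
        len -= n;
    }
    return sum;
}
```

Whether a chunked loop like this helps depends on the access pattern and the platform, so measure before adopting it.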
Another key optimization for the DMA model on AMD Athlon 64 and AMD Opteron systems is that
coherency is maintained between processor caches and an AGP master making accesses outside of
the AGP aperture.
This is a key AGP enhancement that is required of AGP 3.0 target (host platform) systems.
In effect, this means that the AGP card's device driver can create a DMA buffer in normal write-back
memory and then pass the physical DRAM page address to the AGP master; in other words, the AGP
virtual address and GART translation are not used.
Use PREFETCHNTA on the linear address of the DMA buffer before reading or writing the DMA
buffer.
If the AGP card hardware is capable of buffering the physical DRAM page addresses sent to the AGP
card in a FIFO, then in effect the AGP card’s device driver is getting AGP scatter-gather capabilities,
with cache coherency provided by the processor.
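The scatter-gather idea can be pictured with the sketch below. Everything in it is hypothetical: the FIFO depth, the `page_fifo` layout, and the `virt_to_phys_fn` hook stand in for card- and operating-system-specific mechanisms that this guide does not specify. A real driver would also have to pin the pages and write the addresses to the card's registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PAGE_SIZE_BYTES = 4096, FIFO_DEPTH = 8 };  /* assumed values */

/* Hypothetical software model of the card's page-address FIFO. */
struct page_fifo {
    uint64_t phys[FIFO_DEPTH];
    unsigned head, tail;   /* free-running; entries = tail - head */
};

/* Hypothetical OS hook that translates a linear address to a physical one. */
typedef uint64_t (*virt_to_phys_fn)(uintptr_t linear);

/* Stand-in translation for demonstration only: identity mapping. */
static uint64_t demo_v2p(uintptr_t linear)
{
    return (uint64_t)linear;
}

/* Returns 1 if the address was queued, 0 if the FIFO is full. */
static int fifo_push(struct page_fifo *f, uint64_t phys_page)
{
    if (f->tail - f->head == FIFO_DEPTH)
        return 0;
    f->phys[f->tail % FIFO_DEPTH] = phys_page;
    f->tail++;
    return 1;
}

/* Queue the physical address of each page of a page-aligned buffer in
 * write-back memory. Returns the number of pages queued, which is less
 * than the page count if the FIFO fills up first. */
static size_t queue_dma_buffer(struct page_fifo *f, uintptr_t linear,
                               size_t len, virt_to_phys_fn v2p)
{
    size_t queued = 0;

    assert(linear % PAGE_SIZE_BYTES == 0);
    for (size_t off = 0; off < len; off += PAGE_SIZE_BYTES) {
        if (!fifo_push(f, v2p(linear + off)))
            break;
        queued++;
    }
    return queued;
}
```

Because each entry is a physical page address, the pages need not be contiguous, which is what gives the driver scatter-gather behavior without GART translation.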
D.6 Optimizations for Texture-Map Copies to AGP Memory
To avoid cache pollution, use the same technique described in “Fast-Write Optimizations for
Video-Memory Copies” on page 349 to copy texture data into AGP memory, since this data tends
to be nontemporal.
D.7 Optimizations for Vertex-Geometry Copies to AGP Memory
To avoid cache pollution, use the same technique described in “Fast-Write Optimizations for
Video-Memory Copies” on page 349 to copy vertex data into AGP memory, since this data tends
to be nontemporal.
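The referenced section is outside this excerpt, so the following is not its code. It is a generic sketch of a cache-bypassing copy for nontemporal data such as texture or vertex buffers, assuming SSE2 streaming stores (`_mm_stream_si128`, normally emitted as MOVNTDQ) are available. The function name `copy_nontemporal` is hypothetical, and on targets without SSE2 the sketch falls back to `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#define HAVE_STREAMING_STORES 1
#else
#define HAVE_STREAMING_STORES 0
#endif

/* Copy len bytes from src to dst, writing with streaming (nontemporal)
 * stores where possible so the destination data is not allocated into
 * the caches. Returns dst, like memcpy. */
static void *copy_nontemporal(void *dst, const void *src, size_t len)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    assert(dst != NULL && src != NULL);
#if HAVE_STREAMING_STORES
    /* Byte-copy until the destination is 16-byte aligned, as the
     * streaming store requires. */
    while (len > 0 && ((uintptr_t)d & 15u) != 0) {
        *d++ = *s++;
        len--;
    }
    /* Stream whole 16-byte blocks; the source may be unaligned. */
    for (; len >= 16; d += 16, s += 16, len -= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);
    }
    /* Make the streamed data globally visible before returning. */
    _mm_sfence();
#endif
    memcpy(d, s, len);  /* remaining tail, or the whole copy without SSE2 */
    return dst;
}
```

A driver would typically call this with `dst` pointing into the mapped AGP memory and `src` pointing at the application's texture or vertex data; the store fence matters when the graphics device is told to consume the data right after the copy.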