AMD 250 Computer Hardware User Manual


 
Appendix D AGP Considerations 347
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
On the AMD Athlon™ 64 and AMD Opteron™ processors, write-combining can be used, and
software can take advantage of the fact that writes are sent out of the processor's write buffers in
ascending order (and appear on HyperTransport that way), from low quadword to high quadword.
Use the Memory Type Range Register (MTRR) mechanism in conjunction with the PAT MSR
(model-specific register 277h) to enable write-combining as the memory type for the FIFO address
space.
Follow these steps:
1. Change the PAT MSR entries that contain a type value of 00h (UC, uncacheable) to a type value of
07h (UC-minus). Unlike UC, the UC-minus type can be overridden by a write-combining MTRR setting.
2. Program an MTRR with the physical address and mask range of the command FIFO.
Note: MTRRs mark addresses on 4-Kbyte page-granularity boundaries, so the FIFO address should
begin on a 4-Kbyte-aligned boundary.
For more information, see Chapter 7, “Memory System,” in volume 2 of the AMD64 Architecture
Programmer’s Manual, order# 24593.
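As a rough illustration of the two steps above, the following C sketch computes the register values
involved. It is not taken from this guide: the function names, the mtrr_pair struct, and the
phys_bits parameter are assumptions made for the example. The privileged RDMSR/WRMSR accesses that
would actually read and write the PAT MSR (277h) and the chosen variable-range MTRR pair are left
out. The sketch also assumes that the FIFO range has a power-of-two size, is aligned to that size,
and is given the write-combining memory type (MTRR encoding 01h).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAT_TYPE_UC       0x00u          /* uncacheable */
#define PAT_TYPE_UC_MINUS 0x07u          /* UC-minus */
#define MTRR_TYPE_WC      0x01u          /* write-combining */
#define MTRR_MASK_VALID   (1ull << 11)   /* valid bit of a variable-range mask register */

/* Hypothetical container for one variable-range MTRR base/mask register pair. */
struct mtrr_pair {
    uint64_t phys_base;
    uint64_t phys_mask;
};

/* Step 1: return the PAT value with every UC (00h) entry changed to UC-minus (07h).
 * The PAT MSR holds eight entries, one per byte, with the type in bits 2:0. */
static uint64_t pat_uc_to_uc_minus(uint64_t pat)
{
    for (unsigned i = 0; i < 8; i++) {
        unsigned shift = 8u * i;
        if (((pat >> shift) & 0x07u) == PAT_TYPE_UC)
            pat |= (uint64_t)PAT_TYPE_UC_MINUS << shift;
    }
    return pat;
}

/* Step 2: compute base/mask values that mark [base, base + size) as WC.
 * phys_bits is the physical address width of the processor (an input here,
 * not something the sketch discovers). Returns 0 on success, -1 if the range
 * cannot be described by a single register pair under the stated assumptions. */
static int mtrr_wc_range(uint64_t base, uint64_t size, unsigned phys_bits,
                         struct mtrr_pair *out)
{
    assert(out != NULL);
    if (phys_bits < 13 || phys_bits > 52)
        return -1;
    if (size < 4096 || (size & (size - 1)) != 0)   /* power of two, at least 4 Kbytes */
        return -1;
    if ((base & (size - 1)) != 0)                  /* base aligned to the range size */
        return -1;

    uint64_t addr_mask = ((1ull << phys_bits) - 1) & ~0xFFFull;
    if ((base & ~addr_mask) != 0)                  /* base exceeds the address width */
        return -1;

    out->phys_base = base | MTRR_TYPE_WC;
    out->phys_mask = (~(size - 1) & addr_mask) | MTRR_MASK_VALID;
    return 0;
}
```

For example, a PAT value of 0007040600070406h becomes 0707040607070406h, and a 4-Kbyte FIFO at
E0000000h with a 40-bit physical address width yields a base value of E0000001h and a mask value of
FFFFFFF800h.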
Many graphics engines have a front-end command FIFO that requires the render command to be
issued first, followed by a variable number of doublewords, depending on the render command.
Create a cache-aligned command structure in cacheable memory, map the rendering command into
the lowest doubleword of the structure (which will be issued first), map the next data required in the
command into the next structure element, and so on, until all the data “registers” for this command are
included in the structure. An example is given in Figure 11.
Figure 11. Cacheable-Memory Command Structure
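One way to declare such a structure in C is sketched below. This is an illustration rather than the
layout used by any particular graphics engine: the type and function names are hypothetical, and the
64-byte size is an assumption based on the 0h to 3Fh offsets shown in the figure. The render command
occupies the lowest doubleword so that it is issued first, and the remaining doublewords hold the
parameters.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE_BYTES 64   /* assumption: one 64-byte cache line per command */
#define CMD_PARAMS (CACHE_LINE_BYTES / sizeof(uint32_t) - 1)

/* Hypothetical shadow copy of one FIFO command, kept in cacheable memory. */
typedef struct render_cmd {
    alignas(CACHE_LINE_BYTES) uint32_t command;   /* doubleword 0 (0h): issued first */
    uint32_t param[CMD_PARAMS];                   /* doubleword 1 (4h) and up */
} render_cmd;

static_assert(sizeof(render_cmd) == CACHE_LINE_BYTES, "command must fill one cache line");
static_assert(offsetof(render_cmd, command) == 0, "command must be the lowest doubleword");
static_assert(offsetof(render_cmd, param) == 4, "parameters must follow the command");

/* Fill the shadow structure and return the number of doublewords in use.
 * Unused parameter slots are cleared to zero. */
static size_t render_cmd_fill(render_cmd *cmd, uint32_t command,
                              const uint32_t *params, size_t count)
{
    assert(count <= CMD_PARAMS);
    memset(cmd, 0, sizeof *cmd);
    cmd->command = command;
    for (size_t i = 0; i < count; i++)
        cmd->param[i] = params[i];
    return 1 + count;
}
```

Because the number of doublewords depends on the render command, the fill routine reports how many
are in use; a caller could copy only that many, rounded up to a whole quadword.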
When the command (or commands) has been filled in the shadowed structure, use a high-speed copy
routine, like the one shown in Listing 31 on page 348, to copy the structure to the graphics
accelerator's write-combining FIFO address space. Locating the write-combining command FIFO at
a cache-aligned address is slightly better, since one HyperTransport link-size write occurs instead
of two.
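The copy step can be sketched in C as follows. This is not Listing 31, which is not reproduced here;
it is a minimal stand-in under stated assumptions. The function name is hypothetical, the byte count
is assumed to be a multiple of 8, and the destination pointer is assumed to refer to the FIFO
address space already mapped as write-combining. The stores are issued as quadwords from the lowest
address upward, and the volatile qualifier only keeps the compiler from dropping or reordering them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy 'bytes' bytes from a shadow structure to the FIFO, one quadword at a
 * time, in ascending address order. 'fifo' is assumed to point at the
 * write-combining FIFO mapping; any 8-byte-aligned buffer works for testing. */
static void fifo_copy_ascending(volatile uint64_t *fifo, const void *src, size_t bytes)
{
    assert(bytes % sizeof(uint64_t) == 0);
    const unsigned char *p = src;

    for (size_t i = 0; i < bytes / sizeof(uint64_t); i++) {
        uint64_t q;
        memcpy(&q, p + i * sizeof q, sizeof q);   /* alias-safe load from the shadow copy */
        fifo[i] = q;                              /* volatile store, low quadword first */
    }
}
```

A caller would pass the address of the filled command structure and its size, for example 64 bytes
for a structure that occupies one cache line.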
Top of cache line
Doubleword 0 (0h)      Render command 1
Doubleword 1 (4h)      Parameter 1
Doubleword 2 (8h)      Parameter 2
.
.
.
Doubleword 16 (3Fh)