266 Implementation of Write-Combining Appendix B
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
B.4 Sending Write-Buffer Data to the System
Maximum write-combining throughput is achieved when all quadwords or doublewords in the
buffer are valid, because the AMD Athlon 64 and AMD Opteron processors can then use one
efficient 64-byte memory write instead of multiple 8-byte memory writes.
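The following is a minimal sketch of code that keeps every quadword of each 64-byte line valid by writing complete lines in ascending order with streaming stores. The function name fill_full_lines and the constant WC_LINE_BYTES are hypothetical and do not come from this guide. The sketch assumes a 64-byte-aligned destination and falls back to ordinary stores when SSE2 intrinsics are not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si128, _mm_set1_epi64x, _mm_sfence */
#endif

enum { WC_LINE_BYTES = 64 };  /* hypothetical name: size of one combined line */

/* Fill `lines` complete 64-byte lines starting at dst with `value`.
 * Assumption: dst is 64-byte aligned, so each group of four 16-byte
 * streaming stores covers exactly one line and leaves no quadword invalid. */
static void fill_full_lines(uint64_t *dst, size_t lines, uint64_t value)
{
    assert(((uintptr_t)dst % WC_LINE_BYTES) == 0);
#if defined(__SSE2__)
    const __m128i v = _mm_set1_epi64x((long long)value);
    for (size_t i = 0; i < lines; ++i) {
        __m128i *line = (__m128i *)(dst + i * 8);
        /* Four 16-byte streaming stores in ascending order = one full line. */
        _mm_stream_si128(line + 0, v);
        _mm_stream_si128(line + 1, v);
        _mm_stream_si128(line + 2, v);
        _mm_stream_si128(line + 3, v);
    }
    _mm_sfence();  /* make the streaming stores globally visible */
#else
    /* Portable fallback: ordinary stores, same ascending order. */
    for (size_t i = 0; i < lines * 8; ++i)
        dst[i] = value;
#endif
}
```

A caller would pass a 64-byte-aligned buffer and a whole number of lines. Any tail shorter than one line is left to the caller, because a partially filled line cannot be sent as a single 64-byte write.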
B.5 Write-Combining Optimization on Revision D and E AMD Athlon™ 64 and AMD Opteron™ Processors
The number of write-combining buffers on revision D and revision E AMD Athlon 64 and AMD
Opteron processors has changed from earlier CPU revisions. Although the number of buffers
available for write-combining depends on the specific CPU revision, current designs provide as many
as four write buffers for WC memory-mapped I/O address spaces. These same buffers are used for
streaming store instructions. The number of write buffers determines how many independent linear
64-byte streams of WC data the CPU can simultaneously buffer.
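The relationship between the number of write buffers and the number of interleaved streams can be illustrated with a small software model. This is a toy sketch, not a description of the hardware. The types wc_buffer and wc_model, all function names, and in particular the round-robin eviction policy are assumptions made for illustration only; this section does not document how the processor selects a buffer to close.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LINE_BYTES = 64, MAX_BUFFERS = 8 };

typedef struct {
    uint64_t line;    /* address / 64 of the line being combined */
    uint8_t  mask;    /* one bit per valid quadword in the line */
    int      in_use;
} wc_buffer;

typedef struct {
    wc_buffer buf[MAX_BUFFERS];
    size_t    count;            /* number of buffers being modeled */
    size_t    next_victim;      /* ASSUMED round-robin eviction pointer */
    unsigned  full_flushes;     /* lines sent with all 8 quadwords valid */
    unsigned  partial_flushes;  /* lines sent before they were complete */
} wc_model;

static void wc_init(wc_model *m, size_t count)
{
    assert(count >= 1 && count <= MAX_BUFFERS);
    *m = (wc_model){0};
    m->count = count;
}

static void wc_flush(wc_model *m, wc_buffer *b)
{
    if (!b->in_use) return;
    if (b->mask == 0xFF) m->full_flushes++; else m->partial_flushes++;
    b->in_use = 0;
    b->mask = 0;
}

/* Model one 8-byte write to the quadword-aligned address addr. */
static void wc_write_qword(wc_model *m, uint64_t addr)
{
    uint64_t line = addr / LINE_BYTES;
    unsigned q = (unsigned)((addr % LINE_BYTES) / 8);
    wc_buffer *b = NULL;
    for (size_t i = 0; i < m->count; ++i)       /* already combining this line? */
        if (m->buf[i].in_use && m->buf[i].line == line) { b = &m->buf[i]; break; }
    if (!b)
        for (size_t i = 0; i < m->count; ++i)   /* otherwise take a free buffer */
            if (!m->buf[i].in_use) { b = &m->buf[i]; break; }
    if (!b) {                                   /* otherwise evict (assumed policy) */
        b = &m->buf[m->next_victim];
        m->next_victim = (m->next_victim + 1) % m->count;
        wc_flush(m, b);
    }
    if (!b->in_use) { b->in_use = 1; b->line = line; b->mask = 0; }
    b->mask |= (uint8_t)(1u << q);
    if (b->mask == 0xFF) wc_flush(m, b);        /* complete line: one 64-byte write */
}

/* Interleave `streams` sequential streams one quadword at a time,
 * each stream writing `lines` full lines, then drain the buffers. */
static void wc_run_interleaved(wc_model *m, size_t streams, size_t lines)
{
    for (size_t q = 0; q < lines * 8; ++q)
        for (size_t s = 0; s < streams; ++s)
            wc_write_qword(m, (uint64_t)s * 0x100000u + q * 8);
    for (size_t i = 0; i < m->count; ++i)
        wc_flush(m, &m->buf[i]);
}
```

In this model, four interleaved streams on four buffers flush only complete lines, while a fifth stream forces buffers to be evicted before they fill. Under the stated assumptions, this suggests that code should interleave no more independent WC streams than there are write buffers.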
Having multiple write-combining buffers that can combine independent WC streams has implications
for data throughput rates (bandwidth), especially when the CPU writes data to WC memory-mapped
I/O devices residing on the AGP, PCI, PCI-X, and PCI-E buses, including:
• Memory-mapped I/O registers (command FIFO, etc.)
• Memory-mapped I/O apertures (windows to which the CPU uses programmed I/O to send data to
  a hardware device)
• Sequential blocks of 2D/3D graphics engine registers written using programmed I/O
• Video memory residing on the graphics accelerator (frame buffer, render buffers, textures, etc.)
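As a sketch of programmed I/O to one of the targets listed above, the routine below copies whole 64-byte lines to an aperture in ascending address order, so that a single stream occupies one write buffer at a time. The function name pio_copy_lines and its parameters are hypothetical. The sketch assumes that the aperture pointer has already been mapped as WC memory by the operating system or driver, which is outside the scope of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { QWORDS_PER_LINE = 8 };  /* 64-byte line / 8-byte quadword */

/* Copy `lines` complete 64-byte lines from src to a memory-mapped aperture.
 * Assumption: `aperture` is 64-byte aligned and mapped as WC memory.
 * The volatile qualifier keeps the stores in program order, so the
 * quadwords of each line are written in ascending sequence. */
static void pio_copy_lines(volatile uint64_t *aperture,
                           const uint64_t *src, size_t lines)
{
    assert(aperture != NULL && src != NULL);
    for (size_t l = 0; l < lines; ++l) {
        volatile uint64_t *dst_line = aperture + l * QWORDS_PER_LINE;
        const uint64_t *src_line = src + l * QWORDS_PER_LINE;
        /* Finish all eight quadwords of this line before touching the next. */
        for (size_t q = 0; q < QWORDS_PER_LINE; ++q)
            dst_line[q] = src_line[q];
    }
}
```

For testing without hardware, an ordinary array can stand in for the aperture. On a real device the data would first be staged in cacheable memory and then copied with one call per contiguous block.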
HyperTransport tunnels are HyperTransport-to-bus bridges. There are tunnels for AGP, PCI Express,
PCI and PCI-X. Examples of tunnels are the AMD-8151™ graphics tunnel, the AMD-8131™ I/O
bus tunnel, and the AMD-8132™ PCI-X tunnel. Many HyperTransport tunnels use a hardware
optimization feature called write-chaining. In write-chaining, the tunnel device buffers and combines
separate HyperTransport packets of data sent by the CPU, creating one large burst on the underlying
bus when the data is received by the tunnel in sequential address order. Using larger bursts results in

Table 12. Write-Combining Completion Events (Continued)

Event             Comment
WT Nonsequential  If a subsequent WT write is not in ascending sequential order, the
                  write-combining completes. WC writes have no addressing
                  constraints within the 64-byte line being combined.
TLB AD bit set    Write-combining is closed whenever a TLB reload sets the accessed
                  [A] or dirty [D] bits of a Pde or Pte.