AMD 250 Computer Hardware User Manual


Appendix B Implementation of Write-Combining 267

Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

better throughput since bus efficiency is increased. This is because bus arbitration overhead is lower:

only one address/attribute phase is issued per burst in the PCI-X case, and one address/command

phase is issued for the AGP Fast Writes case. An illustration of address phase overhead on AGP Fast

Writes is provided in Figure 10 on page 346 in Appendix D, AGP Considerations.

For the reasons cited in the preceding paragraph, to utilize hardware write chaining efficiently, software

should flush the CPU write-combining buffer in sequential linear address order, any time a target

hardware device is capable of receiving large bursts of CPU write data.

Software should be aware that on AMD64 processors that have multiple write-combining buffers (i.e.,

Rev. D and E processors), events that flush the write-combining buffers (see Appendix B, Table 8)

will send out the 64-byte WC buffers in the order that the streams were opened. This means that if the

CPU writes to the WC space in the highest-addressed 64-byte buffer first (for example, address 40h),

and then writes to a lower 64-byte buffer next (for example, address 00h), when those buffers are sent

by the CPU (by HyperTransport to the tunnel), the highest address 64-byte buffer will be sent first,

followed by the second (lower address) 64-byte buffer. Since the addressing is not sequential, the

tunnel device will not "chain" the two 64-byte WC buffers and must issue two separate transactions on the

target bus.
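
The effect of flush order on chaining can be illustrated with a small model. The sketch below is a simplification and not a description of any particular tunnel device: it assumes a buffer is chained onto the previous one only when its address is exactly 64 bytes higher, and it ignores burst-length limits. The function name count_target_bus_transactions is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WC_BUFFER_BYTES 64u

/* Simplified model (an assumption, not taken from the guide): count the
 * transactions a tunnel would issue on the target bus when it receives
 * 64-byte WC buffers in the given order. A buffer is chained onto the
 * previous one only if its address is exactly 64 bytes higher. */
size_t count_target_bus_transactions(const uint64_t *flush_order, size_t count)
{
    if (count == 0)
        return 0;

    size_t transactions = 1;  /* the first buffer always starts a transaction */
    for (size_t i = 1; i < count; i++) {
        /* WC buffers are 64-byte aligned in this model. */
        assert(flush_order[i] % WC_BUFFER_BYTES == 0);
        if (flush_order[i] != flush_order[i - 1] + WC_BUFFER_BYTES)
            transactions++;   /* not sequential: a new transaction is needed */
    }
    return transactions;
}
```

Under this model, the flush order 40h then 00h from the example above counts as two transactions, while 00h then 40h counts as one chained transaction.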

If the above example were targeted at AGP Fast Writes, issuing two Fast Write transactions (rather

than issuing one Fast Write transaction) will reduce the bandwidth (data throughput) by 1/3. See

Figure 10 on page 346 in Appendix D.

Optimizations

Adhere to the following guidelines to ensure that Revision D and E AMD Athlon 64 and AMD

Opteron processors issue WC buffers in sequential address order:

• When practical, shadow the data structure in memory (rather than writing the actual WC buffer in

MMI/O space), prior to copying the structure to WC MMI/O space. This will also ensure that the

write-combining buffers are not emptied prematurely by external events (such as a UC read,

perhaps issued by another device driver thread or a hardware interrupt). Shadowing also

ensures that writes that occur to different cache lines in the structure do not send out the WC

buffers, since the number of WC buffers that can be open at one time is CPU implementation

dependent.

• When ready to update the actual WC MMI/O address space, copy the shadowed structure from

memory to MMI/O, from the lowest address 64-byte block upward. To do the copy, use discrete

loads and stores for up to 64 bytes of data. Use a loop of discrete loads and stores for up to 4KB of

data. For up to 32KB, use REP MOVS instructions. To do discrete loads, use assembly language or, if

available, compiler intrinsic functions (__movsb(), __movsw(), __movsd(), etc.).

• In general, using these methods to do the copy will exhibit less overhead in a data movement

function than calling a memcpy( ) LIBC function, which is usually optimized for copying larger

blocks of memory.
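
The guidelines above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the guide's own code: the names wc_copy_ascending, store_block64, and copy_rep_movs are hypothetical, the length is assumed to be a multiple of 64 bytes, and the REP MOVSB path uses GCC/Clang inline assembly with a plain byte loop as a fallback on other compilers. The guide gives no recommendation on this page for sizes above 32KB, so the sketch simply uses REP MOVS for everything above 4KB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WC_BLOCK_BYTES   64u    /* one WC buffer */
#define LOOP_LIMIT_BYTES 4096u  /* loop of discrete stores up to 4KB */

enum wc_copy_method { WC_DISCRETE, WC_LOOP, WC_REP_MOVS };

/* Eight discrete 8-byte loads and stores, lowest address first. The volatile
 * destination keeps the compiler from reordering or merging the stores. */
static void store_block64(volatile uint64_t *dst, const uint64_t *src)
{
    dst[0] = src[0]; dst[1] = src[1]; dst[2] = src[2]; dst[3] = src[3];
    dst[4] = src[4]; dst[5] = src[5]; dst[6] = src[6]; dst[7] = src[7];
}

/* REP MOVSB copies upward while the direction flag is clear, which the
 * x86 calling conventions guarantee on function entry. */
static void copy_rep_movs(volatile void *dst, const void *src, size_t n)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    void *d = (void *)dst;
    __asm__ volatile("rep movsb" : "+D"(d), "+S"(src), "+c"(n) : : "memory");
#else
    volatile unsigned char *d = dst;   /* portable ascending fallback */
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
#endif
}

/* Copy a shadowed structure to its WC MMI/O destination from the lowest
 * 64-byte block upward; returns the method chosen for the given size. */
enum wc_copy_method wc_copy_ascending(volatile uint64_t *wc_dst,
                                      const uint64_t *shadow, size_t bytes)
{
    assert(bytes % WC_BLOCK_BYTES == 0);  /* assumption of this sketch */

    if (bytes <= LOOP_LIMIT_BYTES) {
        for (size_t q = 0; q < bytes / 8; q += 8)
            store_block64(wc_dst + q, shadow + q);
        return bytes <= WC_BLOCK_BYTES ? WC_DISCRETE : WC_LOOP;
    }
    copy_rep_movs(wc_dst, shadow, bytes);
    return WC_REP_MOVS;
}
```

A driver would build the structure in ordinary cached memory, then make a single call such as wc_copy_ascending(wc_base, shadow, sizeof shadow_struct), where wc_base stands for a pointer to the mapped WC region (how that mapping is obtained is outside the scope of this sketch).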
