AMD 250 Computer Hardware User Manual

Open as PDF

of 384

346 AGP Considerations Appendix D

25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

The theoretical data bandwidths for fast writes at 2x, 4x, and 8x are approximately 528 Mbytes/s,

1.056 Gbytes/s, and 2.1 Gbytes/s, respectively. These numbers are theoretical in terms of sustained

bursts occurring on the AGP bus. In actuality, data bandwidth depends on the size of the data block

transferred from the processor—larger block transfers are better.

Real bandwidth will be lower than the theoretical bandwidth because the beginning of fast-write

transactions require sending a PCI-protocol start transaction cycle (for the address phase) at the 1x

transfer rate instead of the higher speeds (2x, 4x, or 8x).

Larger block transfers help hide the transaction-start overhead (smaller block transfers have lower

bandwidth). For example, at the 8x data-transfer rate, 128 bytes of data can be transferred in four

AGP clock cycles, but one initial clock cycle is required for the address phase. Five clock cycles are

required to transfer 128 bytes of data; therefore, the overhead of the address phase (clock cycle 1) for

128 bytes of data transferred is 20% (yielding a bandwidth of approximately 1.7 Gbytes/s). See

Figure 10.

Figure 10. AGP 8x Fast-Write Transaction

The overhead of the address phase for 64 bytes of data is 33% (yielding a bandwidth of approximately

1400 Mbytes/s). For 32 bytes of data (or less), the bandwidth drops to approximately 1000 Mbytes/s.

A key software optimization is to buffer as much processor write data as practical.

D.2 Fast-Write Optimizations for Graphics-Engine

Programming

Write-combining provides excellent AGP fast-write bandwidth when using the programmed I/O

(PIO) model—not the DMA model—for programming 2-D and 3-D graphics engines. To help ensure

that data is sent in optimal block sizes, you can “shadow” the engine’s render commands (that is, the

registers needed for a render command) in cache-block-aligned data structures in system memory.

Shadowing the structure in system memory (instead of writing the actual write-combining buffer in

memory-mapped I/O space) ensures that the write buffer is not emptied prematurely by external

events (such as an uncacheable read or hardware interrupt). Shadowing also ensures that writes to

different cache lines in the structure do not flush (close) the write-combining buffer since the number

of write-combining buffers that can be open at one time is processor-implementation dependent.

CLK

C/BE

23456789

CMD

ADD

First block (128 bytes)

Second block (64 bytes)

ADD

CMD

previous next

Top Automotive Device Types

Top Automotive Brands

Top Baby Care Device Types

Top Baby Care Brands

Top Car Audio & Video Device Types

Top Car Audio & Video Brands

Top Cellphone Device Types

Top Cellphone Brands

Top Communications Device Types

Top Communications Brands

Top Computer Device Types

Top Computer Brands

Top Fitness Device Types

Top Fitness Brands

Top Home Audio Device Types

Top Home Audio Brands

Top Household Appliance Device Types

Top Household Appliance Brands

Top Kitchen Appliance Device Types

Top Kitchen Appliance Brands

Top Laundry Appliance Device Types

Top Laundry Appliance Brands

Top Lawn & Garden Device Types

Top Lawn & Garden Brands

Top Marine Equipment Device Types

Top Marine Equipment Brands

Top Musical Instrument Device Types

Top Musical Instrument Brands

Top Outdoor Cooking Device Types

Top Outdoor Cooking Brands

Top Personal Care Device Types

Top Personal Care Brands

Top Photography Device Types

Top Photography Brands

Top Portable Media Device Types

Top Portable Media Brands

Top Power Tools Device Types

Top Power Tools Brands

Top TV and Video Device Types

Top TV and Video Brands

Top Videogame Device Types

Top Videogame Brands

AMD 250 Computer Hardware User Manual