AMD 250 Computer Hardware User Manual


 
Appendix A Microarchitecture for AMD Athlon™ 64 and AMD Opteron™ Processors 259
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
cache and, if required, to the L2 cache or system memory. The 44-entry LSU provides a data interface
for both the integer scheduler and the floating-point scheduler. It consists of two queues: a 12-entry
queue for L1 cache load and store accesses and a 32-entry queue for L2 cache or system memory load
and store accesses. The 12-entry queue can request a maximum of two L1 cache operations (any mix
of loads and stores) per cycle. Up to two 64-bit stores can be performed per cycle. In other words,
with two 8-byte operations per cycle, 16 bytes per clock is the maximum rate at which the processor can move data. The 32-entry queue
effectively holds requests that missed in the L1 cache probe by the 12-entry queue. Finally, the LSU
helps ensure that the architectural load and store ordering rules are preserved (a requirement for
AMD64 architecture compatibility).
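As a rough illustration of the queue structure described above, the following Python sketch pushes a stream of requests through a toy two-queue LSU. The constants (12 and 32 entries, two operations per cycle, 64-bit accesses) come from the text. Everything else is an assumption of this sketch and not AMD's design: the names `run_lsu` and `l1_hit` are hypothetical, the L1 probe is treated as instantaneous, L2/memory latency and ordering rules are not modeled, and a full miss queue raises an error instead of stalling.

```python
from collections import deque

# Limits quoted in the text; the rest is a hypothetical, simplified model.
L1_QUEUE_ENTRIES = 12      # queue for L1 cache load/store accesses
MISS_QUEUE_ENTRIES = 32    # queue for L2 cache / system memory accesses
MAX_L1_OPS_PER_CYCLE = 2   # any mix of loads and stores
BYTES_PER_OP = 8           # one 64-bit access


def peak_bytes_per_cycle():
    # Two 8-byte operations per cycle -> 16 bytes per clock.
    return MAX_L1_OPS_PER_CYCLE * BYTES_PER_OP


def run_lsu(requests, l1_hit):
    """Push (kind, address) requests through a toy two-queue LSU.

    l1_hit(address) -> bool is a hypothetical stand-in for the L1 probe.
    Returns the number of cycles spent probing L1 and the list of
    requests that missed and were handed to the 32-entry miss queue.
    """
    incoming = deque(requests)
    l1_queue = deque()
    miss_queue = deque()
    cycles = 0
    while incoming or l1_queue:
        # Refill the 12-entry L1 queue from the incoming request stream.
        while incoming and len(l1_queue) < L1_QUEUE_ENTRIES:
            l1_queue.append(incoming.popleft())
        # Probe the L1 cache with at most two operations this cycle.
        for _ in range(min(MAX_L1_OPS_PER_CYCLE, len(l1_queue))):
            kind, addr = l1_queue.popleft()
            if not l1_hit(addr):
                if len(miss_queue) == MISS_QUEUE_ENTRIES:
                    # Real hardware would stall; this sketch never drains misses.
                    raise OverflowError("more than 32 outstanding misses")
                miss_queue.append((kind, addr))
        cycles += 1
    return cycles, list(miss_queue)


# Ten loads that hit plus one store that misses: 11 operations, 2 per cycle.
reqs = [("load", a) for a in range(0, 80, 8)] + [("store", 0x1000)]
cycles, misses = run_lsu(reqs, l1_hit=lambda addr: addr < 0x1000)
print(peak_bytes_per_cycle())  # 16
print(cycles, misses)          # 6 [('store', 4096)]
```

With two L1 operations per cycle, 11 requests need six cycles, and only the request that missed the L1 probe ends up in the miss queue.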
Figure 9. Load-Store Unit
A.16 L2 Cache
The AMD Athlon 64 and AMD Opteron processors each contain an integrated L2 cache. This full-speed
on-die L2 cache features an exclusive cache architecture: a given cache block is held in either the
L1 or the L2 cache, not both. The L2 cache contains only victim or copy-back cache blocks that are
to be written back to the memory subsystem as a result of a conflict miss. The terms victim and
copy-back refer to cache blocks that were previously held in the L1 cache but had to be overwritten
(evicted) to make room for newer data. The victim buffer holds data evicted from the L1 cache.
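The exclusive relationship between the L1 and L2 caches can be illustrated with a toy model. In the following Python sketch, blocks evicted from L1 (victims) are placed in L2, and a block found in L2 is moved back into L1 and removed from L2, so no block is ever resident in both levels. The class name `ExclusiveHierarchy`, the tiny capacities counted in blocks, full associativity, and LRU replacement are all assumptions made for illustration; dirty-block write-back to memory is not modeled.

```python
from collections import OrderedDict


class ExclusiveHierarchy:
    """Hypothetical exclusive L1/L2 pair (fully associative, LRU, sizes in blocks)."""

    def __init__(self, l1_blocks, l2_blocks):
        self.l1 = OrderedDict()  # address -> None, least recently used first
        self.l2 = OrderedDict()
        self.l1_blocks = l1_blocks
        self.l2_blocks = l2_blocks

    def access(self, addr):
        """Return where the block was found: 'L1', 'L2' or 'memory'."""
        if addr in self.l1:
            self.l1.move_to_end(addr)  # refresh LRU position
            return "L1"
        if addr in self.l2:
            del self.l2[addr]          # exclusive: block leaves L2 when promoted
            where = "L2"
        else:
            where = "memory"
        self._fill_l1(addr)
        return where

    def _fill_l1(self, addr):
        if len(self.l1) == self.l1_blocks:
            victim, _ = self.l1.popitem(last=False)  # evict the LRU block from L1
            self._fill_l2(victim)                     # the victim moves to L2
        self.l1[addr] = None

    def _fill_l2(self, victim):
        if len(self.l2) == self.l2_blocks:
            # The LRU L2 block leaves the hierarchy (write-back not modeled).
            self.l2.popitem(last=False)
        self.l2[victim] = None


h = ExclusiveHierarchy(l1_blocks=2, l2_blocks=4)
print([h.access(a) for a in (0, 1, 2, 0)])  # ['memory', 'memory', 'memory', 'L2']
print(list(h.l1), list(h.l2))               # [2, 0] [1]
```

Block 0 is evicted to L2 when block 2 arrives, then promoted back to L1 on its next access, which in turn pushes block 1 into L2.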
[Figure 9 labels: 44-entry LSU; 2-way, 64-Kbyte data cache; operand buses and result buses from the core; store data to the BIU.]