User’s Manual
Preliminary PPC440x5 CPU Core
September 12, 2002
1.3.2 Execution Pipelines
The PPC440x5 core contains three execution pipelines: complex integer, simple integer, and load/store.
Each pipeline consists of four stages and can access the nine-ported (six read, three write) GPR file. In order
to improve performance and avoid contention for the GPR file, there are two identical copies of it. One is
dedicated to the complex integer pipeline, while the other is shared by the simple integer and the load/store
pipelines.
The complex integer pipeline handles all arithmetic, logical, branch, and system management instructions
(such as interrupt and TLB management, move to/from system registers, and so on). This pipeline also
handles multiply and divide operations, and 24 DSP instructions that perform a variety of multiply-accumulate
operations. The complex integer pipeline multiply unit can perform 32-bit × 32-bit multiply operations with
single-cycle throughput and three-cycle latency; 16-bit × 32-bit multiply operations have only two-cycle
latency. Divide operations take 33 cycles.
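The difference between throughput and latency can be illustrated with a rough cycle estimate. The sketch below is not taken from the manual: the function names and the simple no-stall model are assumptions, and only the three-cycle latency and single-cycle throughput of the 32-bit × 32-bit multiply come from the text above.

```python
# Rough cycle estimates for a run of 32-bit x 32-bit multiplies.
# Hypothetical helper names; the model ignores issue restrictions,
# operand dependencies on other instructions, and any other stalls.

MUL32_LATENCY = 3      # cycles from start to result (from the text)
MUL32_THROUGHPUT = 1   # a new multiply can start every cycle (from the text)


def independent_cycles(count, latency=MUL32_LATENCY, throughput=MUL32_THROUGHPUT):
    """Cycles until the last of `count` independent operations completes."""
    if count <= 0:
        return 0
    # Operations overlap in the pipeline: one starts every `throughput`
    # cycles, and the final one still needs its full latency.
    return (count - 1) * throughput + latency


def dependent_cycles(count, latency=MUL32_LATENCY):
    """Cycles for a chain in which each operation needs the previous result."""
    if count <= 0:
        return 0
    # No overlap is possible, so every operation pays the full latency.
    return count * latency


print(independent_cycles(8))  # 10
print(dependent_cycles(8))    # 24
```

Under this model, eight independent multiplies complete in 10 cycles, while a chain of eight dependent multiplies needs 24, which is why interleaving independent work between dependent multiplies tends to pay off.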
The simple integer pipeline can handle most arithmetic and logical operations that do not update the
Condition Register (CR).
The load/store pipeline handles all load, store, and cache management instructions. All misaligned
operations are handled in hardware, with no penalty on any operation that is contained within an aligned
16-byte region. The load/store pipeline supports all operations to both big endian and little endian data regions.
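The 16-byte rule can be checked with simple address arithmetic. The following is a minimal sketch, assuming byte addresses and a hypothetical helper name; it only tests whether an access stays inside one aligned 16-byte region and says nothing about the size of the penalty when it does not.

```python
# Check whether a memory access stays within one aligned 16-byte region.
# The 16-byte figure comes from the text; the helper name is illustrative.

REGION_BYTES = 16


def within_aligned_region(address, size, region=REGION_BYTES):
    """Return True if bytes [address, address + size) share one aligned region."""
    if size <= 0:
        raise ValueError("size must be positive")
    first_region = address // region
    last_region = (address + size - 1) // region
    return first_region == last_region


print(within_aligned_region(0x1001, 4))  # True: misaligned, but inside 0x1000-0x100F
print(within_aligned_region(0x100E, 4))  # False: spans 0x100E-0x1011
```

A misaligned word load at 0x1001 therefore falls under the no-penalty case described above, whereas one at 0x100E crosses into the next 16-byte region.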
Appendix B, “PPC440x5 Core Compiler Optimizations,” provides detailed information on instruction timings
and performance implications in the PPC440x5 core.
1.3.3 Instruction and Data Cache Controllers
The PPC440x5 core provides separate instruction and data cache controllers and arrays, which allow
concurrent access and minimize pipeline stalls. The storage capacity of each cache array ranges from
8KB to 32KB, depending on the implementation. Both caches have 32-byte lines, and both are
highly associative, with 64-way set-associativity for the 32KB and 16KB sizes, and 32-way set-associativity
for the 8KB size. Both caches support parity checking on the tags and data in the memory arrays, to protect
against soft errors. If a parity error is detected, the CPU raises a machine check exception.
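The cache geometry follows directly from these figures: the number of lines is the capacity divided by the line size, and the number of sets is the number of lines divided by the associativity. The sketch below works through that arithmetic; the function and table names are illustrative, and only the sizes, line length, and associativity values are taken from the text.

```python
# Derive line and set counts for the cache sizes listed in the text.

LINE_BYTES = 32                            # line size, from the text
WAYS_BY_SIZE_KB = {32: 64, 16: 64, 8: 32}  # associativity per size, from the text


def cache_geometry(size_kb):
    """Return (lines, ways, sets) for a cache of `size_kb` kilobytes."""
    ways = WAYS_BY_SIZE_KB[size_kb]
    lines = size_kb * 1024 // LINE_BYTES
    sets = lines // ways
    return lines, ways, sets


for size_kb in (32, 16, 8):
    lines, ways, sets = cache_geometry(size_kb)
    print(f"{size_kb}KB: {lines} lines, {ways} ways, {sets} sets")
# 32KB: 1024 lines, 64 ways, 16 sets
# 16KB: 512 lines, 64 ways, 8 sets
# 8KB: 256 lines, 32 ways, 8 sets
```

The small set counts (16 or 8) are what "highly associative" means in practice: an address can reside in any of 32 or 64 ways within its set.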
The PowerPC instruction set provides a rich set of cache management instructions for software-enforced
coherency. The PPC440x5 implementation also provides special debug instructions that can directly read the
tag and data arrays. See Chapter 4, “Instruction and Data Caches,” for detailed information about the
instruction and data cache controllers.
The cache controllers attach to the PLB, which connects them to the IBM CoreConnect system-on-a-chip
environment.
1.3.3.1 Instruction Cache Controller (ICC)
The ICC delivers two instructions per cycle to the instruction unit of the PPC440x5 core. The ICC also
handles the execution of the PowerPC instruction cache management instructions for coherency. The ICC
includes a speculative pre-fetch mechanism that can be configured to automatically pre-fetch a burst of up
to three additional lines upon any fetch request that misses in the instruction cache. These speculative
pre-fetches can be abandoned if instruction execution branches away from the original instruction stream.
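As an illustration of the pre-fetch burst, the sketch below lists the line addresses that a miss might request. It assumes that the additional lines are the ones sequentially following the missed line, which this section does not state explicitly; the function name and the parameter for the configured pre-fetch count are also hypothetical. The 32-byte line size is taken from section 1.3.3.

```python
# List the cache-line addresses requested for an instruction fetch miss.
# Assumption: pre-fetched lines follow the missed line sequentially.

LINE_BYTES = 32          # line size, from section 1.3.3
MAX_PREFETCH_LINES = 3   # "up to three additional lines", from the text


def fetch_burst(miss_address, prefetch_lines):
    """Return the missed line address followed by the pre-fetched line addresses."""
    if not 0 <= prefetch_lines <= MAX_PREFETCH_LINES:
        raise ValueError("prefetch_lines must be between 0 and 3")
    # Align the miss address down to the start of its cache line.
    base = miss_address & ~(LINE_BYTES - 1)
    return [base + i * LINE_BYTES for i in range(prefetch_lines + 1)]


print([hex(a) for a in fetch_burst(0x1234, 3)])
# ['0x1220', '0x1240', '0x1260', '0x1280']
```

With pre-fetching configured off (a count of zero in this sketch), only the missed line itself is requested.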