IBM PPC440X5 Computer Hardware User Manual


 
User’s Manual
Preliminary PPC440x5 CPU Core
September 12, 2002
4.3.3.6 Data Cache Parity Operations
The data cache contains parity bits and multi-hit detection hardware to protect against soft data errors. Both
the data cache tags and data are protected. Data cache lines consist of a tag field, 256 bits of data, 4 modified
(dirty) bits, 4 user attribute (U) bits, and 39 parity bits. The tag field is stored in CAM (Content Addressable
Memory) cells, while the data and parity bits are stored in normal RAM cells. The data cache is physically
tagged and indexed, so the tag field contains a real address that is compared to the real address produced by
the translation hardware when a load, store, or other cache operation is executed. The exact number of
effective address bits depends on the specific cache size.
Two types of errors are detected by the data cache parity logic. In the first type, the parity bits stored in the
RAM array are checked against the appropriate data in the RAM line any time the RAM line is read. The RAM
data may be read by an indexed operation such as a reload dump (RLD), or by a CAM lookup that matches
the tag address, such as a load, dcbf, dcbi, or dcbst. If a line is to be cast out of the cache due to replacement
or in response to a dcbf, dcbi, or dcbst, and is determined to have a parity error of this type, no effort
is made to prevent the erroneous data from being written onto the PLB. However, the write data on the PLB
interface is accompanied by a signal indicating that the data has a parity error.
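
The read-time check described above can be illustrated with a minimal C sketch. It assumes one even-parity bit
per data byte; the text does not state how the 39 parity bits are distributed or whether parity is even or odd, so
the scheme and the names line_parity and line_parity_ok are hypothetical and model only the principle of
recomputing parity on a read and comparing it with the stored bits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 32  /* 256 data bits per cache line, per the text */

/* Returns 1 if the byte contains an odd number of 1 bits, else 0. */
static unsigned byte_parity(uint8_t b)
{
    b ^= b >> 4;
    b ^= b >> 2;
    b ^= b >> 1;
    return b & 1u;
}

/* ASSUMED scheme: bit i of the result is the even-parity bit of data byte i. */
uint32_t line_parity(const uint8_t data[LINE_BYTES])
{
    uint32_t p = 0;
    for (size_t i = 0; i < LINE_BYTES; i++)
        p |= (uint32_t)byte_parity(data[i]) << i;
    return p;
}

/* Check performed on a read: recompute parity and compare with the stored bits. */
bool line_parity_ok(const uint8_t data[LINE_BYTES], uint32_t stored_parity)
{
    return line_parity(data) == stored_parity;
}
```

In this model a single flipped data bit changes exactly one recomputed parity bit, so the comparison fails and the
line would be flagged as erroneous.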
The second type of parity error that may be detected is a multi-hit, also referred to as an MHIT. This type of
error may occur when a tag address bit is corrupted, leaving two tags in the memory array that match the
same input. This type of error may be detected on any CAM lookup cycle, such as for stores, loads, dcbf,
dcbi, dcbst, dcbt, dcbtst, or dcbz instructions. Note that a parity error will not be signaled as a result of a
dcread instruction.
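
A small software model may help show how a single corrupted tag bit produces a multi-hit. The sketch below is
not the hardware implementation; the types cam_entry_t and cam_result_t and the function cam_lookup are
hypothetical names, and the tags are arbitrary illustrative values.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t tag;   /* real-address tag held in the CAM (illustrative width) */
    int valid;      /* nonzero if the entry holds a line */
} cam_entry_t;

typedef enum { CAM_MISS, CAM_HIT, CAM_MULTI_HIT } cam_result_t;

/* Compare the lookup tag against every valid entry and count the matches.
   More than one match corresponds to the MHIT condition in the text. */
cam_result_t cam_lookup(const cam_entry_t *entries, size_t n, uint32_t tag)
{
    size_t matches = 0;
    for (size_t i = 0; i < n; i++) {
        if (entries[i].valid && entries[i].tag == tag)
            matches++;
    }
    if (matches == 0)
        return CAM_MISS;
    return (matches == 1) ? CAM_HIT : CAM_MULTI_HIT;
}
```

For example, if two entries hold tags 0x1000 and 0x1004 and bit 2 of the second tag flips, both entries now
match a lookup of 0x1000 and the model reports CAM_MULTI_HIT.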
If a parity error is detected and MSR[ME] is set (that is, Machine Check interrupts are enabled), the
processor vectors to the Machine Check interrupt handler. As is the case for any machine check interrupt,
after vectoring to the machine check handler, MCSRR0 contains the address of the oldest “uncommitted”
instruction in the pipeline at the time of the exception and MCSRR1 contains the old (MSR) context. The
interrupt handler is able to query the Machine Check Status Register (MCSR) to find out that it was called due to
a D-cache parity error, and is then expected to either invalidate the data cache (using dccci), or to invoke the
OS to abort the process or reset the processor, as appropriate. The handler returns to the interrupted
process using the rfmci instruction.
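
The handler's decision can be sketched as host-side C logic. This is a model under stated assumptions, not
actual handler code: the MCSR bit that reports a D-cache parity error is not given in this section, so it is
passed in as the hypothetical parameter dcache_parity_mask, and the names mc_action_t and
dcache_parity_action are illustrative. A real handler would read MCSR from the SPR, execute dccci, and
return with rfmci.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    MC_NOT_DCACHE_PARITY,      /* machine check has some other cause */
    MC_INVALIDATE_AND_RESUME,  /* invalidate the data cache (dccci), then rfmci */
    MC_ESCALATE_TO_OS          /* abort the process or reset the processor */
} mc_action_t;

/* mcsr:               value read from the Machine Check Status Register
   dcache_parity_mask: HYPOTHETICAL parameter; take the real bit(s) from the
                       MCSR register description
   state_uncorrupted:  true if the error was caught before it could corrupt
                       machine state */
mc_action_t dcache_parity_action(uint32_t mcsr, uint32_t dcache_parity_mask,
                                 bool state_uncorrupted)
{
    if ((mcsr & dcache_parity_mask) == 0)
        return MC_NOT_DCACHE_PARITY;
    return state_uncorrupted ? MC_INVALIDATE_AND_RESUME : MC_ESCALATE_TO_OS;
}
```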
If the interrupt handler is executed before a parity error is allowed to corrupt the state of the machine, the
executing process is recoverable, and the interrupt handler can just invalidate the data cache and resume
the process. In order to guarantee that all parity errors are recoverable, user code must have two characteristics.
First, it must mark all cacheable data pages as “write-through” instead of “copy-back,” so that memory always holds
a current copy of the data and invalidating the cache loses nothing. Second, the software-settable bit (CCR0[PRE])
must be set. This bit forces all load instructions to stall in the last stage of the
load/store pipeline for one cycle, but only if needed to ensure that parity errors are recoverable. The pipeline
stall guarantees that any parity error is detected and the resulting Machine Check interrupt taken before the
load instruction completes and the target GPR is corrupted. Setting CCR0[PRE] degrades overall application
performance. However, if the state of the load/store pipeline is such that a load instruction stalls in the last
stage for some reason unrelated to parity recoverability, then CCR0[PRE] does not cause an additional cycle
stall.
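
The two recoverability conditions and the stall rule can be summarized in a short C sketch. It is a simplified
model, not a cycle-accurate description: the function names are hypothetical, and the stall is treated as needed
whenever the load is not already stalled in the last stage, which is coarser than the "only if needed" condition in
the text.

```c
#include <stdbool.h>

/* All parity errors are guaranteed recoverable only when every cacheable
   data page is write-through AND CCR0[PRE] is set. */
bool parity_errors_recoverable(bool all_pages_write_through, bool ccr0_pre)
{
    return all_pages_write_through && ccr0_pre;
}

/* Extra cycles that CCR0[PRE] adds to a load (simplified model).
   If the load already stalls in the last stage for an unrelated reason,
   CCR0[PRE] adds nothing. */
unsigned pre_extra_stall_cycles(bool ccr0_pre, bool already_stalled_in_last_stage)
{
    return (ccr0_pre && !already_stalled_in_last_stage) ? 1u : 0u;
}
```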
Note that the Parity exception type Machine Check interrupt is asynchronous; that is, the return address in
MCSRR0 does not necessarily point to the instruction that detected the parity error in the data
cache. Rather, the Machine Check interrupt is taken as soon as the parity error is detected, and some
instructions in progress may get flushed and re-executed after the interrupt, just as if the machine were
responding to an external interrupt.