21264/EV68A Hardware Reference Manual
PALcode Restrictions and Guidelines D–9
D.4 Guideline 6: Avoid Consecutive Read-Modify-Write-Read-Modify-Write
Avoid consecutive read-modify-write-read-modify-write sequences to IPRs in the same
scoreboard group.
The latency between the first write and the second read is determined by the retire
latency of the IPR. For convenience of implementation, the latency between the issue of
that read and the issue of the final write depends on the run-time contents of the issue
queue. It ranges from four to nine cycles, even if there is no data dependency between
the read and the write.
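One way to audit a PALcode flow against this guideline is a simple scan over its IPR
accesses. The sketch below is illustrative only. The IPR names and scoreboard group
numbers in GROUP are hypothetical placeholders, not the 21264/EV68A assignments, and
the flow is reduced to an ordered list of IPR reads and writes (standing in for
HW_MFPR and HW_MTPR). As a simplification, two accesses count as consecutive when no
other IPR access lies between them.

```python
# Hypothetical IPR-to-scoreboard-group map. These names and group numbers are
# placeholders for illustration, not the real 21264/EV68A assignments.
GROUP = {"IPR_A": 0, "IPR_B": 0, "IPR_C": 1}

PATTERN = ["read", "write", "read", "write"]


def find_back_to_back_rmw(accesses):
    """Return start indices of read-write-read-write runs within one group.

    accesses is an ordered list of (op, ipr) tuples, where op is "read" or
    "write". Only IPR accesses appear in the list.
    """
    hits = []
    for i in range(len(accesses) - 3):
        window = accesses[i:i + 4]
        if [op for op, _ in window] != PATTERN:
            continue
        # Flag the run only if all four accesses share a scoreboard group.
        if len({GROUP[ipr] for _, ipr in window}) == 1:
            hits.append(i)
    return hits


same_group = [("read", "IPR_A"), ("write", "IPR_A"),
              ("read", "IPR_B"), ("write", "IPR_B")]
other_group = [("read", "IPR_A"), ("write", "IPR_A"),
               ("read", "IPR_C"), ("write", "IPR_C")]

print(find_back_to_back_rmw(same_group))   # [0]
print(find_back_to_back_rmw(other_group))  # []
```

A flagged index marks where the first read-modify-write begins. Separating the two
sequences, or moving one of them to an IPR in a different scoreboard group, clears
the report in this model.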
D.5 Restriction 7: Replay Trap, Interrupt Code Sequence, and STF/ITOF
On an Mbox replay trap, the 21264/EV68A Ibox guarantees that the refetched load or
store instruction that caused the trap is issued before any newer load or store
instructions. For load and integer store instructions, this is a consequence of the
natural operation of the issue queue. The refetched instruction enters the
age-prioritized queue ahead of newer load and store instructions and does not have any
dependencies on dirty registers.
Because there is no overhead time for checking these register dependencies (that is, it is
known upon enqueueing that there are no dirty registers), the queue will issue the
refetched instruction in priority order. For floating-point store instructions, there is
normally some overhead associated with checking the floating-point source register dirty
status, so the store instruction would normally wait before being issued. This would
have the undesired consequence of allowing newer load and store instructions to be
issued out of order. A deadlock can occur if issuing the instructions out of order causes
the floating-point store instruction to take a replay trap continually. To avoid the
deadlock on a floating-point store instruction replay trap, the source register dirty
status is not checked (the source register is assumed to be clean because the store
instruction was issued previously).
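The ordering problem described above can be seen in a toy model of an age-prioritized
queue. This is a sketch under strong assumptions, not a description of the actual
issue logic: it issues one instruction per cycle, picks the oldest ready entry, and
charges a hypothetical one-cycle overhead (check_overhead) for the floating-point
source register dirty check. Instruction names are assumed to be unique.

```python
def issue_order(queue, skip_dirty_check_for=None, check_overhead=1):
    """Return the order in which a toy age-prioritized queue issues entries.

    queue is a list of (name, is_fp_store) tuples, oldest first. An FP store
    becomes ready only after check_overhead cycles, unless its name matches
    skip_dirty_check_for, in which case the dirty check is cancelled.
    """
    ready_at = {}
    for name, is_fp_store in queue:
        delayed = is_fp_store and name != skip_dirty_check_for
        ready_at[name] = check_overhead if delayed else 0

    pending = [name for name, _ in queue]
    order = []
    cycle = 0
    while pending:
        # Issue the oldest entry that is ready in this cycle, if any.
        for name in pending:
            if ready_at[name] <= cycle:
                order.append(name)
                pending.remove(name)
                break
        cycle += 1
    return order


# Refetched FP store (oldest) followed by a newer load.
queue = [("STF", True), ("LDQ", False)]

print(issue_order(queue))                              # ['LDQ', 'STF']
print(issue_order(queue, skip_dirty_check_for="STF"))  # ['STF', 'LDQ']
```

With the check in place, the newer load slips ahead of the refetched store. With the
check cancelled for the replayed store, the store issues first, which is the ordering
the Ibox guarantees.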
The hardware mechanism that keeps track of replayed floating-point store instructions,
and cancels the dirty register check, requires some software restrictions to guarantee
that it is applied appropriately to the replayed instruction and not to other floating-point
store instructions. The hardware mechanism marks the position in the fetch block (low
two bits of the PC) where the replay trap occurred. This action cancels the dirty
floating-point source register check of the next valid instruction enqueued to the integer
queue (integer, all load and store, and ITOF instructions) that has the same position in
the fetch block (normally the replayed STF). If the PC is somehow diverted to a PALcode
flow, this hardware might inadvertently cancel the register check of some other
STF (or ITOF) instruction. Fortunately, there are only a few reasons why the PC might
be diverted during a replay trap: interrupts and ITB fills.
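The marker behavior described above can be modeled in a few lines. This is a
simplified sketch, not the hardware's implementation. It assumes byte addresses with
4-byte instructions and four instructions per fetch block, so the position is taken
from PC bits <3:2>. The instruction kinds and the INTEGER_QUEUE set are hypothetical
labels, and the model ignores any further conditions on what counts as a valid
instruction.

```python
# Hypothetical instruction kinds that are enqueued to the integer queue.
INTEGER_QUEUE = {"int_op", "load", "store", "stf", "itof"}


def slot(pc):
    """Position within a fetch block, assuming byte addresses and four
    4-byte instructions per block."""
    return (pc >> 2) & 0x3


def consume_marker(trap_pc, stream):
    """Return the index of the instruction whose dirty check is cancelled.

    stream is a list of (pc, kind) tuples in enqueue order after the trap.
    The marker is consumed by the first integer-queue instruction in the
    same fetch block position as trap_pc. Returns None if none matches.
    """
    marked = slot(trap_pc)
    for i, (pc, kind) in enumerate(stream):
        if kind in INTEGER_QUEUE and slot(pc) == marked:
            return i
    return None


# Normal case: the replayed STF is refetched and consumes its own marker.
normal = [(0x1008, "stf"), (0x100C, "load")]
print(consume_marker(0x1008, normal))      # 0

# Diverted case: the first matching instruction is an unrelated STF.
diverted = [(0x8000, "int_op"), (0x8004, "fp_op"), (0x8008, "stf")]
print(consume_marker(0x1008, diverted))    # 2

# Guarded case: an integer operate in the same position of an earlier
# fetch block consumes the marker before the STF is reached.
guarded = [(0x8000, "int_op"), (0x8004, "int_op"), (0x8008, "int_op"),
           (0x800C, "int_op"), (0x8018, "stf")]
print(consume_marker(0x1008, guarded))     # 2, the int_op at 0x8008
```

In the diverted case the marker lands on an STF that was never replayed, so its
source register check would be cancelled by mistake. In the guarded case an earlier
integer-queue instruction in the same position absorbs the marker harmlessly.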
The following PALcode example shows that an STF or ITOF instruction, in a given
position in a fetch block, must be preceded by a valid instruction that is issued out of
the integer queue in the same position in an earlier fetch block. Acceptable instruction
classes include load, integer store, and integer operate instructions that do not have R31
as a destination or branch.