2–26 Internal Architecture
21264/EV68A Hardware Reference Manual
Special Cases of Alpha Instruction Execution
If instruction 1 is dependent on the load instruction data and the load instruction hits,
instruction 1 is removed from the queue one cycle later (at the start of cycle 8). If the
load instruction misses, then instruction 1 is aborted from the Fbox pipeline and may
request service again in cycle 7.
2.7.2 Floating-Point Store Instructions
Floating-point store instructions are duplicated and loaded into both the IQ and the FQ
from the mapper. Each IQ entry contains a control bit, fpWait, that when set prevents
that entry from asserting its requests. This bit is initially set for each floating-point store
instruction that enters the IQ, unless it was the target of a replay trap. The instruction’s
FQ clone is issued when its Ra register is about to become clean, resulting in its IQ
clone’s fpWait bit being cleared and allowing the IQ clone to issue and be executed by
the Mbox. This mechanism ensures that floating-point store instructions are always
issued to the Mbox, along with the associated data, without requiring the floating-point
register dirty bits to be available within the IQ.
2.7.3 CMOV Instruction
For the 21264/EV68A, the Alpha CMOV instruction has three operands, and so pre-
sents a special case. The required operation is to move either the value in register Rb or
the value from the old physical destination register into the new destination register,
based upon the value in Ra. Since neither the mapper nor the Ebox and Fbox data paths
are otherwise required to handle three operand instructions, the CMOV instruction is
decomposed by the Ibox pipeline into two 2-operand instructions:
The Alpha architecture instruction CMOV Ra, Rb
Rc
Becomes the 21264/EV68A instructions CMOV1 Ra, oldRc
newRc1
CMOV2 newRc1, Rb
newRc2
The first instruction, CMOV1, tests the value of Ra and records the result of this test in
a 65th bit of its destination register, newRc1. It also copies the value of the old physical
destination register, oldRc, to newRc1.
The second instruction, CMOV2, then copies either the value in newRc1 or the value in
Rb into a second physical destination register, newRc2, based on the CMOV predicate
bit stored in newRc1.
In summary, the original CMOV instruction is decomposed into two dependent instruc-
tions that each use a physical register from the free list.
To further simplify this operation, the two component instructions of a CMOV instruc-
tion are driven through the mappers in successive cycles. Hence, if a fetch line contains
n CMOV instructions, it takes n+1 cycles to run that fetch line through the mappers.
For example, the following fetch line:
ADD CMOVx SUB CMOVy
Results in the following three map cycles:
ADD CMOVx1
CMOVx2SUBCMOVy1
CMOVy2