
2–26 Internal Architecture

21264/EV68A Hardware Reference Manual

Special Cases of Alpha Instruction Execution

If instruction 1 is dependent on the load instruction data and the load instruction hits, instruction 1 is removed from the queue one cycle later (at the start of cycle 8). If the load instruction misses, then instruction 1 is aborted from the Fbox pipeline and may request service again in cycle 7.

2.7.2 Floating-Point Store Instructions

Floating-point store instructions are duplicated and loaded into both the IQ and the FQ from the mapper. Each IQ entry contains a control bit, fpWait, that when set prevents that entry from asserting its requests. This bit is initially set for each floating-point store instruction that enters the IQ, unless it was the target of a replay trap. The instruction's FQ clone is issued when its Ra register is about to become clean, resulting in its IQ clone's fpWait bit being cleared and allowing the IQ clone to issue and be executed by the Mbox. This mechanism ensures that floating-point store instructions are always issued to the Mbox, along with the associated data, without requiring the floating-point register dirty bits to be available within the IQ.
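The duplication and fpWait handshake can be illustrated with a toy Python model. This is a behavioral sketch only, assuming queues keyed by an instruction tag; the names IQEntry, FQEntry, dispatch_fp_store, issue_fq_clone, and the ra_about_to_be_clean flag are hypothetical and are not 21264 signal names (only the fpWait bit comes from the text above). Issue arbitration, timing, and Mbox execution are not modeled.

```python
from dataclasses import dataclass

# Toy model of FP store duplication into the IQ and FQ.
# All names except the fpWait concept are hypothetical.

@dataclass
class IQEntry:
    tag: int
    fp_wait: bool  # when True, the entry may not assert its issue requests

    def can_request(self) -> bool:
        return not self.fp_wait

@dataclass
class FQEntry:
    tag: int
    ra: int  # FP register number that supplies the store data

def dispatch_fp_store(tag: int, ra: int, iq: dict, fq: dict,
                      replay_target: bool = False) -> None:
    # The mapper loads one clone into each queue. fpWait starts set,
    # unless the store was the target of a replay trap.
    iq[tag] = IQEntry(tag, fp_wait=not replay_target)
    fq[tag] = FQEntry(tag, ra)

def issue_fq_clone(tag: int, iq: dict, fq: dict,
                   ra_about_to_be_clean: bool) -> bool:
    # The FQ clone issues only once its Ra register is about to become clean;
    # issuing it clears fpWait on the matching IQ clone.
    if not ra_about_to_be_clean:
        return False
    del fq[tag]
    iq[tag].fp_wait = False
    return True

iq, fq = {}, {}
dispatch_fp_store(7, ra=3, iq=iq, fq=fq)
print(iq[7].can_request())  # False: IQ clone is held by fpWait
issue_fq_clone(7, iq, fq, ra_about_to_be_clean=True)
print(iq[7].can_request())  # True: IQ clone may now issue to the Mbox
```

The point the model captures is that the IQ never needs to see FP register dirty bits: the only thing it waits on is its own fpWait bit, which the FQ side clears.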

2.7.3 CMOV Instruction

For the 21264/EV68A, the Alpha CMOV instruction has three operands, and so presents a special case. The required operation is to move either the value in register Rb or the value from the old physical destination register into the new destination register, based upon the value in Ra. Since neither the mapper nor the Ebox and Fbox data paths are otherwise required to handle three-operand instructions, the CMOV instruction is decomposed by the Ibox pipeline into two 2-operand instructions:

    The Alpha architecture instruction      CMOV  Ra, Rb      ⇒ Rc
    becomes the 21264/EV68A instructions    CMOV1 Ra, oldRc   ⇒ newRc1
                                            CMOV2 newRc1, Rb  ⇒ newRc2

The first instruction, CMOV1, tests the value of Ra and records the result of this test in a 65th bit of its destination register, newRc1. It also copies the value of the old physical destination register, oldRc, to newRc1.

The second instruction, CMOV2, then copies either the value in newRc1 or the value in Rb into a second physical destination register, newRc2, based on the CMOV predicate bit stored in newRc1.

In summary, the original CMOV instruction is decomposed into two dependent instructions that each use a physical register from the free list.
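The data flow of the two-step decomposition can be sketched functionally in Python. The sketch models each physical register as a Python integer whose bit 64 plays the role of the 65th (predicate) bit, and it uses a test for Ra equal to zero (the CMOVEQ condition) as the example predicate. The function names and the test parameter are hypothetical; register renaming and the free list are not modeled.

```python
from typing import Callable

# Functional sketch of the CMOV1/CMOV2 split. Bit 64 of each value
# stands in for the 65th bit that carries the CMOV predicate.
MASK64 = (1 << 64) - 1
PRED_BIT = 1 << 64

def cmov1(ra: int, old_rc: int, test: Callable[[int], bool]) -> int:
    # Copy oldRc into newRc1 and record the outcome of the Ra test in bit 64.
    new_rc1 = old_rc & MASK64
    if test(ra):
        new_rc1 |= PRED_BIT
    return new_rc1

def cmov2(new_rc1: int, rb: int) -> int:
    # Select Rb when the predicate bit is set; otherwise keep the old Rc value.
    if new_rc1 & PRED_BIT:
        return rb & MASK64
    return new_rc1 & MASK64

def cmov(ra: int, rb: int, old_rc: int, test: Callable[[int], bool]) -> int:
    # CMOV Ra, Rb -> Rc, executed as two dependent two-operand steps.
    return cmov2(cmov1(ra, old_rc, test), rb)

def is_zero(ra: int) -> bool:
    # Example predicate, as in CMOVEQ.
    return ra == 0

print(cmov(0, 42, 7, is_zero))  # 42: test passed, Rb is moved
print(cmov(5, 42, 7, is_zero))  # 7: test failed, old Rc value is kept
```

Neither step ever reads more than two source values, which is the property that lets the mapper and data paths stay two-operand.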

To further simplify this operation, the two component instructions of a CMOV instruction are driven through the mappers in successive cycles. Hence, if a fetch line contains n CMOV instructions, it takes n+1 cycles to run that fetch line through the mappers.

For example, the following fetch line:

    ADD  CMOVx  SUB  CMOVy

results in the following three map cycles:

    Cycle 1:  ADD     CMOVx1
    Cycle 2:  CMOVx2  SUB     CMOVy1
    Cycle 3:  CMOVy2
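The n+1 cycle count can be reproduced with a small Python model. It assumes, consistent with the example above, that instructions are mapped in program order, that a map cycle ends immediately after a CMOV1, and that the matching CMOV2 is the first instruction of the next cycle; it also assumes CMOV instructions can be recognized by their mnemonic prefix. The function name map_cycles is hypothetical, and other mapper constraints are not modeled.

```python
# Hypothetical model of how CMOV halves spread a fetch line over map cycles.
def map_cycles(fetch_line: list[str]) -> list[list[str]]:
    cycles: list[list[str]] = []
    current: list[str] = []
    for insn in fetch_line:
        if insn.startswith("CMOV"):
            # CMOV1 is the last instruction mapped in this cycle ...
            current.append(insn + "1")
            cycles.append(current)
            # ... and CMOV2 is mapped first in the following cycle.
            current = [insn + "2"]
        else:
            current.append(insn)
    if current:
        cycles.append(current)
    return cycles

for cycle in map_cycles(["ADD", "CMOVx", "SUB", "CMOVy"]):
    print(" ".join(cycle))
# ADD CMOVx1
# CMOVx2 SUB CMOVy1
# CMOVy2
```

Because every CMOV closes exactly one cycle and its CMOV2 always opens a new, non-empty one, a non-empty fetch line with n CMOV instructions maps in n+1 cycles under this model.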
