Compaq ECQD2KCTE Laptop User Manual


 
A–10 Alpha Architecture Handbook
Note:
The shifts can often be combined with shifts that might surround subsequent arithmetic
operations (for example, to produce word overflow from the high end of a register).
In the common case, the intended sequence for loading and zero-extending a byte is:
LDL R1,D.lw(Rx) ! Load the aligned longword that contains the byte
EXTBL R1,#D.mod,R1 ! Extract byte D.mod, zero-extended, into R1
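The effect of this two-instruction sequence can be modeled in a few lines of Python. This is only an illustrative sketch, not part of the architecture definition: it assumes that D.lw is the displacement D rounded down to a multiple of 4 and that D.mod is D mod 4, it treats the base register Rx as zero, and the helper names ldl, extbl, and load_byte_zero_extended are hypothetical.

```python
MASK64 = (1 << 64) - 1  # Alpha integer registers are 64 bits wide

def ldl(mem: bytes, addr: int) -> int:
    """Model LDL: load an aligned longword and sign-extend it to 64 bits."""
    if addr % 4:
        raise ValueError("LDL address must be longword aligned")
    value = int.from_bytes(mem[addr:addr + 4], "little", signed=True)
    return value & MASK64  # keep the register as an unsigned 64-bit pattern

def extbl(ra: int, byte_pos: int) -> int:
    """Model EXTBL: shift right by byte_pos bytes and keep the low byte."""
    return (ra >> (8 * (byte_pos & 7))) & 0xFF

def load_byte_zero_extended(mem: bytes, d: int) -> int:
    d_lw, d_mod = d & ~3, d & 3   # assumed meaning of D.lw and D.mod
    r1 = ldl(mem, d_lw)           # LDL   R1,D.lw(Rx)
    return extbl(r1, d_mod)       # EXTBL R1,#D.mod,R1

mem = bytes([0x11, 0x22, 0x33, 0x44, 0x80, 0xFE, 0x7F, 0x01])
print(hex(load_byte_zero_extended(mem, 5)))  # prints 0xfe
```

Because EXTBL keeps only the selected byte, the sign extension performed by LDL does not affect the result.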
In the common case, the intended sequence for loading and sign-extending a byte is:
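LDL R1,D.lw(Rx) ! Load the aligned longword that contains the byte
SLL R1,#56-8*D.mod,R1 ! Shift the target byte up into bits <63:56>
SRA R1,#56,R1 ! Shift back down arithmetically to sign-extend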
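The shift pair can be checked with a small Python model. This is a sketch under stated assumptions: registers are represented as unsigned 64-bit integers, D.mod is taken to be the byte offset (0 to 3) within the loaded longword, and the helper names sll, sra, and sign_extend_byte are hypothetical.

```python
MASK64 = (1 << 64) - 1

def sll(ra: int, count: int) -> int:
    """Model SLL: logical left shift, discarding bits shifted past bit 63."""
    return (ra << (count & 63)) & MASK64

def sra(ra: int, count: int) -> int:
    """Model SRA: arithmetic right shift that replicates bit 63."""
    signed = ra - (1 << 64) if ra >> 63 else ra
    return (signed >> (count & 63)) & MASK64

def sign_extend_byte(r1: int, d_mod: int) -> int:
    r1 = sll(r1, 56 - 8 * d_mod)  # SLL R1,#56-8*D.mod,R1
    return sra(r1, 56)            # SRA R1,#56,R1

r1 = 0x017FFE80                       # longword as loaded by LDL
print(hex(sign_extend_byte(r1, 1)))   # byte 0xFE: prints 0xfffffffffffffffe
print(hex(sign_extend_byte(r1, 2)))   # byte 0x7F: prints 0x7f
```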
In the common case, the intended sequence for storing an aligned word R5 is:
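LDL R1,D.lw(Rx) ! Load the aligned longword that contains the word
INSWL R5,#D.mod,R3 ! Position the low word of R5 at byte D.mod
MSKWL R1,#D.mod,R1 ! Clear the target word in the loaded longword
BIS R3,R1,R1 ! Merge the new word into the longword
STL R1,D.lw(Rx) ! Store the updated longword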
In the common case, the intended sequence for storing a byte R5 is:
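LDL R1,D.lw(Rx) ! Load the aligned longword that contains the byte
INSBL R5,#D.mod,R3 ! Position the low byte of R5 at byte D.mod
MSKBL R1,#D.mod,R1 ! Clear the target byte in the loaded longword
BIS R3,R1,R1 ! Merge the new byte into the longword
STL R1,D.lw(Rx) ! Store the updated longword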
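Both store sequences follow the same load, insert, mask, merge, store pattern, which the following Python sketch models. It is illustrative only: it assumes D.lw is D rounded down to a multiple of 4 and D.mod is D mod 4, treats Rx as zero, and uses a hypothetical helper store_field whose width argument selects the byte (1) or word (2) form.

```python
MASK64 = (1 << 64) - 1

def store_field(mem: bytearray, d: int, r5: int, width: int) -> None:
    """Read-modify-write a byte (width=1) or aligned word (width=2)."""
    d_lw, d_mod = d & ~3, d & 3           # assumed meaning of D.lw and D.mod
    field = (1 << (8 * width)) - 1        # 0xFF or 0xFFFF
    # LDL: load the enclosing longword, sign-extended to 64 bits
    r1 = int.from_bytes(mem[d_lw:d_lw + 4], "little", signed=True) & MASK64
    # INSBL / INSWL: move the new data to its byte position, zero elsewhere
    r3 = ((r5 & field) << (8 * d_mod)) & MASK64
    # MSKBL / MSKWL: clear the bytes that are about to be replaced
    r1 = r1 & ~(field << (8 * d_mod)) & MASK64
    # BIS: merge the new data with the preserved bytes
    r1 = r3 | r1
    # STL: write the low 32 bits back to memory
    mem[d_lw:d_lw + 4] = (r1 & 0xFFFFFFFF).to_bytes(4, "little")

mem = bytearray([0x11, 0x22, 0x33, 0x44])
store_field(mem, 1, 0xAB, 1)      # store byte 0xAB at offset 1
store_field(mem, 2, 0xBEEF, 2)    # store aligned word 0xBEEF at offset 2
print(mem.hex())                  # prints 11abefbe
```

Note that the sequence is not atomic: another writer that updates the same longword between the LDL and the STL would have its change overwritten.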
A.4.2 Division
In all implementations, floating-point division is likely to have a substantially longer result
latency than floating-point multiply. In addition, in many implementations multiplies will be
pipelined and divides will not.
Thus, any division by a constant power of two should be compiled as a multiply by the exact
reciprocal, if it is representable without overflow or underflow. If language rules or
surrounding context allow, multiplication by the reciprocal can closely approximate other
divisions by constants.
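The distinction between exact and approximate reciprocals can be illustrated in Python, whose float type is an IEEE double on common platforms. This is a sketch, not a compiler algorithm: the helper exact_reciprocal is hypothetical, and it conservatively treats a reciprocal outside the normalized range as not representable.

```python
import math
import sys

def exact_reciprocal(c: float):
    """Return 1/c if c is a power of two with a normalized reciprocal, else None."""
    if c == 0.0 or math.isinf(c) or math.isnan(c):
        return None
    mantissa, exponent = math.frexp(c)   # c == mantissa * 2**exponent
    if abs(mantissa) != 0.5:             # not a power of two
        return None
    k = 1 - exponent                     # 1/|c| == 2**k
    if not (sys.float_info.min_exp - 1 <= k <= sys.float_info.max_exp - 1):
        return None                      # reciprocal would overflow or underflow
    return math.copysign(math.ldexp(1.0, k), c)

print(exact_reciprocal(8.0))    # prints 0.125
print(exact_reciprocal(10.0))   # prints None
print(3 / 10 == 3 * 0.1)        # prints False: 0.1 is only an approximation of 1/10
```

For a divisor such as 8.0, x / 8.0 and x * 0.125 give the same result as long as neither overflows nor underflows. For a divisor such as 10.0, the two can differ in the last bit, which is why the substitution depends on language rules.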
Integer division does not exist as a hardware opcode. Division by a constant can always be
done via UMULH of another appropriate constant, followed by a right shift. General quadword
division by a true variable can be done by a subroutine. The subroutine could test for small
divisors (less than about 1000 in absolute value) and, for those, do a table lookup on the
exact constant and shift count for a UMULH/shift sequence. For the remaining cases, a table
lookup on about a 1000-entry table and a multiply can give a linear approximation to
1/divisor that is accurate to 16 bits.
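The text does not say how the constant and shift count are chosen. The Python sketch below shows one common way to derive them for unsigned dividends. The names umulh, find_multiplier, and divide_by_constant are hypothetical, and the search is an assumption of this example rather than the handbook's method. It picks the smallest shift s for which m = ceil(2^(64+s)/d) fits in 64 bits and the rounding error cannot change the quotient for any dividend up to n_max.

```python
MASK64 = (1 << 64) - 1

def umulh(a: int, b: int) -> int:
    """Model UMULH: high 64 bits of the 128-bit unsigned product."""
    return ((a & MASK64) * (b & MASK64)) >> 64

def find_multiplier(d: int, n_max: int = MASK64):
    """Return (m, s) such that umulh(n, m) >> s == n // d for 0 <= n <= n_max."""
    if d < 2:
        raise ValueError("divisor must be at least 2")
    for s in range(64):
        k = 64 + s
        m = -(-(1 << k) // d)        # ceil(2**k / d)
        if m > MASK64:               # constant no longer fits in a register
            break
        e = m * d - (1 << k)         # rounding error, 0 <= e < d
        if e * n_max < (1 << k):     # error too small to alter any quotient
            return m, s
    raise ValueError("no 64-bit multiplier covers this dividend range")

def divide_by_constant(n: int, m: int, s: int) -> int:
    return umulh(n, m) >> s          # UMULH followed by a right shift

m, s = find_multiplier(10)
print(hex(m), s)                              # prints 0xcccccccccccccccd 3
print(divide_by_constant(1234567, m, s))      # prints 123456
```

With this particular search, some divisors (7, for example) have no single 64-bit multiplier that covers every unsigned 64-bit dividend. Restricting n_max to 2^63 - 1, as for non-negative signed quadwords, makes the search succeed for 7. A production sequence would also need to handle signed operands.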
Using this approximation, a multiply and a back-multiply and a subtract can generate one