Compaq ECQD2KCTE Laptop User Manual


 
A–4 Alpha Architecture Handbook
quickly as possible, second priority to predicting conditional branches based on the sign of the
displacement field (backward taken, forward not-taken), and third priority to predicting
subroutine return addresses by running a small prediction stack. (VAX traces show a stack of two
to four entries correctly predicts most branches.)
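The two prediction schemes named above can be modeled in a few lines. The following is only an illustrative sketch: the function and class names, the default stack depth of four, and the overflow policy (discard the oldest entry) are assumptions made for this example, not details of any Alpha implementation.

```python
def predict_taken(displacement: int) -> bool:
    """Static prediction from the sign of the branch displacement.

    Backward branches (negative displacement) are predicted taken;
    forward branches (zero or positive displacement) are predicted not taken.
    """
    return displacement < 0


class ReturnStack:
    """Small fixed-depth prediction stack for subroutine return addresses."""

    def __init__(self, depth: int = 4):
        self.depth = depth      # assumed depth; the text cites two to four entries
        self.entries = []

    def call(self, return_address: int) -> None:
        # On a subroutine call, remember where the matching return should go.
        if len(self.entries) == self.depth:
            self.entries.pop(0)  # assumed overflow policy: drop the oldest entry
        self.entries.append(return_address)

    def predict_return(self):
        # On a return, predict the most recently pushed address, if any.
        return self.entries.pop() if self.entries else None
```

A loop-closing branch with a negative displacement is predicted taken, and a nested call sequence pops its return addresses in last-in, first-out order.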
A.2.3 Improving I-Stream Density — Factor of 3
Compilers should try to use profiles to make sure almost 100% of the bytes brought into an
I-cache are actually executed. This requires aligning branch targets and putting rarely executed
code out of line.
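One rough way to quantify I-stream density is to compare the bytes actually executed with the bytes fetched into the cache. The sketch below assumes a 32-byte cache block and takes a set of executed instruction addresses as input; both the block size and the function name are assumptions for illustration. Alpha instructions are a fixed 4 bytes long.

```python
INSTRUCTION_BYTES = 4  # Alpha instructions are 32 bits wide


def istream_density(executed_addresses, block_bytes=32):
    """Fraction of fetched I-cache bytes that are actually executed.

    executed_addresses: addresses of the instructions that ran.
    block_bytes: assumed cache block size (implementation specific).
    """
    executed = set(executed_addresses)
    if not executed:
        return 0.0
    # Every block containing at least one executed instruction is fetched whole.
    blocks = {addr // block_bytes for addr in executed}
    fetched_bytes = len(blocks) * block_bytes
    return len(executed) * INSTRUCTION_BYTES / fetched_bytes


# An 8-instruction loop that starts on a block boundary fills exactly one block.
aligned_loop = range(0x1000, 0x1020, 4)
# The same loop starting mid-block straddles two blocks, half of which is unused.
misaligned_loop = range(0x1010, 0x1030, 4)
```

Under these assumptions the aligned loop has a density of 1.0 and the misaligned loop 0.5, which is the effect that aligning branch targets is meant to avoid.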
A.2.4 Instruction Scheduling — Factor of 3
The performance of Alpha programs is sensitive to how carefully the code is scheduled to
minimize instruction-issue delays.
"Result latency" is defined as the number of CPU cycles that must elapse between an
instruction that writes a result register and one that uses that register, if execution-time stalls
are to be avoided. Thus, with a latency of zero, the instruction that writes a result register and
the instruction that uses that register can be multiple-issued in the same cycle. With a latency
of 2, if the writing instruction is issued at cycle N, the reading instruction can issue no earlier
than cycle N+2.
Latency is implementation specific.
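The definition above reduces to simple arithmetic. The helper names below are hypothetical and the cycle numbers are chosen only for illustration.

```python
def earliest_issue_cycle(writer_issue_cycle: int, result_latency: int) -> int:
    """Earliest cycle at which an instruction reading the result can issue."""
    return writer_issue_cycle + result_latency


def stall_cycles(writer_issue_cycle: int, result_latency: int,
                 reader_ready_cycle: int) -> int:
    """Cycles the reader must wait if it is otherwise ready at reader_ready_cycle."""
    earliest = earliest_issue_cycle(writer_issue_cycle, result_latency)
    return max(0, earliest - reader_ready_cycle)
```

With a latency of zero the reader can issue in the writer's own cycle. With a latency of 2 and a writer issued at cycle 10, the reader can issue no earlier than cycle 12, so a reader that is otherwise ready at cycle 11 stalls for one cycle.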
Most Alpha instructions have a non-zero result latency. Compilers should schedule code so
that a result is not used too soon, at least in frequently executed code (inner loops, as identified
by execution profiles). In general, this will require unrolling loops and inlining short
procedures.
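To see why unrolling helps, consider a toy model of a single-issue, in-order machine in which every instruction has the same result latency. The model, the latency of 3 cycles, and the register names are all assumptions for illustration. Real latencies are implementation specific.

```python
def issue_cycles(instructions, latency):
    """Issue cycle of each instruction on a single-issue, in-order model.

    instructions: list of (dest, sources) tuples in program order.
    latency: assumed uniform result latency in cycles.
    """
    ready = {}       # register -> first cycle at which its value can be used
    cycles = []
    next_cycle = 0   # at most one instruction issues per cycle
    for dest, sources in instructions:
        cycle = max([next_cycle] + [ready.get(src, 0) for src in sources])
        cycles.append(cycle)
        ready[dest] = cycle + latency
        next_cycle = cycle + 1
    return cycles


# Rolled loop body repeated four times: each add depends on the previous sum.
rolled = [("s", ("s", "x0")), ("s", ("s", "x1")),
          ("s", ("s", "x2")), ("s", ("s", "x3"))]

# Unrolled by two with separate accumulators: adjacent adds are independent.
# (A final add of s0 and s1 is still needed after the loop.)
unrolled = [("s0", ("s0", "x0")), ("s1", ("s1", "x1")),
            ("s0", ("s0", "x2")), ("s1", ("s1", "x3"))]
```

With an assumed latency of 3, the rolled sequence issues at cycles 0, 3, 6, 9, while the unrolled sequence issues at cycles 0, 1, 3, 4, because the second accumulator fills cycles that would otherwise be stalls.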
Compilers should try to schedule code to match the above latency rules and also to match the
multiple-issue rules. If doing both is impractical for a particular sequence of code, the latency
rules are more important (since they apply even in single-issue implementations).
Implementors should give first priority to minimizing the latency of back-to-back integer
operations, of address calculations immediately followed by load/store, of load immediately
followed by branch, and of compare immediately followed by branch. Give second priority to
minimizing latencies in general.
A.3 Data-Stream Considerations
The following sections describe considerations for the data stream.
A.3.1 Data Alignment — Factor of 10
Data PSECTs should be at least octaword aligned, so that aggregates (arrays, some records,
subroutine stack frames) can be allocated on aligned octaword boundaries to take advantage of
any implementations with aligned octaword data paths, and to decrease the number of cache
fills in almost all implementations.
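The effect of octaword (16-byte) alignment on cache fills can be sketched with two small helpers. The 32-byte cache block size and the function names are assumptions for this example. Actual block sizes are implementation specific.

```python
OCTAWORD = 16  # bytes


def align_up(address: int, alignment: int = OCTAWORD) -> int:
    """Round address up to the next multiple of alignment (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)


def blocks_touched(start: int, size: int, block_bytes: int = 32) -> int:
    """Number of cache blocks spanned by size bytes beginning at start."""
    first = start // block_bytes
    last = (start + size - 1) // block_bytes
    return last - first + 1
```

For example, a 16-byte aggregate placed at address 0x1018 straddles two 32-byte blocks, whereas the same aggregate moved up to the next octaword boundary, 0x1020, fits in one.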
Aggregates (arrays, records, common blocks, and so forth) should be allocated on at least