5–8 Alpha Architecture Handbook
Both conditional branches are forward branches, so they are properly predicted not to
be taken (to match the common case of no contention for the lock).
The OR writes its result to a second register; this allows the OR and the BLBS to be
interchanged if that would give a faster instruction schedule.
Other operate instructions (from the critical section) may be scheduled into the
LDQ_L..STQ_C sequence, so long as they do not fault or trap and they give correct
results if repeated; other memory or operate instructions may be scheduled between the
STQ_C and BEQ.
The memory barrier instructions are discussed in Section 5.5.4. It is correct to substitute
WMB for the second MB only if all of the following conditions hold:
- All data locations that are read or written in the critical section are accessed only
  after acquiring a software lock by using lock_variable (and before releasing the
  software lock).
- For each read u of shared data in the critical section, there is a write v such that:
  1. v is BEFORE the WMB
  2. v follows u in processor issue sequence (see Section 5.6.1.1)
  3. v either depends on u (see Section 5.6.1.7) or overlaps u (see Section 5.6.1), or
     both.
- Both lock_variable and all the shared data are in memory-like regions (or
  lock_variable and all the shared data are in non-memory-like regions). If the
  lock_variable is in a non-memory-like region, the atomic lock protocol must use
  some implementation-specific hardware support.
Generally, the substitution of a WMB for the second MB increases performance.
An ordinary STQ instruction is used to clear the lock_variable.
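The notes above refer to a lock-acquire sequence that is not reproduced in this section. As a rough illustration only, the following C11 sketch mirrors the shape those notes describe: a load, a test of the low bit, a conditional store that may fail, a barrier, and finally a barrier followed by an ordinary store to release. The type sw_lock and the function names are hypothetical, and the sequentially consistent fences are an assumed stand-in for the two MB instructions. C11 has no store-store-only fence corresponding to WMB, so the WMB substitution is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock word: bit 0 set means "locked" (the bit BLBS tests). */
typedef struct {
    _Atomic uint64_t lock_variable;
} sw_lock;

/* One acquire attempt. Returns true if this caller now holds the lock. */
static bool sw_lock_try_acquire(sw_lock *l)
{
    /* Plays the role of LDQ_L: read the current lock word. */
    uint64_t seen = atomic_load_explicit(&l->lock_variable,
                                         memory_order_relaxed);

    /* Plays the role of BLBS: low bit set means the lock is already held. */
    if (seen & 1u)
        return false;

    /* Plays the role of OR + STQ_C + BEQ. The weak compare-exchange may
       fail spuriously, much as a store-conditional may. */
    if (!atomic_compare_exchange_weak_explicit(&l->lock_variable, &seen,
                                               seen | 1u,
                                               memory_order_relaxed,
                                               memory_order_relaxed))
        return false;

    /* Stand-in for the first MB: critical-section accesses stay after
       the acquisition. */
    atomic_thread_fence(memory_order_seq_cst);
    return true;
}

/* Release the lock. The caller must currently hold it. */
static void sw_lock_release(sw_lock *l)
{
    assert(atomic_load_explicit(&l->lock_variable,
                                memory_order_relaxed) & 1u);

    /* Stand-in for the second MB: critical-section accesses complete
       before the lock is seen as free. */
    atomic_thread_fence(memory_order_seq_cst);

    /* Ordinary store (the STQ in the text) clears the lock_variable. */
    atomic_store_explicit(&l->lock_variable, 0, memory_order_relaxed);
}
```

A caller would retry sw_lock_try_acquire until it returns true, run the critical section, and then call sw_lock_release. How to wait between retries is a separate question, taken up in the discussion of spin-waits.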
It would be a performance mistake to spin-wait by repeating the full LDQ_L..STQ_C sequence
(to move the BLBS after the BEQ) because that sequence may repeatedly change the software
lock_variable from "locked" to "locked," with each write causing extra access delays in all
other caches that contain the lock_variable. In the extreme, spin-waits that contain writes may
deadlock as follows:
If, when one processor spins with writes, another processor is modifying (not changing)
the lock_variable, then the writes on the first processor may cause the STx_C of the
modify on the second processor always to fail.
This deadlock situation is avoided by:
- Having only one processor execute a store (no STx_C), or
- Having no write in the spin loop, or
- Doing a write only if the shared variable actually changes state (1 → 1 does not change
  state).
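One way to follow the last two options is to wait using reads only and to attempt a store only after the lock has been observed free, so that a write always changes the lock state from 0 to 1. The C11 sketch below is an illustration under stated assumptions, not code from the handbook: spin_acquire_bounded, the max_spins bound, and the caller-owned write_attempts counter are hypothetical, and the counter exists only to make visible that waiting on a held lock issues no stores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Spin for at most max_spins iterations on a lock word whose bit 0 means
   "locked". Returns true if the lock was acquired. *write_attempts is a
   caller-private count of attempted stores (hypothetical, for illustration). */
static bool spin_acquire_bounded(_Atomic uint64_t *lock_variable,
                                 uint64_t max_spins,
                                 uint64_t *write_attempts)
{
    assert(max_spins > 0);

    for (uint64_t i = 0; i < max_spins; i++) {
        /* Read-only wait: a held lock is only ever loaded, never stored
           to, so other caches holding the line are not disturbed. */
        uint64_t seen = atomic_load_explicit(lock_variable,
                                             memory_order_relaxed);
        if (seen & 1u)
            continue;

        /* The lock looked free, so a store would change its state
           (0 to 1). Only now is a write attempted. */
        (*write_attempts)++;
        if (atomic_compare_exchange_weak_explicit(lock_variable, &seen,
                                                  seen | 1u,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;
        /* The attempt failed (contention or a spurious failure);
           go back to read-only waiting. */
    }
    return false;
}
```

With the lock already held, a call returns false and leaves the counter at zero, because the loop never reaches the store. The bound on iterations is a convenience for experimentation; an unbounded loop has the same read-only waiting behavior.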