Using Performance Monitoring Events B
Replay
In order to maximize performance for the common case, the Intel
NetBurst microarchitecture sometimes aggressively schedules μops for
execution before all the conditions for correct execution are guaranteed
to be satisfied. If any of these conditions turns out not to be satisfied,
the μops must be reissued. This mechanism is called replay.
Causes of replays include cache misses, dependence violations (for
example, store forwarding problems), and unforeseen resource
constraints. In normal operation, some number of replays is common and
unavoidable. An excessive number of replays indicates a performance
problem.
Assist
When the hardware needs the assistance of microcode to deal with some
event, the machine takes an assist. One example of such a situation is an
underflow condition in the input operands of a floating-point operation.
The hardware must internally modify the format of the operands in
order to perform the computation. Assists clear the entire machine of
μops before they begin to accumulate, and are costly. The assist
mechanism on the Pentium 4 processor is similar in principle to that on
the Pentium II processors, which also have an assist event.
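
As an illustration of the kind of operand described above, the following
minimal Python sketch identifies floating-point values that lie in the
underflow (subnormal, or denormal) range of IEEE 754 double precision.
The helper names is_subnormal and count_subnormal_operands are
hypothetical and are not part of any Intel tool or API. The sketch only
classifies operands; whether a given operation actually takes an assist
depends on the processor and on its floating-point control settings, so
treat it as a way to find candidate inputs, not as a measurement.

```python
import math
import sys

# Smallest positive *normal* double. Any nonzero finite value with a
# smaller magnitude is subnormal (denormal), i.e. in the underflow range.
SMALLEST_NORMAL = sys.float_info.min


def is_subnormal(x: float) -> bool:
    """Return True if x is a nonzero, finite double below the normal range."""
    return x != 0.0 and math.isfinite(x) and abs(x) < SMALLEST_NORMAL


def count_subnormal_operands(values) -> int:
    """Count how many input operands are subnormal (hypothetical helper)."""
    return sum(1 for v in values if is_subnormal(v))


if __name__ == "__main__":
    # Illustrative operands only: a normal value, the normal-range
    # boundary, two subnormal values, and zero.
    operands = [1.0, SMALLEST_NORMAL, SMALLEST_NORMAL / 2, 5e-324, 0.0]
    for v in operands:
        print(f"{v!r}: subnormal={is_subnormal(v)}")
    print("subnormal operands:", count_subnormal_operands(operands))
```

If a scan of this kind finds many subnormal inputs in a hot
floating-point loop, that is a reasonable cue to check the assist-related
metrics for that code.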
Tagging
Tagging is a means of marking μops to be counted at retirement. See
Appendix A of the IA-32 Intel® Architecture Software Developer’s
Manual, Volume 3B, for the description of the tagging mechanisms. The
same event can happen more than once per μop. The tagging mechanisms
allow a μop to be tagged once during its lifetime. The retired suffix is
used for metrics that increment a count once per μop, rather than once
per event. For example, a μop may encounter a cache