Using Performance Monitoring Events B
Three cycle-counting events do not progress on a halted core, even
while the halted core is being snooped: Unhalted core cycles, Unhalted
reference cycles, and Unhalted bus cycles. All three events are detected
for the unit selected by event 3CH.
Some events detect microarchitectural conditions but are limited in their
ability to identify the originating core or physical processor. For
example, bus_drdy_clocks may be programmed with a unit mask of
20H to include all agents on a bus. In this case, the performance counter
in each core will report nearly identical values. Performance tools
interpreting these counts must therefore equate bus activity with the
event count from a single core (and not with the sum of the counts from
all cores).
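As an illustration of this interpretation rule, the following minimal sketch assumes a tool has already read the same bus-wide event on every core. The function name bus_wide_count and the list of per-core readings are hypothetical and are not part of any performance tool.

```python
def bus_wide_count(per_core_counts):
    """Return one representative count for an event that observes all bus agents.

    per_core_counts: hypothetical list of readings of the same bus-wide
    event, one reading per core.
    """
    if not per_core_counts:
        raise ValueError("need the count from at least one core")
    # Every core's counter observed the same bus activity, so the readings
    # are near-duplicates. Use one of them instead of adding them up.
    return per_core_counts[0]


def naive_sum(per_core_counts):
    """Incorrect interpretation, shown for contrast: overstates bus activity."""
    return sum(per_core_counts)
```

With two cores reporting nearly identical readings, naive_sum returns roughly twice the value of bus_wide_count, which is the overcount that a tool must avoid.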
The same consideration applies when the core-specificity subfield (bits
15:14 of the IA32_PERFEVTSELx MSR) within an event mask is
programmed with 11B. The results reported by the performance counter
on each core will be nearly identical.
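The bit positions described here can be sketched as follows. This is only an illustration of placing a value in bits 15:14 of an event-select value; the helper names and the base value 0x00AB are placeholders, and the sketch does not access any MSR.

```python
CORE_SPEC_SHIFT = 14                      # subfield occupies bits 15:14
CORE_SPEC_MASK = 0b11 << CORE_SPEC_SHIFT  # 0xC000
CORE_SPEC_11B = 0b11                      # the 11B encoding discussed in the text


def set_core_specificity(evtsel, spec):
    """Return evtsel with bits 15:14 replaced by the 2-bit value spec."""
    if not 0 <= spec <= 0b11:
        raise ValueError("core-specificity is a 2-bit field")
    return (evtsel & ~CORE_SPEC_MASK) | (spec << CORE_SPEC_SHIFT)


def get_core_specificity(evtsel):
    """Extract bits 15:14 from an event-select value."""
    return (evtsel & CORE_SPEC_MASK) >> CORE_SPEC_SHIFT


# 0x00AB is an arbitrary placeholder, not a real event encoding.
example = set_core_specificity(0x00AB, CORE_SPEC_11B)
```

A tool that decodes 11B from a programmed event-select value knows that the counts on each core are near-duplicates and should not be summed.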
Ratio Interpretation
Ratios of two events are useful for analyzing various characteristics of a
workload. It may be possible to acquire such ratios at multiple
granularities: (1) per-application thread, (2) per logical processor, (3)
per core, and (4) per physical processor.
The first is most useful from a software development perspective, but
requires multi-threaded applications to manage processor affinity
explicitly for each application thread. The other options provide insights
on hardware utilization.
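The hardware-oriented granularities can be sketched as an aggregation over per-logical-processor samples. The record layout (the keys package, core, cpu, num, and den) and the function ratios_by_group are assumptions made for this example. The sketch also assumes that both events count activity specific to each logical processor, so that summing them is meaningful.

```python
from collections import defaultdict


def ratios_by_group(samples, fields):
    """Compute num/den per group, summing both events before dividing.

    samples: list of dicts, one per logical processor (hypothetical layout).
    fields:  tuple of keys that identify a group, for example
             ("package", "core", "cpu") per logical processor,
             ("package", "core") per core, or ("package",) per
             physical processor.
    """
    totals = defaultdict(lambda: [0, 0])
    for s in samples:
        group = tuple(s[f] for f in fields)
        totals[group][0] += s["num"]
        totals[group][1] += s["den"]
    # A group whose denominator event never fired has no defined ratio.
    return {g: (n / d if d else None) for g, (n, d) in totals.items()}
```

Summing the numerator and denominator events within a group before dividing weights each logical processor by its activity. Averaging the per-processor ratios instead would give idle and busy processors equal weight.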
In general, collect measurements (for all events in a ratio) in the same
run. This should be done because:
• If measuring ratios for a multi-threaded workload, getting results for
all events in the same run enables you to understand which event
counter values belong to each thread.