Intel IA-32 Computer Accessories User Manual


 
Application Performance Tools A
stride inefficiency is most prominent on memory traffic. A useful
indicator of large-stride inefficiency in a workload is the ratio of bus
read transactions to DTLB page walks due to read traffic, with the
hardware prefetch disabled while the bus traffic of the workload is
measured. The former can be measured using the event “Bus Reads from
the processor.” The latter can be approximated by measuring the event
“Page Walk DTLB All Misses.” This is an approximation because the
event counts DTLB misses due to either read or write traffic and does
not distinguish between cache traffic and memory traffic.
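As a minimal sketch, the indicator described above can be computed from two raw event counts taken over the same run. The function and parameter names are hypothetical and are not part of any VTune interface. No threshold is stated here, so the sketch only computes the ratio for comparison across runs or code versions.

```python
def stride_indicator(bus_reads, dtlb_page_walks):
    """Return bus read transactions per DTLB page walk.

    Both arguments are raw event counts collected over the same run
    with hardware prefetch disabled. The page-walk count is only an
    approximation of read-related page walks, because the underlying
    event also counts write traffic.
    """
    if dtlb_page_walks == 0:
        # No page walks recorded: the ratio is undefined, so report
        # infinity when there were bus reads and 0.0 otherwise.
        return float("inf") if bus_reads else 0.0
    return bus_reads / dtlb_page_walks
```

Comparing the value before and after a change to the access pattern is more informative than reading a single number in isolation.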
Call Graph
Call graph helps you understand the relationships between the functions
in your application by providing timing and caller / callee (functions
called) information. Call graph works by instrumenting the functions in
your application. Instrumentation is the process of modifying a function
so that performance data can be captured when the function is executed.
Instrumentation does not change the functionality of the program.
However, it can reduce performance. The VTune analyzer can detect
modules as they are loaded by the operating system, and instrument
them at run time. Call graph can be used to profile Win32*, Java*, and
Microsoft .NET* applications. Call graph works only for application
(ring 3) software.
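The idea of instrumentation can be illustrated with a small sketch. This is not how the VTune analyzer instruments binaries; it is a hypothetical Python decorator that wraps a function so that call counts, total time, self-time, and caller/callee edges are recorded when the function runs. All names (instrument, stats, edges) are assumptions made for the example.

```python
import time
from collections import defaultdict
from functools import wraps

# Per-function totals and caller -> callee edge counts (illustrative names).
stats = defaultdict(lambda: {"calls": 0, "total": 0.0, "self": 0.0})
edges = defaultdict(int)   # (caller, callee) -> number of calls
_stack = []                # active frames: [name, time spent in callees]

def instrument(func):
    """Wrap func so timing and caller/callee data are captured per call."""
    name = func.__name__

    @wraps(func)
    def wrapper(*args, **kwargs):
        caller = _stack[-1][0] if _stack else "<root>"
        edges[(caller, name)] += 1
        frame = [name, 0.0]
        _stack.append(frame)
        start = time.perf_counter()
        try:
            # The wrapped function runs unchanged; only bookkeeping is added.
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            _stack.pop()
            entry = stats[name]
            entry["calls"] += 1
            entry["total"] += elapsed
            entry["self"] += elapsed - frame[1]   # exclude time in callees
            if _stack:
                _stack[-1][1] += elapsed          # charge caller's callee time
    return wrapper

@instrument
def leaf(n):
    return sum(range(n))

@instrument
def parent():
    return leaf(1000) + leaf(2000)
```

Calling parent() returns the same value as the uninstrumented code, which mirrors the point that instrumentation leaves functionality intact, while the extra bookkeeping on every call is the source of the performance cost. The sketch is single-threaded and counts nested time more than once in the total for recursive functions.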
Call graph profiling provides the following information on the functions
called by your application: total time, self-time, total wait time, wait
time, callers, callees, and the number of calls. This data is displayed
using three different views: function summary, call graph, and call list.
These views are all synchronized.
The Function Summary View can be used to focus the data displayed in
the call graph and call list views. This view displays all the information
about the functions called by your application in a sortable table format.
However, it does not provide caller and callee information. It provides
only timing information and the number of times each function is called.
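A sortable function summary of this kind can be sketched as a list of per-function rows sorted by a chosen column. The rows below are placeholder values for illustration only, not measured data, and the column names are assumptions rather than the analyzer's actual labels.

```python
from operator import itemgetter

# Placeholder rows for illustration only; not measured data.
rows = [
    {"function": "parse", "calls": 12, "self_time": 0.40, "total_time": 1.10},
    {"function": "render", "calls": 3, "self_time": 0.90, "total_time": 0.95},
    {"function": "main", "calls": 1, "self_time": 0.05, "total_time": 2.10},
]

def sort_summary(rows, column, descending=True):
    """Return the rows ordered by one column, largest first by default."""
    return sorted(rows, key=itemgetter(column), reverse=descending)

def format_summary(rows):
    """Render the rows as a fixed-width text table."""
    lines = [f"{'function':<12}{'calls':>8}{'self_time':>12}{'total_time':>12}"]
    for r in rows:
        lines.append(
            f"{r['function']:<12}{r['calls']:>8}"
            f"{r['self_time']:>12.2f}{r['total_time']:>12.2f}"
        )
    return "\n".join(lines)
```

For example, print(format_summary(sort_summary(rows, "self_time"))) lists the functions with the largest self-time first, which is a common starting point when choosing what to examine in the other views.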