IA-32 Intel® Architecture Optimization
General Compiler Recommendations
A compiler that has been extensively tuned for the target
microarchitecture can be expected to match or outperform hand-coding in
the general case. However, if particular performance problems are noted
with the compiled code, some compilers (such as the Intel C++ and
Fortran Compilers) allow the coder to insert intrinsics or inline
assembly to exert greater control over the code that is generated. If
inline assembly is used, the user should verify that the code the
compiler generates to integrate the inline assembly is of good quality
and yields good overall performance.
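As an illustration of the intrinsics route, the following is a minimal
sketch in C, not taken from this manual. It assumes a compiler that
provides the SSE header <xmmintrin.h>; the function name add_arrays and
the HAVE_SSE macro are hypothetical names chosen for this example. When
SSE is not available at compile time, the sketch falls back to plain
scalar code so that the result is the same either way.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: these predefined macros indicate SSE support on the
   compilers we care about (GCC/Clang/Intel define __SSE__, MSVC
   defines _M_X64 or _M_IX86_FP). */
#if defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_add_ps, ... */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;

    assert(a != NULL && b != NULL && out != NULL);

#if HAVE_SSE
    /* Four floats per iteration; unaligned loads and stores are used
       so the caller does not have to guarantee 16-byte alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif

    /* Scalar loop: handles the remaining tail elements, or the whole
       array when SSE is not available. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Intrinsics leave register allocation and instruction scheduling to the
compiler, so it is still worth inspecting the generated code, as
recommended above for inline assembly.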
Default compiler switches are targeted for the common case. An
optimization may be enabled by default if it is beneficial for most
programs. If a performance problem is traced to a poor choice on the
part of the compiler, using different switches or compiling the
affected module with a different compiler may be the solution.
VTune™ Performance Analyzer
Where performance is a critical concern, use performance monitoring
hardware and software tools to tune your application and its interaction
with the hardware. IA-32 processors have counters which can be used to
monitor a large number of performance-related events for each
microarchitecture. The counters also provide information that helps
resolve coding pitfalls.
The VTune Performance Analyzer allows engineers to use these counters
to obtain two kinds of tuning feedback:
• indication of a performance improvement gained by using a specific
coding recommendation or microarchitectural feature,
• information on whether a change in the program has improved or
degraded performance with respect to a particular metric.
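The second kind of feedback, comparing a metric before and after a
change, can be sketched in C as follows. This is a rough stand-in only:
it uses the standard clock() function to measure CPU time and does not
use the VTune analyzer or the hardware counters. The names sum_fn,
sum_simple, sum_unrolled and time_variant are hypothetical and exist
only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical workload signature: sum n ints. */
typedef long (*sum_fn)(const int *data, size_t n);

/* Baseline variant: straightforward loop. */
long sum_simple(const int *data, size_t n)
{
    long s = 0;
    size_t i;
    for (i = 0; i < n; ++i) {
        s += data[i];
    }
    return s;
}

/* Candidate variant: unrolled by four with separate accumulators. */
long sum_unrolled(const int *data, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    for (; i < n; ++i) {   /* remaining tail elements */
        s0 += data[i];
    }
    return s0 + s1 + s2 + s3;
}

/* Runs fn `repeats` times and returns the CPU time used, in seconds.
   The last result is stored in *result so the caller can check that
   both variants compute the same value. */
double time_variant(sum_fn fn, const int *data, size_t n,
                    int repeats, long *result)
{
    volatile long sink = 0;   /* keeps the calls from being discarded */
    clock_t start, end;
    int k;

    assert(fn != NULL && data != NULL && result != NULL && repeats > 0);

    start = clock();
    for (k = 0; k < repeats; ++k) {
        sink = fn(data, n);
    }
    end = clock();

    *result = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A caller would time both variants on the same data, confirm that the
results match, and then compare the two times. Because clock() is
coarse, a large repeat count and several runs are needed before drawing
conclusions; event counts from the performance counters give a more
specific picture of why one variant is faster.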