IA-32 Intel® Architecture Optimization
Use Available Performance Tools
• Current-generation compiler, such as the Intel C++ Compiler:
— Set this compiler to produce code for the target processor
implementation
— Use the compiler switches for optimization and/or
profile-guided optimization. These features are summarized in
the “Intel® C++ Compiler” section. For more detail, see the
Intel® C++ Compiler User’s Guide.
• Current-generation performance monitoring tools, such as VTune™
Performance Analyzer:
— Identify performance issues using event-based sampling, the code
coach, and other analysis resources.
— Measure workload characteristics such as instruction
throughput, data traffic locality, memory traffic characteristics,
etc.
— Characterize the performance gain.
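When a sampling tool is not at hand, the performance gain can still be characterized roughly by timing the baseline and optimized builds and reporting their ratio. The following is a minimal sketch only: `time_runs`, `speedup`, and `sample_workload` are hypothetical names introduced for illustration, and `clock()` measures CPU time at coarse resolution, so a real measurement would need many repetitions and a quiet system.

```c
#include <time.h>

/* Hypothetical stand-in for the code under test. */
void sample_workload(void)
{
    volatile unsigned long sum = 0;   /* volatile keeps the loop from being optimized away */
    for (unsigned long i = 0; i < 100000UL; ++i)
        sum += i;
}

/* Run fn() reps times and return the elapsed CPU time in seconds. */
double time_runs(void (*fn)(void), int reps)
{
    clock_t start = clock();
    for (int i = 0; i < reps; ++i)
        fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Speedup of the optimized version relative to the baseline.
   A value above 1.0 indicates a gain; 0.0 is returned when the
   optimized time is not positive and no ratio can be formed. */
double speedup(double baseline_s, double optimized_s)
{
    return optimized_s > 0.0 ? baseline_s / optimized_s : 0.0;
}
```

A typical use would call `time_runs` once per build variant with the same repetition count and pass the two results to `speedup`.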
Optimize Performance Across Processor Generations
• Use a cpuid dispatch strategy to deliver optimum performance for
all processor generations.
• Use the deterministic cache parameters leaf of cpuid to deliver scalable
performance that is transparent across processor families with
different cache sizes.
• Use a compatible code strategy to deliver optimum performance for
the current generation of the IA-32 processor family and for future
IA-32 processors.
• Use a low-overhead threading strategy so that a multi-threaded
application delivers optimal multi-processor scaling performance
when executing on processors that have hardware multi-threading
support, or delivers nearly identical single-processor scaling when
executing on a processor without hardware multi-threading support.
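The first two guidelines above can be sketched in C as follows. This is an illustrative sketch under stated assumptions, not the manual's own implementation: it assumes a GCC- or Clang-compatible toolchain that provides `<cpuid.h>`, and the names `code_path`, `select_path`, `detect_path`, `cache_size_bytes`, `largest_data_cache_bytes`, `choose_block_bytes`, and `DEFAULT_BLOCK_BYTES` are hypothetical. The feature-bit and cache-field layouts are those reported by cpuid leaf 1 (EDX) and leaf 4 (EAX, EBX, ECX). Using half of the cache as a blocking size is an arbitrary choice made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>            /* assumption: GCC/Clang helper header */
#define HAVE_CPUID 1
#else
#define HAVE_CPUID 0          /* non-x86 builds fall back to defaults */
#endif

#define EDX_SSE  (1u << 25)   /* cpuid leaf 1, EDX bit 25 */
#define EDX_SSE2 (1u << 26)   /* cpuid leaf 1, EDX bit 26 */
#define DEFAULT_BLOCK_BYTES (16u * 1024u)   /* assumed fallback value */

typedef enum { PATH_GENERIC, PATH_SSE, PATH_SSE2 } code_path;

/* Pick the most capable code path that the leaf 1 feature flags allow. */
code_path select_path(uint32_t leaf1_edx)
{
    if (leaf1_edx & EDX_SSE2) return PATH_SSE2;
    if (leaf1_edx & EDX_SSE)  return PATH_SSE;
    return PATH_GENERIC;
}

/* Query leaf 1 once at startup; use the generic path if cpuid is unavailable. */
code_path detect_path(void)
{
#if HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return select_path(edx);
#endif
    return PATH_GENERIC;
}

/* Decode a cache size in bytes from the EBX and ECX values of leaf 4.
   Each field is stored as (value - 1):
   size = ways * partitions * line_size * sets. */
uint32_t cache_size_bytes(uint32_t ebx, uint32_t ecx)
{
    uint32_t line_size  = (ebx & 0xFFFu) + 1;           /* EBX[11:0]  */
    uint32_t partitions = ((ebx >> 12) & 0x3FFu) + 1;   /* EBX[21:12] */
    uint32_t ways       = ((ebx >> 22) & 0x3FFu) + 1;   /* EBX[31:22] */
    uint32_t sets       = ecx + 1;
    return ways * partitions * line_size * sets;
}

/* Return the largest data or unified cache reported by leaf 4,
   or 0 if the leaf is not supported on this processor. */
uint32_t largest_data_cache_bytes(void)
{
    uint32_t best = 0;
#if HAVE_CPUID
    if (__get_cpuid_max(0, NULL) >= 4) {
        for (unsigned int sub = 0; sub < 16; ++sub) {
            unsigned int eax, ebx, ecx, edx;
            __cpuid_count(4, sub, eax, ebx, ecx, edx);
            (void)edx;
            unsigned int type = eax & 0x1Fu;   /* 0 means no more caches */
            if (type == 0) break;
            if (type == 1 || type == 3) {      /* 1 = data, 3 = unified */
                uint32_t size = cache_size_bytes(ebx, ecx);
                if (size > best) best = size;
            }
        }
    }
#endif
    return best;
}

/* Derive a blocking size from the detected cache size instead of
   hard-coding one per processor family. */
uint32_t choose_block_bytes(uint32_t cache_bytes)
{
    return cache_bytes ? cache_bytes / 2 : DEFAULT_BLOCK_BYTES;
}
```

An application would typically call `detect_path()` and `choose_block_bytes(largest_data_cache_bytes())` once during initialization and store the results, so that the dispatch cost is not paid inside performance-critical loops.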