Coding for SIMD Architectures 3
specific optimizations. Where appropriate, the coach displays
pseudo-code to suggest the use of highly optimized intrinsics and
functions in the Intel® Performance Library Suite. Because the VTune
analyzer is designed specifically for Intel architecture (IA)-based
processors, including the Pentium 4 processor, it can offer these
detailed approaches to working with IA. See “Code Optimization
Options” in Appendix A for more details and an example of code coach
advice.
Determine If Code Benefits by Conversion to SIMD Execution
Identifying code that benefits from SIMD technologies can be
time-consuming and difficult. Likely candidates for conversion are
applications that are highly computation-intensive, such as the
following:
• speech compression algorithms and filters
• speech recognition algorithms
• video display and capture routines
• rendering routines
• 3D graphics (geometry)
• image and video processing algorithms
• spatial (3D) audio
• physical modeling (graphics, CAD)
• workstation applications
• encryption algorithms
• complex arithmetic
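As a rough illustration of the kind of kernel that often sits at the core of such applications, the C sketch below scales one array of single-precision floats and accumulates it into another. The function name scale_add and its parameters are hypothetical and do not come from this manual. The loop is short, every iteration is independent of the others, and both arrays are walked sequentially in memory, which is what tends to make a loop like this a plausible candidate for SIMD conversion.

```c
#include <stddef.h>

/* Hypothetical example kernel (not from the manual):
   computes y[i] = a * x[i] + y[i] for i in [0, n).
   The body is small, iterations do not depend on each other,
   and x and y are read and written sequentially. */
void scale_add(float *y, const float *x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Whether converting a loop of this shape actually pays off still depends on how much of the application's run time it accounts for, so profiling remains the deciding step.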
Generally, good candidate code contains small, repetitive loops that
operate on sequential arrays of 8-, 16-, or 32-bit integers,
single-precision 32-bit floating-point data, or double-precision
64-bit floating-point data (integer and floating-point data items should
be sequential in memory). The repetitiveness of these loops incurs