Intel IA-32 Computer Accessories User Manual


 
Coding for SIMD Architectures 3
specific optimizations. Where appropriate, the coach displays
pseudo-code to suggest the use of highly optimized intrinsics and
functions in the Intel® Performance Library Suite. Because the VTune
analyzer is designed specifically for all Intel architecture
(IA)-based processors, including the Pentium 4 processor, it can offer
these detailed approaches to working with IA. See “Code Optimization
Options” in Appendix A for more details and an example of code coach
advice.
Determine If Code Benefits by Conversion to SIMD Execution
Identifying code that benefits from SIMD technologies can be
time-consuming and difficult. Likely candidates for conversion are
applications that are highly computation-intensive, such as the
following:
speech compression algorithms and filters
speech recognition algorithms
video display and capture routines
rendering routines
3D graphics (geometry)
image and video processing algorithms
spatial (3D) audio
physical modeling (graphics, CAD)
workstation applications
encryption algorithms
complex arithmetic
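As a rough illustration of the kind of kernel found in the applications listed above, the following C sketch scales a buffer of single-precision samples, first with a plain loop and then four elements at a time using SSE intrinsics. The function names scale_scalar and scale_simd, and the gain kernel itself, are hypothetical examples and are not taken from this manual. The SSE path is compiled only when the compiler defines __SSE__; otherwise the scalar loop handles every element.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_mul_ps, ... */
#endif

/* Hypothetical scalar kernel: dst[i] = src[i] * gain for n sequential floats. */
void scale_scalar(float *dst, const float *src, float gain, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * gain;
}

/* Hypothetical SIMD version of the same kernel (assumes dst and src do not
   overlap). Processes four floats per iteration when SSE is available. */
void scale_simd(float *dst, const float *src, float gain, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    __m128 g = _mm_set1_ps(gain);              /* gain copied into all 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);      /* load 4 sequential floats */
        _mm_storeu_ps(dst + i, _mm_mul_ps(v, g));
    }
#endif
    for (; i < n; i++)                         /* remaining 0..3 elements */
        dst[i] = src[i] * gain;
}
```

The unaligned load and store intrinsics are used so that the sketch makes no assumption about buffer alignment; whether such a conversion actually pays off for a given loop still has to be measured.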
Generally, good candidate code contains small, repetitive loops that
operate on sequential arrays of 8-, 16-, or 32-bit integers,
single-precision 32-bit floating-point data, or double-precision
64-bit floating-point data (integer and floating-point data items should
be sequential in memory). The repetitiveness of these loops incurs