136 Branch Optimizations Chapter 6
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
6.7 Replacing Branches with Computation
Optimization
Use computation to simulate predicated execution or conditional moves.
Application
This optimization applies to:
• 32-bit software
• 64-bit software
Rationale
Branches can degrade performance, particularly when they are hard to predict, because a misprediction
forces the processor to discard the work it fetched and executed along the wrong path. If the body of
the branch is small, you can often achieve higher performance by replacing the branch with
computation that simulates predicated execution or conditional moves. Many SSE and SSE2
instructions are useful for this. The principal ones are ANDPS, ANDPD, ANDNPS,
ANDNPD, CMPPS, CMPSS, CMPPD, CMPSD, MINPS, MINSS, MINPD, MINSD, MAXPS,
MAXSS, MAXPD, MAXSD, ORPS, ORPD, PAND, PANDN, PCMPEQB, PCMPEQD,
PCMPEQW, PCMPGTB, PCMPGTD, PCMPGTW, PMAXSW, PMAXUB, PMINSW, PMINUB,
POR, PXOR, XORPS, and XORPD.
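As an illustration of how a few of these instructions combine, the following is a minimal sketch written with the standard SSE intrinsics from `<xmmintrin.h>`, assuming an x86 compiler with SSE enabled. It computes `r[i] = (a[i] < b[i]) ? x[i] : y[i]` for four floats with no branch: CMPPS (via `_mm_cmplt_ps`) builds a per-lane mask, ANDPS and ANDNPS keep the selected halves, and ORPS merges them. The function name `select_lt4` is hypothetical and chosen only for this example.

```c
#include <xmmintrin.h> /* SSE: __m128, _mm_cmplt_ps, _mm_and_ps, _mm_andnot_ps, _mm_or_ps */

/* Hypothetical helper: branchless r[i] = (a[i] < b[i]) ? x[i] : y[i]
   for four floats at once. Assumes each array holds at least four elements. */
static void select_lt4(const float *a, const float *b,
                       const float *x, const float *y, float *r)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 vx = _mm_loadu_ps(x);
    __m128 vy = _mm_loadu_ps(y);

    /* CMPPS (less-than predicate): each lane becomes all ones if a < b,
       otherwise all zeros. */
    __m128 mask = _mm_cmplt_ps(va, vb);

    /* ANDPS keeps x in the lanes where the mask is set. */
    __m128 pick_x = _mm_and_ps(mask, vx);

    /* ANDNPS computes (~mask) & y, keeping y where the mask is clear. */
    __m128 pick_y = _mm_andnot_ps(mask, vy);

    /* ORPS merges the two disjoint selections into the result. */
    _mm_storeu_ps(r, _mm_or_ps(pick_x, pick_y));
}
```

For example, with a = {1, 5, -2, 3} and b = {2, 4, -1, 3}, lanes 0 and 2 take their values from x and lanes 1 and 3 take theirs from y, since 3 < 3 is false.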
For 32-bit code that uses 3DNow!™ instructions, avoid moving MMX™ data to the integer
registers to perform comparisons and branches. Moving MMX data to integer registers requires either
a trip through memory or the use of MOVD reg, mmreg instructions, which are relatively
inefficient. When using 3DNow! technology and MMX registers, the following instructions may be
useful for eliminating branches: PCMPGTB, PCMPGTD, PCMPGTW, PFCMPGT, PFCMPGE,
PFMIN, PFMAX, PAND, PANDN, POR, and PXOR.
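To show the data staying in MMX registers, here is a minimal sketch using the MMX intrinsics from `<mmintrin.h>`, assuming a compiler that still provides them for the target. It sticks to the MMX integer instructions from the list above (PCMPGTD, PAND, PANDN, POR) and leaves the 3DNow! floating-point compares out. It computes a per-lane signed maximum of two 32-bit integers without moving anything to the integer registers for a compare-and-branch. The function name `max2_i32` is hypothetical.

```c
#include <mmintrin.h> /* MMX: __m64, _mm_cmpgt_pi32, _mm_and_si64, _mm_andnot_si64, _mm_or_si64 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: branchless r[i] = (a[i] > b[i]) ? a[i] : b[i]
   for two signed 32-bit integers held in one MMX register. */
static void max2_i32(const int32_t a[2], const int32_t b[2], int32_t r[2])
{
    __m64 va, vb;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);

    /* PCMPGTD: each 32-bit lane becomes all ones if a > b (signed). */
    __m64 mask = _mm_cmpgt_pi32(va, vb);

    /* PAND keeps a where the mask is set, PANDN keeps b where it is
       clear, and POR merges the two. */
    __m64 res = _mm_or_si64(_mm_and_si64(mask, va),
                            _mm_andnot_si64(mask, vb));

    memcpy(r, &res, sizeof res);

    /* EMMS: clear the MMX state before any x87 floating-point code runs. */
    _mm_empty();
}
```

With a = {5, -3} and b = {2, 7}, the result is {5, 7}.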
Muxing Constructs
The most important construct for avoiding branches in SIMD code is a two-way muxing
construct, equivalent to the ternary operator (?:) in C and C++.
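The core identity behind the muxing construct can be sketched in portable scalar C: turn the condition into an all-zeros or all-ones mask, then combine the two candidates as `(a & mask) | (b & ~mask)`. This is the same shape a SIMD compare produces in each lane, with AND, ANDN, and OR doing the selection. The function name `mux_u32` is hypothetical, and whether a compiler emits a branch, a conditional move, or this exact sequence for a plain `?:` is not guaranteed.

```c
#include <stdint.h>

/* Hypothetical helper: branchless equivalent of  cond ? a : b
   for 32-bit unsigned values. */
static uint32_t mux_u32(int cond, uint32_t a, uint32_t b)
{
    /* (cond != 0) is 0 or 1; subtracting it from 0 yields
       0x00000000 or 0xFFFFFFFF, like a SIMD compare result. */
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);

    /* Keep a where the mask is all ones, b where it is all zeros. */
    return (a & mask) | (b & ~mask);
}
```

For instance, `mux_u32(x < y, x, y)` returns the smaller of two unsigned values without a branch.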