Chapter 6 Branch Optimizations 137
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Examples
SSE Solution (Preferred)
; r = (x < y) ? a : b
;
; In: XMM0 = a
; XMM1 = b
; XMM2 = x
; XMM3 = y
; Out: XMM0 = r
cmpps xmm2, xmm3, 1 ; x < y ? 0xffffffff : 0
andps xmm0, xmm2 ; x < y ? a : 0
andnps xmm2, xmm1 ; x < y ? 0 : b
orps xmm0, xmm2 ; x < y ? a : b
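For readers working in C rather than assembly, the same four-instruction sequence can be sketched with the SSE intrinsics from <xmmintrin.h>. This is a minimal illustration, not code from this guide: the function names select_lt_ps and select_lt4 are hypothetical, and the sketch assumes an x86 compiler with SSE enabled.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_cmplt_ps, _mm_and_ps, ... */

/* Per-lane r = (x < y) ? a : b with no branches.
   Hypothetical helper; each line mirrors one instruction of the listing. */
static __m128 select_lt_ps(__m128 a, __m128 b, __m128 x, __m128 y)
{
    __m128 mask = _mm_cmplt_ps(x, y);      /* cmpps (predicate 1): x < y ? all ones : 0 */
    __m128 ta   = _mm_and_ps(a, mask);     /* andps:  x < y ? a : 0 */
    __m128 tb   = _mm_andnot_ps(mask, b);  /* andnps: (~mask) & b = x < y ? 0 : b */
    return _mm_or_ps(ta, tb);              /* orps:   x < y ? a : b */
}

/* Hypothetical array wrapper so the helper can be called on plain floats.
   Unaligned loads and stores are used so callers need no special alignment. */
void select_lt4(const float a[4], const float b[4],
                const float x[4], const float y[4], float r[4])
{
    __m128 result = select_lt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b),
                                 _mm_loadu_ps(x), _mm_loadu_ps(y));
    _mm_storeu_ps(r, result);
}
```

Whether a compiler emits exactly the instructions shown in the listing depends on the compiler and its options; the sketch only shows that the select is computed per lane without a branch.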
MMX™ Solution (Avoid)
; r = (x < y) ? a : b
;
; In: MM0 = a
; MM1 = b
; MM2 = x
; MM3 = y
; Out: MM0 = r
pcmpgtd mm3, mm2 ; y > x ? 0xffffffff : 0
movq mm4, mm3 ; Duplicate mask
pandn mm3, mm1 ; y > x ? 0 : b
pand mm0, mm4 ; y > x ? a : 0
por mm0, mm3 ; r = y > x ? a : b
Because PANDN overwrites its destination operand, it destroys the mask created by PCMPGTD, so the
mask must first be copied to an additional register. The copy adds an instruction, lengthens the
dependency chain, and increases register pressure. Instead, apply PAND before PANDN so that the mask
is consumed before it is overwritten, and write two-way muxing constructs as follows:
MMX™ Solution (Preferred)
; r = (x < y) ? a : b
;
; In: MM0 = a
; MM1 = b
; MM2 = x
; MM3 = y
; Out: MM0 = r
pcmpgtd mm3, mm2 ; y > x ? 0xffffffff : 0
pand mm0, mm3 ; y > x ? a : 0
pandn mm3, mm1 ; y > x ? 0 : b
por mm0, mm3 ; r = y > x ? a : b
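The ordering in the preferred MMX sequence can be modeled in portable C for a single 32-bit lane. This is a rough sketch under stated assumptions, not code from this guide: the function name mux_lt_u32 is hypothetical, a and b are treated as raw 32-bit patterns, and x and y are compared as signed values, as PCMPGTD does.

```c
#include <stdint.h>

/* One 32-bit lane of r = (x < y) ? a : b, following the preferred ordering.
   Hypothetical model: the variable m plays the role of register MM3. */
uint32_t mux_lt_u32(uint32_t a, uint32_t b, int32_t x, int32_t y)
{
    uint32_t m = 0u - (uint32_t)(y > x);  /* pcmpgtd: y > x ? 0xffffffff : 0 */
    uint32_t r = a & m;                   /* pand:  mask is read before it is lost */
    m = ~m & b;                           /* pandn: overwrites the mask with (y > x ? 0 : b) */
    return r | m;                         /* por:   y > x ? a : b */
}
```

For example, mux_lt_u32(0x11111111u, 0x22222222u, -5, 3) selects a because 3 > -5, while equal x and y select b. Because the mask is consumed by the AND before the AND-NOT overwrites it, no second copy of the mask is needed.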