128 Branch Optimizations Chapter 6
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
6.2 Two-Byte Near-Return RET Instruction
Optimization
Use of a two-byte near-return can improve performance. The single-byte near-return (opcode C3h) of
the RET instruction should be used carefully. Specifically, avoid the following two situations:
• Any kind of branch (either conditional or unconditional) that has the single-byte near-return RET
instruction as its target. See “Examples.”
• A conditional branch that occurs in the code directly before the single-byte near-return RET
instruction. See “Examples.”
Application
This optimization applies to:
• 32-bit software
• 64-bit software
Rationale
In the two situations described under “Optimization,” the processor is unable to apply branch
prediction to the single-byte near-return form (opcode C3h) of the RET instruction.
The easiest way to ensure that the branch prediction mechanism is used is to use a two-byte RET
instruction. A two-byte RET places a REP prefix (F3h) before the single-byte RET (C3h). The result is
functionally equivalent to the single-byte near-return RET instruction, but is not affected by the
prediction limitations outlined above. To use a two-byte RET, define a text macro named REPRET and
use it in place of the RET instruction to force the intended object code:
REPRET TEXTEQU <DB 0F3h, 0C3h>
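As a minimal sketch of how the macro might appear in a complete routine, the following assumes the
64-bit Microsoft assembler (ml64) and the Microsoft x64 calling convention, in which the first two
integer arguments arrive in ECX and EDX. The procedure name max_u32 and the label done are
hypothetical and chosen only for illustration. The return is the target of a conditional branch, so
it is emitted in the two-byte form.

```asm
REPRET TEXTEQU <DB 0F3h, 0C3h>  ; Same text macro as defined above.

.code
max_u32 PROC                    ; Hypothetical: returns the larger of ECX and EDX (unsigned) in EAX.
    mov   eax, ecx              ; Start with the first argument.
    cmp   eax, edx
    jae   done                  ; First argument is already the larger one.
    mov   eax, edx              ; Otherwise return the second argument.
done:
    REPRET                      ; Branch target: emit F3h, C3h instead of a bare C3h.
max_u32 ENDP
END
```

Because the macro expands to raw bytes, the assembler does not see a RET mnemonic here, so any
epilogue code that the assembler would otherwise generate for RET has to be written by hand.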
Examples
Avoid branches in which the target of the branch is a single-byte near-return:
    jmp   label     ; Jump to a single-byte near-return RET instruction.
    ...
label:
    ret             ; RET is potentially mispredicted.
Avoid branches that immediately precede a single-byte near-return:
    jz    label     ; Conditional branch is not taken.
    ret             ; RET is a fall-through instruction,
                    ; potentially mispredicted.
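Assuming the REPRET text macro from “Rationale” is defined, the fall-through case might be rewritten
as sketched below. The label is the same placeholder used in the examples above.

```asm
    jz    label     ; Conditional branch is not taken.
    REPRET          ; Fall-through return emitted as F3h, C3h (REP prefix + RET).
```

The same substitution applies to the first case: replace the RET at the branch target with REPRET.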