AMD 250 Computer Hardware User Manual

126 Branch Optimizations Chapter 6

25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

6.1 Density of Branches

Optimization

When possible, align branches such that they do not cross a 16-byte boundary.

Application

This optimization applies to:

• 32-bit software

• 64-bit software

Rationale

The AMD Athlon™ 64 and AMD Opteron™ processors have the capability to cache branch-prediction history for a maximum of three near branches (CALL, JMP, conditional branches, or returns) per 16-byte fetch window. A branch instruction that crosses a 16-byte boundary is counted in the second 16-byte window. Due to architectural restrictions, a branch that is split across a 16-byte boundary cannot dispatch with any other instructions when it is predicted taken. Perform this alignment by rearranging code; it is not beneficial to align branches using padding sequences.
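The two rules in the rationale (a split branch counts in the second window, and at most three branches are tracked per window) can be checked mechanically against a disassembly listing. The following is a minimal sketch, not part of the original guide: the `branch_t` record and the helper names `crosses_boundary`, `branch_window`, and `window_overloaded` are hypothetical, and the sketch assumes the branch addresses and encoded lengths have already been extracted by some other tool and are sorted by address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FETCH_WINDOW 16u  /* bytes per fetch window, as stated in the text */
#define MAX_BRANCHES 3u   /* branches tracked per window, as stated in the text */

/* Hypothetical record: one near branch taken from a disassembly listing. */
typedef struct {
    uint64_t addr;  /* address of the first byte of the branch */
    unsigned len;   /* encoded length in bytes (1..15 on x86) */
} branch_t;

/* True if the first and last byte of the branch lie in different windows. */
static bool crosses_boundary(branch_t b)
{
    return b.addr / FETCH_WINDOW != (b.addr + b.len - 1) / FETCH_WINDOW;
}

/* Index of the window that the branch counts toward. A split branch is
 * counted in the second window, which is the window of its last byte. */
static uint64_t branch_window(branch_t b)
{
    return (b.addr + b.len - 1) / FETCH_WINDOW;
}

/* True if any window is charged with more than MAX_BRANCHES branches.
 * Assumes br[] is sorted by address and the instructions do not overlap. */
static bool window_overloaded(const branch_t *br, size_t n)
{
    uint64_t current = 0;
    size_t run = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t w = branch_window(br[i]);
        if (i == 0 || w != current) {
            current = w;  /* entered a new window: restart the count */
            run = 0;
        }
        if (++run > MAX_BRANCHES)
            return true;
    }
    return false;
}
```

A checker like this only reports problem spots. The fix recommended above is still to rearrange the surrounding code rather than to insert padding.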

The following branches are limited to three per 16-byte window:

• jcc rel8

• jcc rel32

• jmp rel8

• jmp rel32

• jmp reg

• jmp WORD PTR

• jmp DWORD PTR

• call rel16

• call r/m16

• call rel32

• call r/m32

Coding more than three branches in the same 16-byte code window may lead to conflicts in the branch target buffer. To avoid conflicts in the branch target buffer, space out branches such that three
