Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Index
Numerics
3DNow! 210, 215, 217–218, 221, 224, 230, 233
A
address-generation interlocks 151
AMD Athlon™ processor
    microarchitecture 250–251
AMD Athlon™ system bus 260
arrays 10
B
binary-to-ASCII decimal conversion 181
boolean operators 17
branch target buffer (BTB) 126, 253
branches
    align branch targets 76
    based on comparisons between floats 54
    compound branch conditions 14
    dependent on random data 130
    optimizing density of 126
    prediction 253
    replace with computation in 3DNow! code 136
C
C language 14
    array notation versus pointers 10
    C code to 3DNow! code examples 138–140
    structures 39, 117
cache
    64-byte cache line 116
CALL and RETURN instructions 132
ccNUMA 96
code padding using neutral code fillers 89
code segment (CS) base, nonzero 135
const type qualifier 30
D
data cache 255
decoding 254
DirectPath
    DirectPath over VectorPath instructions 72
displacements, 8-bit sign-extended 88
division 160–162, 186
    replace division with multiplication, integer 43, 160
dynamic memory allocation consideration 19
E
extended-precision data 248
F
far control-transfer instructions 142
floating-point
    compare instructions 244
    division and square roots 50
    execution unit 258
    scheduler 257
    to integer conversions 52
    variables and expressions are type float 9
FXCH instruction 245
I
if statement 16, 33
immediates, 8-bit sign-extended 87
IMUL instruction 164
inline functions 149, 170
inline REP string with low counts 168
instruction
    cache 252
    control unit 254
    short encodings 80
integer
    arithmetic, 64-bit 170
    division 43
    execution unit 256
    operand, consider sign 48
    scheduler 256
    use 32-bit data types for integer code 47
L
L2 cache controller 259
LEA instruction 77, 85
LEAVE instruction 83
load/store 22, 258
load-execute instructions 73
    floating-point instructions 74
    integer instructions 73
local functions 34
local variables 41, 44
LOOP instruction 141
loops
    generic loop hoisting 31
    minimize pointer arithmetic 154
    partial loop unrolling 146