368 Index
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
REP string with low variable counts 168
unroll small loops 13
unrolling loops 145
M
memory
  dynamic memory allocation 19
  pushing memory data 157
MMX™ instructions
  PANDN instruction 137
  PREFETCHNTA/T0/T1/T2 instructions 105
MOVZX and MOVSX instructions 153
multiplication
  by constant 164
multiplies over division, floating-point 238
muxing constructs 136
N
Nonuniform Memory Access 96
O
operands
  largest possible operand size, repeated string 168
P
parallelism 35
PF2ID instructions 52
pointers
  dereferenced arguments 44
  use array-style code instead 10
population-count function 179
prefetch
  determining distance 108
  multiple 107
PREFETCH and PREFETCHW instructions 104, 106, 108
prototypes 29
R
recursive functions 132
register reads and writes, partial 81
REP prefix 168
S
scalar code translated into 3DNow! code 138
scheduling 144
SHLD instruction 85
SHR instruction 85
single-byte near-return RET instruction (opcode C3h) 128
SSE 193, 355
SSE2 193, 355
stack
  alignment considerations 122
store-to-load forwarding 20, 22, 100–103
string instructions 167
structure (struct) 41, 117, 119
subexpressions, explicitly extract common 37
superscalar processor 251
switch statement 25, 28, 33
U
unit-stride access 105, 110
W
write combining 113, 260, 263–264, 266
X
XOR instruction 169