368 Index
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
REP string with low variable counts 168
unroll small loops 13
unrolling loops 145
M
memory
    dynamic memory allocation 19
    pushing memory data 157
MMX™ instructions
    PANDN instruction 137
    PREFETCHNTA/T0/T1/T2 instructions 105
MOVZX and MOVSX instructions 153
multiplication
    by constant 164
multiplies over division, floating-point 238
muxing constructs 136
N
Nonuniform Memory Access 96
O
operands
    largest possible operand size, repeated string 168
P
parallelism 35
PF2ID instructions 52
pointers
    dereferenced arguments 44
    use array-style code instead 10
population-count function 179
prefetch
    determining distance 108
    multiple 107
PREFETCH and PREFETCHW instructions 104, 106, 108
prototypes 29
R
recursive functions 132
register reads and writes, partial 81
REP prefix 168
S
scalar code translated into 3DNow! code 138
scheduling 144
SHLD instruction 85
SHR instruction 85
single-byte near-return RET instruction (opcode C3h) 128
SSE 193, 355
SSE2 193, 355
stack
    alignment considerations 122
store-to-load forwarding 20, 22, 100–103
String Instructions 167
string instructions 167
structure (struct) 41, 117, 119
subexpressions, explicitly extract common 37
superscalar processor 251
switch statement 25, 28, 33
U
unit-stride access 105, 110
W
write combining 113, 260, 263–264, 266
X
XOR instruction 169