IA-32 Intel® Architecture Optimization
Index-2
coding methodologies, 3-13
coding techniques, 3-12
  absolute difference of signed numbers, 4-24
  absolute difference of unsigned numbers, 4-23
  absolute value, 4-25
  clipping to an arbitrary signed range, 4-26
  clipping to an arbitrary unsigned range, 4-28
  generating constants, 4-21
  interleaved pack with saturation, 4-8
  interleaved pack without saturation, 4-10
  non-interleaved unpack, 4-11
  signed unpack, 4-7
  simplified clipping to an arbitrary signed range, 4-28
  unsigned unpack, 4-6
coherent requests, 6-13
command-line options, A-2
  automatic processor dispatch support, A-4
  floating-point arithmetic precision, A-6
  inline expansion of library functions, A-6
  loop unrolling, A-5
  rounding control, A-6
  targeting a processor, A-3
  vectorizer switch, A-5
comparing register values, 2-87
compiler intrinsics
  _mm_load, 6-2, 6-44
  _mm_prefetch, 6-2, 6-44
  _mm_stream, 6-2, 6-44
compiler plug-in, A-2
compiler-supported alignment, 3-24
complex instructions, 2-74
computation latency, E-8
computation-intensive code, 3-11
compute bound, E-7, E-8
converting code to MMX technology, 3-8
CPUID instruction, 3-2
C-states, 9-1, 9-4
D
Data
  Code segment and, 2-47
data alignment, 3-20
data arrangement, 5-4
data copy, E-11
data deswizzling, 5-14, 5-15
data prefetching, 1-33
Data structures
  Access pattern versus alignment, 2-40
  Aligning, 2-39
data swizzling, 5-9
data swizzling using intrinsics, 5-12
decoupled memory, E-7
deeper sleep, 9-6
divide instructions, 2-76
E
eliminating branches, 2-15, 2-18
EMMS instruction, 4-3, 4-4
extract word instruction, 4-13
F
fist instruction, 2-64
fldcw instruction, 2-64
floating-point applications, 2-57
floating-point arithmetic precision options, A-6
floating-point code
  improving parallelism, 2-68
  loop unrolling, 2-26
  memory access stall information, 2-37
  memory operands, 2-71
  operations with integer operands, 2-72
  optimizing, 2-58
  transcendental functions, 2-72
floating-point operations with integer operands, 2-72