Intel IA-32 Computer Accessories User Manual


 
Figures
Figure 1-1 Typical SIMD Operations ...................................................................1-3
Figure 1-2 SIMD Instruction Register Usage ......................................................1-4
Figure 1-3 The Intel NetBurst Microarchitecture ...............................................1-10
Figure 1-4 Execution Units and Ports in the Out-Of-Order Core.......................1-19
Figure 1-5 The Intel Pentium M Processor Microarchitecture...........................1-27
Figure 1-6 Hyper-Threading Technology on an SMP........................................1-35
Figure 1-7 Pentium D Processor, Pentium Processor Extreme Edition and Intel Core Duo Processor .......1-41
Figure 2-1 Cache Line Split in Accessing Elements in an Array........................2-31
Figure 2-2 Size and Alignment Restrictions in Store Forwarding......................2-34
Figure 3-1 Converting to Streaming SIMD Extensions Chart .............................3-9
Figure 3-2 Hand-Coded Assembly and High-Level Compiler Performance Trade-offs .........3-13
Figure 3-3 Loop Blocking Access Pattern.........................................................3-36
Figure 4-1 PACKSSDW mm, mm/mm64 Instruction Example ............................4-9
Figure 4-2 Interleaved Pack with Saturation .......................................................4-9
Figure 4-3 Result of Non-Interleaved Unpack Low in MM0 ..............................4-12
Figure 4-4 Result of Non-Interleaved Unpack High in MM1..............................4-12
Figure 4-5 pextrw Instruction ............................................................................4-14
Figure 4-6 pinsrw Instruction.............................................................................4-15
Figure 4-7 pmovmskb Instruction Example.......................................................4-17
Figure 4-8 pshuf Instruction Example ...............................................................4-18
Figure 4-9 PSADBW Instruction Example ........................................................4-31
Figure 5-1 Homogeneous Operation on Parallel Data Elements ........................5-5
Figure 5-2 Dot Product Operation.......................................................................5-8
Figure 5-3 Horizontal Add Using movhlps/movlhps ..........................................5-19
Figure 5-4 Asymmetric Arithmetic Operation of the SSE3 Instruction ..............5-23
Figure 5-5 Horizontal Arithmetic Operation of the SSE3 Instruction HADDPD ....5-23
Figure 6-1 Effective Latency Reduction as a Function of Access Stride...........6-22