Example 4-20 Clipping to an Arbitrary Signed Range [high, low].........................4-27
Example 4-21 Simplified Clipping to an Arbitrary Signed Range .........................4-28
Example 4-22 Clipping to an Arbitrary Unsigned Range [high, low].....................4-29
Example 4-23 Complex Multiply by a Constant....................................................4-32
Example 4-24 A Large Load after a Series of Small Stores (Penalty)..................4-35
Example 4-25 Accessing Data without Delay.......................................................4-35
Example 4-26 A Series of Small Loads after a Large Store.................................4-36
Example 4-27 Eliminating Delay for a Series of Small Loads after a
Large Store....................................................................................4-36
Example 4-28 An Example of Video Processing with Cache Line Splits..............4-37
Example 4-29 Video Processing Using LDDQU to Avoid Cache Line Splits........4-38
Example 5-1 Pseudocode for Horizontal (xyz, AoS) Computation .......................5-8
Example 5-2 Pseudocode for Vertical (xxxx, yyyy, zzzz, SoA) Computation........5-9
Example 5-3 Swizzling Data...............................................................................5-10
Example 5-4 Swizzling Data Using Intrinsics .....................................................5-12
Example 5-5 Deswizzling Single-Precision SIMD Data......................................5-14
Example 5-6 Deswizzling Data Using the movlhps and shuffle
Instructions....................................................................................5-15
Example 5-7 Deswizzling 64-bit Integer SIMD Data .......................................5-16
Example 5-8 Using MMX Technology Code for Copying or Shuffling.................5-18
Example 5-9 Horizontal Add Using movhlps/movlhps........................................5-19
Example 5-10 Horizontal Add Using Intrinsics with movhlps/movlhps .................5-21
Example 5-11 Multiplication of Two Pairs of Single-precision Complex Numbers..5-24
Example 5-12 Division of Two Pairs of Single-precision Complex Numbers..........5-25
Example 5-13 Calculating Dot Products from AoS ..............................................5-26
Example 6-1 Pseudocode for Using clflush .......................................................6-18
Example 6-2 Populating an Array for Circular Pointer Chasing with
Constant Stride..............................................................................6-21
Example 6-3 Prefetch Scheduling Distance .......................................................6-26
Example 6-4 Using Prefetch Concatenation.......................................................6-28
Example 6-5 Concatenation and Unrolling the Last Iteration of Inner Loop .......6-28
Example 6-6 Spread Prefetch Instructions .........................................................6-33
Example 6-7 Data Access of a 3D Geometry Engine without Strip-mining........6-37
Example 6-8 Data Access of a 3D Geometry Engine with Strip-mining.............6-38
Example 6-9 Using HW Prefetch to Improve Read-Once Memory Traffic..........6-40
Example 6-10 Basic Algorithm of a Simple Memory Copy...................................6-46
Example 6-11 A Memory Copy Routine Using Software Prefetch........................6-48