In general, unrolling loops improves performance by giving the processor opportunities to work on data for the next loop iteration while it waits for the result of an operation from the previous iteration. The reciprocal_sqrt_1xloop loop computes the reciprocal square root of the remaining elements that do not form a full segment of 16 floating-point values. In this chapter, the previous function is the only example that handles a vector stream of arbitrary length num_points. This is done to conserve space, but all of the examples in this chapter can be modified in a similar manner and used universally.
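
The following C sketch, written with SSE intrinsics rather than the assembly listings used elsewhere in this guide, is offered only to illustrate the structure just described: a four-times-unrolled main loop that consumes 16 floating-point values (64 bytes) per pass, followed by a 1x loop that handles the remaining elements. The function name, the 16-byte alignment of the buffers, and the assumption that num_points is a multiple of 4 are illustrative assumptions, not the guide's actual listing.

#include <xmmintrin.h>   /* SSE intrinsics */

/* Illustrative sketch only; assumes in and out are 16-byte aligned and
   num_points is a multiple of 4. */
void reciprocal_sqrt(const float *in, float *out, int num_points)
{
    const __m128 ones = _mm_set1_ps(1.0f);
    int i = 0;

    /* 4x-unrolled loop: four XMM registers = 16 floats = 64 bytes per pass. */
    for (; i + 16 <= num_points; i += 16) {
        _mm_store_ps(&out[i],      _mm_div_ps(ones, _mm_sqrt_ps(_mm_load_ps(&in[i]))));
        _mm_store_ps(&out[i + 4],  _mm_div_ps(ones, _mm_sqrt_ps(_mm_load_ps(&in[i + 4]))));
        _mm_store_ps(&out[i + 8],  _mm_div_ps(ones, _mm_sqrt_ps(_mm_load_ps(&in[i + 8]))));
        _mm_store_ps(&out[i + 12], _mm_div_ps(ones, _mm_sqrt_ps(_mm_load_ps(&in[i + 12]))));
    }

    /* 1x loop: the remaining elements that do not form a full segment of
       16 floating-point values, handled one XMM register (4 floats) at a time. */
    for (; i < num_points; i += 4) {
        _mm_store_ps(&out[i], _mm_div_ps(ones, _mm_sqrt_ps(_mm_load_ps(&in[i]))));
    }
}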
Additionally, the previous SSE function makes use of the PREFETCHNTA instruction to hide the latency of fetching data from memory. The unrolled loop reciprocal_sqrt_4xloop was chosen to work with 64 bytes of data per iteration, which happens to be the size of one cache line (the term for the quantum of data brought into the processor's cache by a memory access when the data does not already reside there). The prefetch causes the processor to begin loading the floating-point operands of the reciprocal square-root operations four loop iterations before they are used. While the processor works on the next three iterations, the data for the fourth iteration is brought into the cache, so the processor does not have to wait for that data to arrive from memory when the aligned SSE move instruction MOVAPS loads it. This type of memory optimization can be very useful in gaming and high-performance computing, in which data sets are unlikely to reside in the processor's cache. For example, in a simulation involving a million vertices or atoms whose coordinates occupy 12 bytes each (three single-precision floating-point values), the data alone requires roughly 12 Mbytes, well beyond the capacity of the processor's caches.
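
To make the prefetch usage concrete, the sketch above can be extended with the _mm_prefetch intrinsic, which emits PREFETCHNTA when given the _MM_HINT_NTA hint. The prefetch distance of four unrolled iterations (4 x 64 bytes = 256 bytes ahead) follows the description above; the distance used in the guide's actual listing, the function name, and the alignment assumptions are illustrative here, not authoritative.

#include <xmmintrin.h>

/* Illustrative sketch only; assumes 16-byte-aligned buffers and num_points
   a multiple of 16. */
void reciprocal_sqrt_prefetch(const float *in, float *out, int num_points)
{
    const __m128 ones = _mm_set1_ps(1.0f);

    for (int i = 0; i < num_points; i += 16) {
        /* PREFETCHNTA: request the cache line whose operands are used four
           iterations from now (64 floats = 256 bytes ahead). Prefetches that
           run past the end of the buffer are harmless; prefetch instructions
           do not fault. */
        _mm_prefetch((const char *)&in[i + 64], _MM_HINT_NTA);

        _mm_store_ps(&out[i],      _mm_div_ps(ones, _mm_sqrt_ps(_mm_load_ps(&in[i]))));
        _mm_store_ps(&out[i + 4],  _mm_div_ps(ones, _mm_sqrt_ps(_mm_load_ps(&in[i + 4]))));
        _mm_store_ps(&out[i + 8],  _mm_div_ps(ones, _mm_sqrt_ps(_mm_load_ps(&in[i + 8]))));
        _mm_store_ps(&out[i + 12], _mm_div_ps(ones, _mm_sqrt_ps(_mm_load_ps(&in[i + 12]))));
    }
}

The NTA (non-temporal) hint asks the processor to minimize cache pollution for data that will be read once, which suits streaming workloads such as the large vertex and atom data sets described above.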