IA-32 Intel® Architecture Optimization

Note that this technique (emulating conditional moves with a compare mask) can be applied to both SIMD integer and SIMD floating-point code.
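To illustrate the floating-point case, the compare-mask-and-merge pattern can be sketched in scalar C. This is only a sketch under stated assumptions: the function name `select_floats` and its parameters are hypothetical (they do not appear in the manual), `float` is assumed to be 32 bits wide, and the loop processes one element at a time, whereas packed SSE code would use a packed compare followed by andps, andnps and orps on four elements at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the manual: branch-free select on floats.
   Computes c[i] = (a[i] > b[i]) ? d[i] : e[i] without a conditional jump.
   Assumes float is 32 bits wide. */
static void select_floats(const float *a, const float *b,
                          const float *d, const float *e,
                          float *c, size_t n)
{
    assert(n == 0 || (a && b && d && e && c));
    for (size_t i = 0; i < n; i++) {
        /* Compare mask: all ones where a > b, all zeros otherwise. */
        uint32_t mask = (uint32_t)-(a[i] > b[i]);
        uint32_t dbits, ebits, out;

        /* Reinterpret the float bit patterns as integers. */
        memcpy(&dbits, &d[i], sizeof dbits);
        memcpy(&ebits, &e[i], sizeof ebits);

        /* Keep d where the mask is set, e where it is clear, then merge. */
        out = (dbits & mask) | (ebits & ~mask);
        memcpy(&c[i], &out, sizeof out);
    }
}
```

Because the selected bit patterns are copied unchanged, the result is bit-identical to the chosen input element.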

If there are multiple consumers of an instance of a register, group the consumers together as closely as possible. However, the consumers should not be scheduled near the producer.

SIMD Optimizations and Microarchitectures

Pentium M, Intel Core Solo and Intel Core Duo processors have a different microarchitecture than Intel NetBurst® microarchitecture. The following sub-section discusses optimizing SIMD code targeting Intel Core Solo and Intel Core Duo processors.

The register-register variant of the following instructions has improved performance on Intel Core Solo and Intel Core Duo processors relative to Pentium M processors. This is because the instructions consist of two micro-ops instead of three. Relevant instructions are: unpcklps, unpckhps, packsswb, packuswb, packssdw, pshufd, shuffps and shuffpd.

top_of_loop:
  movq    mm0, [A + eax]
  pcmpgtw mm0, [B + eax]     ; Create compare mask (all ones where A > B)
  movq    mm1, [D + eax]
  pand    mm1, mm0           ; Keep D where A > B; drop elements where A <= B
  pandn   mm0, [E + eax]     ; Keep E where A <= B; drop elements where A > B
  por     mm0, mm1           ; Create single word
  movq    [C + eax], mm0

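  add     eax, 8             ; Advance 8 bytes (four words)
  cmp     eax, MAX_ELEMENT*2
  jl      top_of_loop        ; Stop once eax reaches the array size in bytes (assumes eax starts at 0)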

Example 3-21 Emulation of Conditional Moves (continued)
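For readers who want to check the selection logic of the loop above, the following scalar C sketch mirrors it one word at a time. It is a sketch under stated assumptions, not the manual's code: the function name `select_words` and its parameters are hypothetical, and it handles a single 16-bit element per iteration where the MMX loop handles four.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from the manual: branch-free per-element select.
   Computes c[i] = (a[i] > b[i]) ? d[i] : e[i] using a compare mask,
   following the pcmpgtw / pand / pandn / por sequence. */
static void select_words(const int16_t *a, const int16_t *b,
                         const int16_t *d, const int16_t *e,
                         int16_t *c, size_t n)
{
    assert(n == 0 || (a && b && d && e && c));
    for (size_t i = 0; i < n; i++) {
        /* pcmpgtw: all ones where a > b (signed), all zeros otherwise. */
        uint16_t mask = (uint16_t)-(a[i] > b[i]);

        uint16_t keep_d = (uint16_t)((uint16_t)d[i] & mask);            /* pand  */
        uint16_t keep_e = (uint16_t)((uint16_t)e[i] & (uint16_t)~mask); /* pandn */

        c[i] = (int16_t)(keep_d | keep_e);                              /* por   */
    }
}
```

Since the two masked values never have bits set in the same element, the final OR simply picks whichever one survived the mask.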
