Intel IA-32 Computer Accessories User Manual

Optimizing for SIMD Integer Applications 4

aligned versions; this can reduce the performance gains when using

the 128-bit SIMD integer extensions. The general guidelines on the

alignment of memory operands are:

— The greatest performance gains can be achieved when all

memory streams are 16-byte aligned.

— Reasonable performance gains are possible if roughly half of all

memory streams are 16-byte aligned, and the other half are not.

— Little or no performance gain may result if all memory streams

are misaligned (not on 16-byte boundaries); in this case, use of the 64-bit SIMD

integer instructions may be preferable.

• Loop counters need to be updated because each 128-bit integer
instruction operates on twice as much data as its 64-bit integer
counterpart.

• Extending the pshufw instruction (shuffle word across a 64-bit
integer operand) to a full 128-bit operand is emulated by a
combination of the following instructions: pshufhw, pshuflw, and
pshufd.

• The 64-bit shift by bit instructions (psrlq, psllq) are
extended to 128 bits in these ways:

— use psrlq and psllq, along with masking logic operations

— rewrite the code sequence to use the psrldq and pslldq
instructions (shift double quad-word operand by bytes).
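
The porting guidelines above can be illustrated with a minimal C sketch using SSE2 intrinsics. The function names (add_words_sse2, reverse_words, srl128_bits) are hypothetical and chosen only for this illustration. The shift helper combines the bit shifts (psrlq, psllq) with a byte shift (psrldq) and a logical OR, which is one possible way to build a 128-bit shift and not necessarily the sequence the manual has in mind. The sketch assumes a compiler that provides <emmintrin.h>.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Add two arrays of 16-bit words. Each 128-bit paddw handles 8 words,
   so the loop counter advances by 8 rather than the 4 used with MMX. */
void add_words_sse2(const uint16_t *a, const uint16_t *b,
                    uint16_t *dst, size_t n)
{
    /* movdqa (aligned load/store) requires 16-byte-aligned addresses. */
    int aligned =
        ((((uintptr_t)a) | ((uintptr_t)b) | ((uintptr_t)dst)) & 15u) == 0;
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {
        if (aligned) {
            __m128i va = _mm_load_si128((const __m128i *)(a + i));
            __m128i vb = _mm_load_si128((const __m128i *)(b + i));
            _mm_store_si128((__m128i *)(dst + i), _mm_add_epi16(va, vb));
        } else {
            /* movdqu: works for any address, but is typically slower. */
            __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
            __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
            _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi16(va, vb));
        }
    }
    for (; i < n; i++)              /* scalar tail for leftover words */
        dst[i] = (uint16_t)(a[i] + b[i]);
}

/* Reverse all eight words of a 128-bit operand, a shuffle that a single
   pshufw could only do within 64 bits. */
__m128i reverse_words(__m128i v)
{
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(0, 1, 2, 3)); /* pshuflw */
    v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(0, 1, 2, 3)); /* pshufhw */
    return _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2)); /* pshufd: swap halves */
}

/* Logical right shift of the full 128-bit value by n bits, 0 < n < 64. */
__m128i srl128_bits(__m128i v, int n)
{
    assert(n > 0 && n < 64);
    __m128i shifted = _mm_srli_epi64(v, n);        /* psrlq: each half >> n   */
    __m128i high    = _mm_srli_si128(v, 8);        /* psrldq: high half -> low */
    __m128i carry   = _mm_slli_epi64(high, 64 - n); /* psllq: bits crossing over */
    return _mm_or_si128(shifted, carry);
}
```

The runtime alignment check in add_words_sse2 is a convenience for the sketch; in performance-critical code the data would normally be allocated on 16-byte boundaries so that only the aligned path is needed.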

SIMD Optimizations and Microarchitectures

Pentium M, Intel Core Solo and Intel Core Duo processors have a

different microarchitecture than the Intel NetBurst® microarchitecture. The

following sections discuss optimizing SIMD code that targets Intel Core

Solo and Intel Core Duo processors.

On Intel Core Solo and Intel Core Duo processors, lddqu behaves

identically to movdqu by loading 16 bytes of data irrespective of

address alignment.
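
As a small illustration, the two loads can be issued from C through the _mm_lddqu_si128 (SSE3) and _mm_loadu_si128 (SSE2) intrinsics. The sketch below only checks that both return the same 16 bytes from an arbitrary address; it says nothing about timing. The function name lddqu_matches_movdqu is hypothetical, and the target attribute assumes GCC or Clang on a processor with SSE3.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128 (movdqu) */
#include <pmmintrin.h>  /* SSE3: _mm_lddqu_si128 (lddqu)  */
#include <stdint.h>

/* Returns 1 if lddqu and movdqu load identical data from p, else 0.
   p may have any alignment; 16 bytes starting at p must be readable. */
__attribute__((target("sse3")))
int lddqu_matches_movdqu(const uint8_t *p)
{
    __m128i a = _mm_lddqu_si128((const __m128i *)p);
    __m128i b = _mm_loadu_si128((const __m128i *)p);
    /* cmpeq sets each equal byte to 0xFF; movemask gathers the 16 top bits. */
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}
```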
