Optimizing for SIMD Integer Applications 4
Note that the output is a packed doubleword. If needed, a pack
instruction can be used to convert the result to 16-bit (thereby matching
the format of the input).
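As an illustration of the pack step mentioned above, the following is a minimal sketch in C using SSE2 intrinsics. It assumes the signed-saturating pack (PACKSSDW, exposed as _mm_packs_epi32) is the appropriate choice; the text only says "a pack instruction", so this is one possibility rather than the prescribed one. The function name pack_dwords_to_words is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: narrows eight signed 32-bit values (two groups of
 * four) to eight signed 16-bit values with signed saturation (PACKSSDW).
 * The values from lo land in out[0..3], the values from hi in out[4..7]. */
void pack_dwords_to_words(const int32_t lo[4], const int32_t hi[4],
                          int16_t out[8])
{
    __m128i a = _mm_loadu_si128((const __m128i *)lo);
    __m128i b = _mm_loadu_si128((const __m128i *)hi);

    /* Values outside [-32768, 32767] are clamped, not truncated. */
    __m128i packed = _mm_packs_epi32(a, b);

    _mm_storeu_si128((__m128i *)out, packed);
}
```

Because the pack saturates, results that do not fit in 16 bits are clamped to the nearest representable value; if the computation needs scaling first, a shift before the pack would be required.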
Packed 32*32 Multiply
The PMULUDQ instruction performs an unsigned multiply on the lower
pair of doubleword operands within each 64-bit chunk from the two
sources; the full 64-bit result from each multiplication is returned to the
destination register. This instruction is added in both a 64-bit and a
128-bit version; the latter performs two independent operations, on the low
and high halves of a 128-bit register.
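The 128-bit form described above can be sketched in C with the SSE2 intrinsic _mm_mul_epu32, which maps to PMULUDQ. The function name mul_low_dwords and the array-based interface are assumptions made for illustration only.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: multiplies the low doubleword of each 64-bit chunk,
 * i.e. a[0]*b[0] and a[2]*b[2]. Elements a[1], a[3], b[1], b[3] are ignored
 * by PMULUDQ. Each product is a full unsigned 64-bit value. */
void mul_low_dwords(const uint32_t a[4], const uint32_t b[4],
                    uint64_t out[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* Two independent 32x32 -> 64-bit unsigned multiplies. */
    __m128i prod = _mm_mul_epu32(va, vb);

    _mm_storeu_si128((__m128i *)out, prod);
}
```

Since only the even-indexed doublewords take part, code that needs all four products typically shifts or shuffles the odd elements into the low positions and issues a second multiply.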
Packed 64-bit Add/Subtract
The PADDQ/PSUBQ instructions add/subtract quadword operands within
each 64-bit chunk from the two sources; the 64-bit result from each
computation is written to the destination register. Like the integer
ADD/SUB instruction, PADDQ/PSUBQ can operate on either unsigned or
signed (two's complement notation) integer operands. When an
individual result is too large to be represented in 64 bits, the lower
64 bits of the result are written to the destination operand and therefore
the result wraps around. These instructions are added in both a 64-bit
and a 128-bit version; the latter performs two independent operations, on the
low and high halves of a 128-bit register.
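The wraparound behavior described above can be checked with a short C sketch using the SSE2 intrinsics _mm_add_epi64 and _mm_sub_epi64, which map to the 128-bit forms of PADDQ and PSUBQ. The function names add_qwords and sub_qwords are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: out[i] = a[i] + b[i] for i = 0, 1, modulo 2^64. */
void add_qwords(const uint64_t a[2], const uint64_t b[2], uint64_t out[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi64(va, vb));
}

/* Hypothetical helper: out[i] = a[i] - b[i] for i = 0, 1, modulo 2^64. */
void sub_qwords(const uint64_t a[2], const uint64_t b[2], uint64_t out[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_sub_epi64(va, vb));
}
```

The same bit patterns serve for signed operands: adding the two's complement encoding of -5 to 3 yields the encoding of -2. No carry or overflow flag is produced, so code that needs overflow detection has to test for it separately.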
128-bit Shifts
The PSLLDQ/PSRLDQ instructions shift the first operand to the left/right
by the number of bytes specified by the immediate operand. The empty
low/high-order bytes are cleared (set to zero). If the value specified by
the immediate operand is greater than 15, then the destination is set to
all zeros.
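A minimal C sketch of the byte shifts described above follows, using the SSE2 intrinsics _mm_slli_si128 and _mm_srli_si128 (PSLLDQ and PSRLDQ). The shift count must be a compile-time constant, so the hypothetical helpers below each fix it to a particular value; the helper names are illustrative only.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: shifts the 128-bit value left by 4 bytes.
 * In memory order (little-endian), bytes move toward higher indices
 * and out[0..3] become zero. */
void shift_left_4_bytes(const uint8_t in[16], uint8_t out[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_slli_si128(v, 4));
}

/* Hypothetical helper: shifts the 128-bit value right by 4 bytes.
 * Bytes move toward lower indices and out[12..15] become zero. */
void shift_right_4_bytes(const uint8_t in[16], uint8_t out[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_srli_si128(v, 4));
}

/* Hypothetical helper: a count greater than 15 clears the whole register. */
void shift_left_16_bytes(const uint8_t in[16], uint8_t out[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_slli_si128(v, 16));
}
```

Note that the count is in bytes, not bits, and that the shift crosses the 64-bit boundary, unlike the quadword shifts, which treat each half independently.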