IA-32 Intel® Architecture Optimization
The code above converts values to unsigned numbers first and then clips
them to an unsigned range. The last instruction converts the data back to
signed data and places the data within the signed range. Conversion to
unsigned data is required for correct results when (high - low) < 0x8000.
If (high - low) >= 0x8000, the algorithm can be simplified as shown in
Example 4-21.
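A scalar C model of a single 16-bit lane can make the three-instruction
sequence easier to check by hand. The sketch below is not the MMX code
itself. The helper names (sat16, wrap16, clip3) are hypothetical, sat16
stands in for the signed saturation of the packed add and subtract, and
the final wrapping add is assumed to use low + 0x8000, the value that
cancels the two preceding offsets modulo 2^16. It assumes
(high - low) >= 0x8000.

```c
#include <stdint.h>

/* Saturate a wide intermediate to the signed 16-bit range
 * (models the signed-saturating packed add/subtract on one lane). */
int32_t sat16(int32_t v) {
    if (v > 32767) return 32767;
    if (v < -32768) return -32768;
    return v;
}

/* Keep the low 16 bits of v and read them as a signed word
 * (models the wrapping behavior of paddw on one lane). */
int32_t wrap16(int32_t v) {
    uint32_t u = (uint32_t)v & 0xffffu;
    return u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u;
}

/* Three-step clip of x to [low, high]; valid only when
 * (high - low) >= 0x8000. */
int32_t clip3(int32_t x, int32_t high, int32_t low) {
    int32_t c1 = 0x7fff - high;               /* packed_max - packed_high */
    int32_t c2 = wrap16(0xffff - high + low); /* constant as a signed word */
    int32_t c3 = wrap16(low + 0x8000);        /* ASSUMED final offset */
    int32_t v = sat16(x + c1);                /* saturates when x > high */
    v = sat16(v - c2);                        /* saturates when x < low */
    return wrap16(v + c3);                    /* undo both offsets */
}
```

Under these assumptions, clip3(0x7000, 0x4000, -0x4000) returns 0x4000,
and clip3(x, 0x7fff, -0x8000) returns x unchanged for any word x.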
This algorithm saves a cycle when it is known that (high - low) >=
0x8000. The three-instruction algorithm does not work when (high -
low) < 0x8000, because 0xffff minus any number < 0x8000 yields a value
of 0x8000 or greater, which is a negative number when interpreted as a
signed word. When the second instruction, psubssw MM0, (0xffff - high
+ low), in the three-step algorithm (Example 4-21) is executed, a
negative number is subtracted. This subtraction increases the values in
MM0 instead of decreasing them, as should be the case, and an incorrect
answer is generated.
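The sign problem can be reproduced on a single lane. The sketch below is
a scalar C model with hypothetical helper names (as_signed_word,
low_clip_constant, psubssw_lane). It computes the word 0xffff - high +
low, reads it as a signed 16-bit value the way a signed-saturating
subtract would, and applies that subtract to one value.

```c
#include <stdint.h>

/* Read the low 16 bits of v as a signed word. */
int32_t as_signed_word(int32_t v) {
    uint32_t u = (uint32_t)v & 0xffffu;
    return u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u;
}

/* Constant of the second instruction, as a signed word. */
int32_t low_clip_constant(int32_t high, int32_t low) {
    return as_signed_word(0xffff - high + low);
}

/* One lane of a signed subtract with saturation. */
int32_t psubssw_lane(int32_t a, int32_t b) {
    int32_t d = a - b;
    if (d > 32767) return 32767;
    if (d < -32768) return -32768;
    return d;
}
```

With high = 0x4000 and low = -0x4000 the constant is 0x7fff, so the
subtract lowers the value as intended. With high = 100 and low = -100
the constant is the word 0xff37, which reads as -201, so subtracting it
from 0 gives 201: the value goes up instead of down.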
Clipping to an Arbitrary Unsigned Range [high, low]
Example 4-22 clips an unsigned value to the unsigned range [high, low].
If the value is less than low or greater than high, then clip to low or
high, respectively. This technique uses the packed-add and
Example 4-21 Simplified Clipping to an Arbitrary Signed Range
; Input: MM0 signed source operands
; Output: MM0 signed operands clipped to the signed
; range [high, low]
paddssw MM0, (packed_max - packed_high)
; in effect this clips to high
psubssw MM0, (packed_usmax - packed_high + packed_low)
; clips to low
paddw MM0, (packed_low + 0x8000)
; undo the previous two offsets, which net to -(0x8000 + low)