IA-32 Intel® Architecture Optimization
2-98
User/Source Coding Rule 8. (H impact, H generality) To achieve effective
amortization of bus latency, software should favor data
access patterns that result in higher concentrations of cache miss patterns
with cache miss strides that are significantly smaller than half of the hardware
prefetch trigger threshold. 2-52
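
As a rough illustration of Rule 8, the sketch below contrasts two traversals of the same row-major matrix. The function names and the flat-array layout are assumptions made for this example, not taken from the manual. The first walk touches memory with a stride of sizeof(float) bytes, so consecutive cache misses land on adjacent lines; the second uses a stride of cols * sizeof(float) bytes, which for large matrices is likely to exceed the range the hardware prefetcher can follow.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major walk: consecutive accesses are sizeof(float) bytes apart,
   so successive cache misses fall on neighbouring cache lines. */
float sum_row_major(const float *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    float sum = 0.0f;
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            sum += m[r * cols + c];
    return sum;
}

/* Column-major walk over the same data: consecutive accesses are
   cols * sizeof(float) bytes apart, a much larger miss stride. */
float sum_col_major(const float *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    float sum = 0.0f;
    for (size_t c = 0; c < cols; ++c)
        for (size_t r = 0; r < rows; ++r)
            sum += m[r * cols + c];
    return sum;
}
```

Both functions visit every element exactly once; only the order of access differs, and that order is what the rule asks the programmer to consider.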
User/Source Coding Rule 9. (M impact, H generality) Enable the prefetch
generation in your compiler. Note: As the compiler’s prefetch implementation
improves, it is expected that its prefetch insertion will outperform manual
insertion except for that done by code tuning experts, but this is not always the case. If the
compiler does not support software prefetching, intrinsics or inline assembly
may be used to manually insert prefetch instructions. 2-56
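
A minimal sketch of manual prefetch insertion follows. It uses the GCC/Clang builtin __builtin_prefetch so that it compiles on any target those compilers support; on IA-32 the _mm_prefetch intrinsic from xmmintrin.h serves the same purpose. The function name and PREFETCH_DISTANCE are hypothetical, and the distance would need tuning on real hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning parameter: how many elements ahead to prefetch. */
#define PREFETCH_DISTANCE 16

float sum_with_prefetch(const float *a, size_t n)
{
    assert(a != NULL);
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        /* Only form addresses inside the array. Arguments: address,
           0 = prefetch for read, 3 = keep in all cache levels. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
        sum += a[i];
    }
    return sum;
}
```

For a simple sequential walk like this one, the hardware prefetcher will often do as well without help; manual prefetching is more likely to pay off on irregular access patterns.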
User/Source Coding Rule 10. (M impact, M generality) Enable the
compiler’s use of SSE, SSE2 and/or SSE3 instructions with appropriate
switches. 2-58
User/Source Coding Rule 11. (H impact, ML generality) Make sure your
application stays in range to avoid denormal values and underflows. 2-58
User/Source Coding Rule 12. (M impact, ML generality) Do not use double
precision unless necessary. Set the precision control (PC) field in the x87 FPU
control word to “Single Precision”. This allows single precision (32-bit)
computation to complete faster on some operations (for example, divides due
to early out). However, be careful of introducing more than a total of two
values for the floating point control word, or there will be a large performance
penalty. See “Floating-point Modes”. 2-58
User/Source Coding Rule 13. (H impact, ML generality) Use fast
float-to-int routines, FISTTP or SSE2 instructions. If coding these routines, use
FISTTP if SSE3 is available, or the cvttss2si and cvttsd2si instructions if
coding with Streaming SIMD Extensions 2. 2-59
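
The truncating conversions named in Rule 13 can be reached from C through intrinsics, as in the sketch below. The wrapper names are assumptions for this example. _mm_cvttss_si32 and _mm_cvttsd_si32 map to cvttss2si and cvttsd2si respectively; when SSE2 is not enabled, the sketch falls back to a plain cast, which has the same truncate-toward-zero semantics.

```c
#if defined(__SSE2__)
#include <emmintrin.h>

/* cvttss2si: truncate a single-precision value toward zero. */
int float_to_int_trunc(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}

/* cvttsd2si: truncate a double-precision value toward zero. */
int double_to_int_trunc(double x)
{
    return _mm_cvttsd_si32(_mm_set_sd(x));
}
#else
/* Fallback when SSE2 is not enabled: a C cast also truncates toward zero. */
int float_to_int_trunc(float x)   { return (int)x; }
int double_to_int_trunc(double x) { return (int)x; }
#endif
```

With SSE or SSE2 code generation enabled, many compilers already emit these instructions for an ordinary cast, so the intrinsics matter mainly when the conversion must be explicit.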
User/Source Coding Rule 14. (M impact, ML generality) Break dependence
chains where possible. 2-59
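
One common way to break a dependence chain is to split a reduction across several independent accumulators, sketched below. The function names and the choice of four accumulators are illustrative assumptions. Note that reassociating floating-point additions can change the rounded result, so this transformation suits code that tolerates small differences in the last bits.

```c
#include <assert.h>
#include <stddef.h>

/* Single chain: every addition waits for the previous one to finish. */
float sum_single_chain(const float *a, size_t n)
{
    assert(a != NULL);
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Four independent chains: additions into s0..s3 do not depend on each
   other, so they can overlap in the pipeline. */
float sum_four_chains(const float *a, size_t n)
{
    assert(a != NULL);
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i)      /* leftover elements when n is not a multiple of 4 */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```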
User/Source Coding Rule 15. (M impact, ML generality) Usually, math
libraries take advantage of the transcendental instructions (for example,
fsin) when evaluating elementary functions. If there is no critical need to
evaluate the transcendental functions using the extended precision of 80 bits,
applications should consider an alternate, software-based approach, such as