Chapter 4 Instruction-Decoding Optimizations
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
4.5 Take Advantage of x86 and AMD64 Complex Addressing Modes
Optimization
When porting from other architectures, or if you are new to x86 assembly language, remember that
the x86 architecture provides many complex addressing modes. Building the effective address within
a single instruction can sometimes reduce the instruction count, which improves code density and
makes better use of decode bandwidth. Refer to the section on effective addresses in the
AMD64 Architecture Programmer's Manual Volume 1: Application Programming for more detailed
information on how effective addresses are formed.
Application
This optimization applies to:
• 32-bit software
• 64-bit software
Rationale
Building an effective address can seem to require several instructions when there is a base address
(such as the base of an array), an index, and perhaps a displacement. However, the x86 architecture
can often handle all of this in one instruction, which reduces code size and leaves fewer
instructions to decode. As always, pay attention to total instruction length, latencies, and
whether the instruction choices are DirectPath (fastest) or VectorPath (slower).
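As a rough illustration of what a single memory operand can compute, the C sketch below models the address arithmetic behind an operand written as disp(base,index,scale) in AT&T syntax. The function name effective_address and its parameters are hypothetical and exist only for this illustration; the sketch models the arithmetic and says nothing about instruction length or latency.

```c
#include <stdint.h>

/* Hypothetical helper (not part of any AMD API): models how a memory
 * operand written as disp(base,index,scale) resolves to an address.
 * The hardware encodes scale as 1, 2, 4, or 8; this model does not
 * enforce that restriction. */
static uint64_t effective_address(uint64_t base, uint64_t index,
                                  unsigned scale, int32_t disp)
{
    /* The displacement is sign-extended to 64 bits before the add,
     * so negative displacements wrap correctly in unsigned arithmetic. */
    return base + index * (uint64_t)scale + (uint64_t)(int64_t)disp;
}
```

For example, the operand 16(%rdi,%rsi,4) with %rdi = 0x1000 and %rsi = 3 corresponds to effective_address(0x1000, 3, 4, 16), which evaluates to 0x101c. Without the complex addressing mode, the same address would need separate shift and add instructions before the load.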
Example
The following sequence of five instructions, with a total latency count of 8, can be replaced by one
instruction.

Number of Bytes   Latency   Instruction
3                 1         movl %r10d,%r11d
8                 2         leaq 0x68E35,%rcx
3                 1         addq %rcx,%r11
5                 3         movb (%r11,%r13),%cl
2                 1         cmpb %al,%cl

The following instruction replaces the functionality of the above sequence.
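One instruction that fits this description, offered here as an assumption rather than as the guide's own listing, is cmpb %al,0x68E35(%r10,%r13): it folds the constant, the base register, and the index register into a single memory operand and compares the addressed byte with %al directly. The C sketch below models both forms so the equivalence can be checked. The function names are hypothetical, mem stands in for a flat address space, and only the equal/not-equal outcome (the zero flag) is modeled.

```c
#include <stdint.h>

/* Hypothetical model of the five-instruction sequence.
 * Each statement mirrors one instruction from the table. */
static int compare_five_steps(const uint8_t *mem, uint64_t r10,
                              uint64_t r13, uint8_t al)
{
    uint64_t r11 = (uint32_t)r10;   /* movl %r10d,%r11d (zero-extends to 64 bits) */
    uint64_t rcx = 0x68E35;         /* leaq 0x68E35,%rcx */
    r11 += rcx;                     /* addq %rcx,%r11 */
    uint8_t cl = mem[r11 + r13];    /* movb (%r11,%r13),%cl */
    return cl == al;                /* cmpb %al,%cl (models ZF only) */
}

/* Hypothetical model of the assumed single instruction:
 * cmpb %al,0x68E35(%r10,%r13) */
static int compare_one_step(const uint8_t *mem, uint64_t r10,
                            uint64_t r13, uint8_t al)
{
    return mem[0x68E35 + r10 + r13] == al;  /* models ZF only */
}
```

Two differences are worth keeping in mind. The movl in the longer sequence zero-extends %r10d, so the two forms agree only when the upper 32 bits of %r10 are zero. The longer sequence also leaves values in %rcx and %r11 that the single instruction does not produce, so the replacement is valid only if later code does not depend on them.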