Chapter 4 Instruction-Decoding Optimizations 81
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
4.7 Partial-Register Reads and Writes
Optimization
Avoid partial register reads and writes.
Application
This optimization applies to:
• 32-bit software
• 64-bit software
Rationale
To handle partial register writes, the processor’s execution core implements a data-merging
scheme.
In the execution unit, an instruction that writes part of a register merges the modified portion with the
current state of the other part of the register. Therefore, the dependency hardware can
force a false dependency on the most recent instruction that writes to any part of the register.
In addition, an instruction that has a read dependency on any part of a given architectural register has
a read dependency on the most recent instruction that modifies any part of the same architectural
register.
Example 1
Avoid code such as the following, which writes to only part of a register:
mov al, 10    ; Instruction 1
mov ah, 12    ; Instruction 2 has a false dependency on instruction 1.
              ; Instruction 2 merges new AH with current EAX register
              ; value forwarded by instruction 1.
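The merge described in the rationale can be modeled in a few lines of C. This is only an illustrative sketch of the architectural result, not a description of the hardware; the function names write_al, write_ah, example1, and example1_check are hypothetical. It shows why each partial write takes the previous register value as an input, which is the source of the false dependency in Example 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "mov al, imm8": the new EAX is the old EAX with
   bits 7:0 replaced, so the old value is an input to the operation. */
uint32_t write_al(uint32_t old_eax, uint8_t value) {
    return (old_eax & 0xFFFFFF00u) | (uint32_t)value;
}

/* Hypothetical model of "mov ah, imm8": only bits 15:8 are replaced. */
uint32_t write_ah(uint32_t old_eax, uint8_t value) {
    return (old_eax & 0xFFFF00FFu) | ((uint32_t)value << 8);
}

/* Models Example 1: the second write merges into the value produced by
   the first, so it cannot start until the first has a result. */
uint32_t example1(uint32_t initial_eax) {
    uint32_t after_al = write_al(initial_eax, 10);  /* mov al, 10 */
    return write_ah(after_al, 12);                  /* mov ah, 12 */
}

void example1_check(void) {
    /* The upper 16 bits of the old value survive both partial writes. */
    assert(example1(0xDEADBEEFu) == 0xDEAD0C0Au);
}
```

Neither immediate depends on the other in the program's logic, yet in this model the second call cannot be evaluated without the result of the first.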
Example 2
Avoid code such as the following, which both reads and writes only parts of registers:
mov bx, 12h   ; Instruction 1
mov bl, dl    ; Instruction 2 has a false dependency on the completion
              ; of instruction 1.
mov bh, cl    ; Instruction 3 has a false dependency on the completion
              ; of instruction 2.
mov al, bl    ; Instruction 4 depends on the completion of instruction 2.
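One common way to avoid the partial write in instruction 4 is to write the full destination register with a zero-extending move such as movzx eax, bl. This alternative is an assumption of this sketch and is not stated in this section. The C model below contrasts the two forms; the function names are hypothetical. The substitution is valid only when the code does not need the previous upper bits of EAX, and the read of BL is still a partial read.

```c
#include <assert.h>
#include <stdint.h>

/* Partial write, modeling "mov al, bl": the old EAX is an input because
   bits 31:8 must be carried over into the result. */
uint32_t mov_al_bl(uint32_t old_eax, uint32_t ebx) {
    return (old_eax & 0xFFFFFF00u) | (ebx & 0xFFu);
}

/* Full write, modeling "movzx eax, bl": the result is a function of the
   source alone, so no merge with the old EAX is needed. */
uint32_t movzx_eax_bl(uint32_t ebx) {
    return ebx & 0xFFu;
}

void example2_check(void) {
    uint32_t ebx = 0xAABBCCDDu;
    /* Both forms deliver the same low byte; only the upper bits differ. */
    assert(mov_al_bl(0x11223344u, ebx) == 0x112233DDu);
    assert(movzx_eax_bl(ebx) == 0x000000DDu);
}
```

The absence of an old_eax parameter in the second function is the point of the comparison: a full-register write has no input from the register's earlier contents.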