IA-32 Intel® Architecture Optimization
        add esi,ecx
        add edi,ecx
        sub edx,ecx
        jnz main_loop
        sfence
    }
}
Performance Comparisons of Memory Copy Routines
The throughput of a large-region memory copy routine depends on
several factors:
• the coding technique used to implement the memory copy task
• characteristics of the system bus (speed, peak bandwidth, overhead
in read/write transaction protocols)
• the microarchitecture of the processor
A comparison of the two coding techniques discussed above and two
unoptimized techniques is shown in Table 6-2.
Table 6-2 Relative Performance of Memory Copy Routines

Processor, CPUID Signature                                Byte        DWORD       SW prefetch +     4KB-Block HW prefetch
and FSB Speed (MHz)                                       Sequential  Sequential  8 byte streaming  + 16 byte streaming
                                                                                  store             stores
------------------------------------------------------------------------------------------------------------------------
Pentium M processor, 0x6Dn, 400                           1.3X        1.2X        1.6X              2.5X
Intel Core Solo and Intel Core Duo processors, 0x6En, 667 3.3X        3.5X        2.1X              4.7X
Pentium D processor, 0xF4n, 800                           3.4X        3.3X        4.9X              5.7X
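
For readers who want to reproduce a comparison of this kind, the following is a minimal sketch in C of the first and last columns of Table 6-2: a byte-sequential baseline and a 4KB-block copy that reads each block first and then writes it with 16-byte streaming stores. It is not the manual's own listing. The function names copy_bytes and copy_block_stream, the constants COPY_BLOCK and CACHE_LINE (a 64-byte line is assumed), and the fallback to memcpy on targets without SSE2 are all illustrative assumptions. The sketch also assumes 16-byte-aligned buffers and a size that is a multiple of 4 KB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>   /* SSE2 intrinsics; also provides _mm_sfence */
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Assumed sizes for this sketch: 4 KB copy block, 64-byte cache line. */
enum { COPY_BLOCK = 4096, CACHE_LINE = 64 };

/* Unoptimized baseline: copies one byte per iteration. */
static void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* Block copy with streaming stores.
   Preconditions (assumed): dst and src are 16-byte aligned and
   n is a multiple of COPY_BLOCK. */
static void copy_block_stream(void *dst, const void *src, size_t n)
{
    assert((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0);
    assert(n % COPY_BLOCK == 0);
#if HAVE_SSE2
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (; n != 0; n -= COPY_BLOCK, s += COPY_BLOCK, d += COPY_BLOCK) {
        /* Pass 1: read one byte per cache line of the source block so the
           sequential accesses can trigger the hardware prefetcher. */
        for (size_t off = 0; off < COPY_BLOCK; off += CACHE_LINE)
            (void)*(const volatile unsigned char *)(s + off);

        /* Pass 2: aligned 16-byte loads from the source block, written to
           the destination with non-temporal (streaming) stores. */
        for (size_t off = 0; off < COPY_BLOCK; off += 16) {
            __m128i v = _mm_load_si128((const __m128i *)(s + off));
            _mm_stream_si128((__m128i *)(d + off), v);
        }
    }
    _mm_sfence();   /* order the streaming stores before later stores */
#else
    memcpy(dst, src, n);   /* portable fallback when SSE2 is unavailable */
#endif
}
```

Timing each routine over a region much larger than the last-level cache, and dividing by the time of a chosen baseline, would give ratios comparable in form to the table entries. The actual values depend on the processor, the bus, and the compiler, so they should not be expected to match Table 6-2.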