8-22 Vol. 3
MULTIPLE-PROCESSOR MANAGEMENT
8.2.5 Strengthening or Weakening the Memory-Ordering Model
The Intel 64 and IA-32 architectures provide several mechanisms for strengthening
or weakening the memory-ordering model to handle special programming situations.
These mechanisms include:
• The I/O instructions, locking instructions, the LOCK prefix, and serializing
instructions force stronger ordering on the processor.
• The SFENCE instruction (introduced to the IA-32 architecture in the Pentium III
processor) and the LFENCE and MFENCE instructions (introduced in the Pentium
4 processor) provide memory-ordering and serialization capabilities for specific
types of memory operations.
• The memory type range registers (MTRRs) can be used to strengthen or weaken
memory ordering for specific areas of physical memory (see Section 11.11,
“Memory Type Range Registers (MTRRs)”). MTRRs are available only in the
Pentium 4, Intel Xeon, and P6 family processors.
• The page attribute table (PAT) can be used to strengthen memory ordering for a
specific page or group of pages (see Section 11.12, “Page Attribute Table
(PAT)”). The PAT is available only in the Pentium 4, Intel Xeon, and Pentium III
processors.
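As an illustration of the fence instructions listed above, the following C sketch uses SFENCE (through the `_mm_sfence` compiler intrinsic) to make weakly ordered non-temporal stores globally visible before a flag store. The `mailbox` structure, `MAILBOX_WORDS`, and the function names are hypothetical, and the fallback path for targets without SSE2 is an assumption added only so that the sketch compiles elsewhere. For ordinary stores to write-back memory the fence would be redundant, since those stores are already observed in program order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si32; also provides _mm_sfence */
#endif

#define MAILBOX_WORDS 16

/* Hypothetical mailbox shared between a producer and a consumer. */
struct mailbox {
    int payload[MAILBOX_WORDS];
    atomic_int ready;            /* 0 = empty, 1 = payload published */
};

/* Producer: fill the payload, fence, then raise the flag. */
void mailbox_publish(struct mailbox *mb, int value)
{
    assert(mb != NULL);
    for (size_t i = 0; i < MAILBOX_WORDS; i++) {
#if defined(__SSE2__)
        /* Non-temporal store: weakly ordered with respect to other stores. */
        _mm_stream_si32(&mb->payload[i], value + (int)i);
#else
        mb->payload[i] = value + (int)i;   /* assumed fallback: plain store */
#endif
    }
#if defined(__SSE2__)
    /* SFENCE: stores before this point become globally visible before
       any store after it. */
    _mm_sfence();
#endif
    atomic_store_explicit(&mb->ready, 1, memory_order_release);
}

/* Consumer: returns 1 and copies the payload once it has been published. */
int mailbox_try_read(struct mailbox *mb, int out[MAILBOX_WORDS])
{
    assert(mb != NULL && out != NULL);
    if (!atomic_load_explicit(&mb->ready, memory_order_acquire))
        return 0;
    for (size_t i = 0; i < MAILBOX_WORDS; i++)
        out[i] = mb->payload[i];
    return 1;
}
```

In real use the producer and consumer would run on different processors. Without the fence, the flag store could become visible before the non-temporal payload stores.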
These mechanisms can be used as follows:
Memory-mapped devices and other I/O devices on the bus are often sensitive to the
order of writes to their I/O buffers. I/O instructions (the IN and OUT
instructions) can be used to impose strong write ordering on such accesses as
follows. Prior to executing an I/O instruction, the processor waits for all
previous instructions in the program to complete and for all buffered writes to
drain to memory. Only instruction fetches and page table walks can pass I/O
instructions. Execution of subsequent instructions does not begin until the
processor determines that the I/O instruction has been completed.
Synchronization mechanisms in multiple-processor systems may depend upon a
strong memory-ordering model. Here, a program can use a locking instruction such
Example 8-15. String Operations Are not Reordered with Earlier Stores
Processor 0              Processor 1
mov [_z], $1             mov r1, [_y]
rep:stosd [_x]           mov r2, [_z]
Initially on processor 0: EAX == 1, ECX == 128, ES:EDI == _x
Initially [_y] == [_z] == 0, [_x] to 511[_x] == 0, _x <= _y < _x+512, _z is a separate memory location
r1 == 1 and r2 == 0 is not allowed
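The litmus test in Example 8-15 can be approximated in portable C11 as shown below. This is a sketch under stated assumptions, not the example itself. The release stores stand in for the doubleword stores of the string operation, the acquire load stands in for an ordinary x86 load, `Y_INDEX` is an arbitrary choice of where `_y` falls inside the `_x` buffer, and the function names are hypothetical. A compiler is not required to emit a string instruction for the loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define X_DWORDS 128   /* ECX == 128 doublewords, i.e. 512 bytes */
#define Y_INDEX  64    /* assumption: _y aliases this element of _x */

static _Atomic int x[X_DWORDS];   /* [_x] to 511[_x], initially 0 */
static _Atomic int z;             /* [_z], initially 0 */

/* Processor 0: store to _z, then fill the _x buffer with 1s. */
void processor0(void)
{
    atomic_store_explicit(&z, 1, memory_order_relaxed);
    for (int i = 0; i < X_DWORDS; i++)
        atomic_store_explicit(&x[i], 1, memory_order_release);
}

/* Processor 1: load _y (inside the _x buffer), then load _z. */
void processor1(int *r1, int *r2)
{
    assert(r1 != NULL && r2 != NULL);
    *r1 = atomic_load_explicit(&x[Y_INDEX], memory_order_acquire);
    *r2 = atomic_load_explicit(&z, memory_order_relaxed);
}
```

When the two functions run on different threads, observing `r1 == 1` means the acquire load read a value written by a release store that follows the store to `z`, so `r2 == 0` cannot be observed. This mirrors the outcome that the example forbids.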