Intel SGI Altix 450 Switch User Manual


 
007-4857-002
3: System Overview
Non-uniform Memory Access (NUMA)
In distributed shared memory (DSM) systems, memory is physically located at various distances
from the processors. As a result, memory access times (latencies) differ, or are “non-uniform.”
For example, a processor blade takes less time to reference its locally installed memory than
to reference remote memory.
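The effect of non-uniform latency can be illustrated with a simple weighted average. The following is a minimal sketch in Python, not a measurement of any Altix system: the function name `average_latency_ns` and the latency figures in the comments are hypothetical and are not taken from this manual.

```python
def average_latency_ns(local_fraction, local_ns, remote_ns):
    """Return the average memory access latency in nanoseconds.

    local_fraction -- share of accesses served by locally installed memory (0.0 to 1.0)
    local_ns       -- assumed latency of a local access (hypothetical value)
    remote_ns      -- assumed latency of a remote access (hypothetical value)
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0.0 and 1.0")
    remote_fraction = 1.0 - local_fraction
    # Each access class contributes in proportion to how often it occurs.
    return local_fraction * local_ns + remote_fraction * remote_ns


# Illustrative figures only: 100 ns for local memory, 300 ns for remote memory.
all_local = average_latency_ns(1.0, 100.0, 300.0)
half_remote = average_latency_ns(0.5, 100.0, 300.0)
```

Under these assumed figures, the average latency rises as a larger share of references goes to remote memory, which is why placing data close to the processor that uses it generally helps on a NUMA system.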
Reliability, Availability, and Serviceability (RAS)
The Altix 450 server series components have the following features to increase the reliability,
availability, and serviceability (RAS) of the systems.
Power and cooling:
• IRU power supplies are redundant and can be hot-swapped under most circumstances.
  Note that this might not be possible in a “fully loaded” system. If all the blade positions
  are filled, be sure to consult with a service technician before removing a power supply
  while the system is running.
• IRUs have overcurrent protection at the blade and power supply level.
• Fans are redundant and can be hot-swapped.
• Fans run at multiple speeds in the IRUs. Speed increases automatically when
  temperature increases or when a single fan fails.
System monitoring:
• System controllers monitor the internal power and temperature of the IRUs, and can
  automatically shut down an enclosure to prevent overheating.
• Memory, L2 cache, L3 cache, and all external bus transfers are protected by single-bit
  error correction and double-bit error detection (SECDED).
• The NUMAlink interconnect network is protected by cyclic redundancy check (CRC).
• The L1 primary cache is protected by parity.
• Each IRU and each blade/node installed has failure LEDs that indicate the failed part;
  LEDs are readable at the front of the IRU or via the system controllers.
• Systems support the optional Embedded Support Partner (ESP), a tool that monitors the
  system; when a condition occurs that may cause a failure, ESP notifies the appropriate
  SGI personnel.
• Systems support remote console and maintenance activities.
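The SECDED protection listed above can be illustrated with a small extended Hamming code. The following is a minimal sketch in Python, assuming a 4-bit data word and an (8,4) code. The word widths and code layout actually used by the Altix 450 hardware are not given in this manual, and the function names `encode` and `decode` are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds the overall parity bit; indexes 1 to 7 are the usual
    Hamming positions, with check bits at 1, 2, and 4.
    """
    code = [0] * 8
    # Data bits go in the non-power-of-two positions.
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    # Each check bit covers the positions whose index has that bit set.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    # Overall parity bit makes the number of 1 bits in the codeword even.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected', or 'double-error'."""
    code = list(code)
    # The syndrome is the XOR of the positions of all set bits (0 if consistent).
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall_odd = sum(code) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        # One bit flipped: the syndrome names its position
        # (0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two bits flipped.
        return "double-error", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble
```

In this sketch, flipping any one bit of a codeword before calling `decode` returns the original value with status `corrected`, while flipping any two bits returns `double-error` with no data. The overall parity bit on its own corresponds to the simpler parity scheme, which can detect a single flipped bit but cannot locate or correct it.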