Compaq SC RMS Server User Manual


 
network, data can be broadcast directly to a contiguous range of processors: the data is
routed up to a node in the tree from which all processors in the range can be reached, and is
then routed down to every switch output in the broadcast range. Data can also be
recombined as it travels through the network to support global reduction operations and
barrier synchronization.
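The up-then-down broadcast route can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from this manual: processors are numbered contiguously from 0, each switch stage fans out to a fixed number of outputs (arity 4 is used here purely for illustration), and the function name broadcast_turn_level is hypothetical. The sketch computes only how many stages the data must climb before a single node can reach the whole range; it does not model which physical switch is chosen.

```python
def broadcast_turn_level(first: int, last: int, arity: int = 4) -> int:
    """Return how many stages a broadcast must climb before turning around.

    Level 0 is the processor itself; level 1 is the first switch stage.
    The arity and the contiguous numbering are illustrative assumptions.
    """
    if arity < 2:
        raise ValueError("arity must be at least 2")
    if first < 0 or last < first:
        raise ValueError("expected 0 <= first <= last")

    level = 0
    # Climb one stage at a time until both ends of the range fall under
    # the same subtree; from that node every target can be reached.
    while first != last:
        first //= arity
        last //= arity
        level += 1
    return level


if __name__ == "__main__":
    # Processors 0-3 share a first-stage switch; 3 and 4 do not.
    print(broadcast_turn_level(0, 3))
    print(broadcast_turn_level(3, 4))
```

Under these assumptions, a range that straddles a subtree boundary (such as processors 3 and 4) must climb one stage higher than a range of the same size that sits inside a single subtree.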
Multiple Elan network adapters may be installed per node, each connected to a different
switch network. This replication can increase fault tolerance and bisection bandwidth,
provided that each Elan is attached to a separate PCI bus. Each separate Elan/Elite network
attached to a node is known as a layer (or a rail).
The switch network is described by three tables in the database. The switch_boards
table (see Section 10.2.22) gives details of each board, its status and its position in the
machine. The elans table (see Section 10.2.4) and the elites table (see Section 10.2.5)
describe the position in the switch network of each component, its attributes and its
current link state and errors.
RMS includes a control and monitoring daemon, swmgr (see Section 4.5), that manages
the switch network. swmgr probes the switch network control interface for switch boards
to determine the size of the network. It then creates or updates the entries in the elans
table and the elites table. Having done this, swmgr uses the switch network
control interface to extract error and performance data. This interface is also used for
link continuity (boundary scan) testing.
A.2 Link States
The state of each link in the switch network is recorded in the linkerrors field in the
elites table (see Section 10.2.5). Valid values for the states are shown in Section B.4.
Links are normally in the connected state (C). Unconnected links are in the reset
state (R). Links are in the unknown state (U) if swmgr has not run or if the control
cable is not attached to the switch. The acking (A) and nacking (N) states are set by the
switch control software.
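The state codes listed above can be decoded with a short Python sketch such as the following. The five codes and their names come from this section; the input format, a string holding one state character per link, is an assumption made for illustration, and the function name summarize_link_states is hypothetical. Section B.4 remains the authoritative list of valid values.

```python
# State codes and names as given in this section.
LINK_STATES = {
    "C": "connected",
    "R": "reset",
    "U": "unknown",
    "A": "acking",
    "N": "nacking",
}


def summarize_link_states(codes: str) -> dict[str, list[int]]:
    """Group link indices by state name.

    `codes` is assumed to hold one state character per link, in link
    order; that encoding is an illustrative assumption.
    """
    summary: dict[str, list[int]] = {}
    for index, code in enumerate(codes):
        name = LINK_STATES.get(code)
        if name is None:
            raise ValueError(f"unrecognized link state {code!r} at link {index}")
        summary.setdefault(name, []).append(index)
    return summary


if __name__ == "__main__":
    # Links 0 and 1 connected, link 2 unconnected, link 3 not yet probed.
    print(summarize_link_states("CCRU"))
```

A summary of this kind makes it easy to pick out links that are not in the connected state, for example links reported as unknown because swmgr has not run.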
A.3 Link Errors
swmgr logs network errors to the link_errors table (see Section 10.2.11). The
description of each error contains information that should be included when reporting a
problem with the switch network.