C – Troubleshooting
Kernel and Initialization Issues
IB6054601-00 D C-11
C.4.6 InfiniPath ib_ipath Initialization Failure
In some cases, ib_ipath may not be initialized properly. Symptoms of this can
appear in error messages from an MPI job or another program. Here is a
sample command and error message:
$ mpirun -np 2 -m ~/tmp/mbu13 osu_latency
<nodename>:The link is down
MPIRUN: Node program unexpectedly quit. Exiting.
First, check to be sure that the InfiniPath driver is loaded:
$ lsmod | grep ib_ipath
If no output is displayed, the driver did not load. Try the following commands
(as root):
# modprobe -v ib_ipath
# lsmod | grep ib_ipath
# dmesg | grep ipath | tail -25
These commands show whether the driver has loaded. The messages printed by dmesg
may help locate any problems with ib_ipath.
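The load check above can be scripted. The following is a minimal sketch, not part of the InfiniPath software. It reads /proc/modules (the file that lsmod formats) and matches only the first field, so a line such as ib_core, whose "used by" column may list ib_ipath, is not mistaken for the driver itself. The function names and the MODULES_FILE override are hypothetical; ipath_try_load must be run as root.

```shell
# Sketch only: hypothetical helpers, not shipped with InfiniPath.
# Assumption: loaded modules are listed in /proc/modules (what lsmod reads).
# MODULES_FILE is a hypothetical override, useful for testing.
MODULES_FILE=${MODULES_FILE:-/proc/modules}

# Return 0 if ib_ipath is a loaded module, 1 otherwise.
ipath_loaded() {
    # Field 1 of each /proc/modules line is the module name; matching it
    # exactly avoids hits in the "used by" column of other modules.
    awk '$1 == "ib_ipath" { found = 1 } END { exit (found ? 0 : 1) }' "$MODULES_FILE"
}

# If the driver is missing, try to load it (as root) and show the
# last 25 driver messages, as in the manual steps above.
ipath_try_load() {
    if ipath_loaded; then
        echo "ib_ipath is loaded"
        return 0
    fi
    echo "ib_ipath is not loaded; trying modprobe" >&2
    modprobe -v ib_ipath
    dmesg | grep ipath | tail -25
    # Report the final state as the exit status.
    ipath_loaded
}
```

Calling ipath_loaded alone only tests the module state, which is safe as a normal user; ipath_try_load additionally performs the modprobe and dmesg steps.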
If the driver loaded but MPI or other programs are not working, check whether
any problems were detected during driver and InfiniPath hardware initialization with
the command:
$ dmesg | grep -i ipath
This may generate more than one screen of output. Also, check the link status with
the command:
$ cat /sys/bus/pci/drivers/ib_ipath/0?/status_str
These commands are normally executed by the ipathbug-helper script, but
running them separately may help locate the problem.
See also appendices C.9.16 and C.9.8.
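On a node with more than one InfiniPath adapter, the 0? glob in the link status command can match several status_str files, and cat prints them without saying which is which. The sketch below labels each status with its unit directory. It assumes the units appear as 00, 01, ... under /sys/bus/pci/drivers/ib_ipath (the kernel's standard sysfs directory for PCI drivers); the function name and the IPATH_SYSFS override are hypothetical.

```shell
# Sketch only: hypothetical helper for reading per-unit link status.
# Assumption: units are directories 00, 01, ... under the driver's sysfs dir.
# IPATH_SYSFS is a hypothetical override, useful for testing.
IPATH_SYSFS=${IPATH_SYSFS:-/sys/bus/pci/drivers/ib_ipath}

ipath_link_status() {
    local f found=0
    for f in "$IPATH_SYSFS"/0?/status_str; do
        # With no match, bash leaves the pattern unexpanded; skip it.
        [ -r "$f" ] || continue
        found=1
        # Prefix the status with the unit directory name, e.g. "00".
        printf '%s: %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
    if [ "$found" -eq 0 ]; then
        echo "no ib_ipath units found under $IPATH_SYSFS" >&2
        return 1
    fi
}
```

A nonzero exit status means no unit directories were found, which usually points back to the driver load check rather than to the link itself.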
C.4.7 MPI Job Failures Due to Initialization Problems
If one or more nodes do not have the interconnect in a usable state, messages
similar to the following appear when the MPI program is started:
userinit: userinit ioctl failed: Network is down [1]: device init failed
userinit: userinit ioctl failed: Fatal Error in keypriv.c(520): device init failed
This could indicate that a cable is not connected, the switch is down, the subnet
manager (SM) is not running, or a hardware error has occurred.
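To find which node is at fault, the link status check from section C.4.6 can be run on every host in the MPI machine file. This is a minimal sketch under several assumptions: passwordless ssh to each node, a machine file with one host name per line (optionally followed by :count), and the standard sysfs path /sys/bus/pci/drivers/ib_ipath. Blank lines and lines starting with # are skipped. The function name and the REMOTE_SHELL override are hypothetical.

```shell
# Sketch only: hypothetical per-node status sweep over an MPI machine file.
# Assumptions: passwordless ssh; one host per line, optionally "host:count".
# REMOTE_SHELL is a hypothetical override (default: ssh).
REMOTE_SHELL=${REMOTE_SHELL:-ssh}

check_nodes() {
    local hostfile=$1 host status
    # The "|| [ -n ... ]" also handles a last line with no trailing newline.
    while read -r host _ || [ -n "$host" ]; do
        # Skip blank lines and comments.
        case $host in ''|'#'*) continue ;; esac
        # Drop an optional ":count" suffix.
        host=${host%%:*}
        # </dev/null keeps ssh from consuming the rest of the host file.
        status=$($REMOTE_SHELL "$host" \
            'cat /sys/bus/pci/drivers/ib_ipath/0?/status_str' 2>/dev/null </dev/null)
        printf '%s: %s\n' "$host" "${status:-<no status; driver not loaded or node unreachable>}"
    done < "$hostfile"
}
```

For example, check_nodes ~/tmp/mbu13 would report each host from the machine file used in section C.4.6. A node with no status output is the first candidate to check for an unloaded driver, a disconnected cable, or an unreachable host.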