Q-Logic IB6054601-00 D Switch User Manual


 
C – Troubleshooting
InfiniPath MPI Troubleshooting
C-26 IB6054601-00 D
Q
There is no route to any host:
$ mpirun -np 2 -m ~/tmp/q mpi_latency 100 100
ssh: connect to host <nodename> port 22: No route to host
ssh: connect to host <nodename> port 22: No route to host
MPIRUN: All node programs ended prematurely without connecting to
mpirun.
Node jobs have started, but one host couldn’t connect back to mpirun:
$ mpirun -np 2 -m ~/tmp/q mpi_latency 100 100
9139.psc_skt_connect: Error connecting to socket: No route to host
<nodename> Cannot connect to mpirun within 60 seconds.
MPIRUN: Some node programs ended prematurely without connecting to
mpirun.
MPIRUN: No connection received from 1 node process on node
<nodename>
Node jobs have started, both hosts couldn’t connect back to mpirun:
$ mpirun -np 2 -m ~/tmp/q mpi_latency 100 100
9158.psc_skt_connect: Error connecting to socket: No route to host
<nodename> Cannot connect to mpirun within 60 seconds.
6083.psc_skt_connect: Error connecting to socket: No route to host
<nodename> Cannot connect to mpirun within 60 seconds.
MPIRUN: All node programs ended prematurely without connecting to
mpirun.
$ mpirun -np 2 -m ~/tmp/q mpi_latency 1000000 1000000
MPIRUN: <nodename> node program unexpectedly quit: Exiting.
One program on one node died:
$ mpirun -np 2 -m ~/tmp/q mpi_latency 100000 1000000
MPIRUN: <nodename> node program unexpectedly quit: Exiting.
The quiescence detected message is printed when an MPI job does not seem
to be making progress. The default timeout is 900 seconds. After this length of time
all the node processes will be terminated. This timeout can be extended or disabled
with the
-quiescence-timeout option in mpirun.