Q-Logic IB6054601-00 D Switch User Manual


 
C – Troubleshooting
InfiniPath MPI Troubleshooting
C.8.13 MPI Stats
The -print-stats option to mpirun prints a listing of various MPI statistics to
stderr. The example output in this section is from an 8-rank run of the HPCC
benchmark with the -print-stats option.
Message statistics are available for transmitted and received messages. In all
cases, the MPI rank number responsible for a minimum or maximum value is
reported with the relevant value. For application runs of at least 3 ranks, a median
is also available.
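
The summary format described above can be sketched in a few lines of Python. This
is only an illustration of the "min, max, median @ rank" idea, not the code that
mpirun uses; the function name summarize and the sample counts are hypothetical.

```python
from statistics import median

def summarize(per_rank):
    """Summarize one statistic given a list indexed by MPI rank.

    Returns (min, min_rank, max, max_rank, median). The median is None
    for fewer than 3 ranks, mirroring the rule stated in the text.
    """
    if not per_rank:
        raise ValueError("need at least one rank")
    ranks = range(len(per_rank))
    min_rank = min(ranks, key=per_rank.__getitem__)
    max_rank = max(ranks, key=per_rank.__getitem__)
    med = median(per_rank) if len(per_rank) >= 3 else None
    return per_rank[min_rank], min_rank, per_rank[max_rank], max_rank, med

# Hypothetical per-rank message counts for a 4-rank run.
counts = [652, 655, 650, 653]
print(summarize(counts))  # (650, 2, 655, 1, 652.5)
```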
Since transmitted messages employ either an Eager or a Rendezvous protocol,
results are reported for each protocol by both message count and aggregated bytes.
Message count is the number of messages transmitted by each protocol on a
per-rank basis. Aggregated bytes is the total amount of data moved on each rank
by a particular protocol.
On the receive side, messages are classified as either expected or unexpected.
An unexpected message arrives before the receiver has posted a matching MPI
receive buffer, so the MPI implementation must buffer the transmitted data until
the receiver posts one. Expected messages are the inverse case, which should be
the common case in most MPI applications. Because unexpected messages have a
notable effect on performance, an additional metric, Unexpected count %, shows
the proportion of unexpected messages relative to the total number of messages
received.
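
As a worked example, the Unexpected count % for a single rank can be computed from
that rank's expected and unexpected counts. The sketch below is not the mpirun
implementation. The function name is hypothetical, and truncating to a whole
percent is an assumption chosen because it is consistent with the rank 2 and
rank 6 figures in the example output; the counts used are the rounded "K" values
from that output, so they are approximate.

```python
def unexpected_count_pct(expected, unexpected):
    """Percentage of received messages that were unexpected (whole percent).

    Truncation (rather than rounding) is an assumption, not documented behavior.
    """
    total = expected + unexpected
    if total == 0:
        return 0
    return 100 * unexpected // total

# Rank 2 in the example output: about 590.48K expected, 62.69K unexpected.
print(unexpected_count_pct(590_480, 62_690))  # 9
# Rank 6 in the example output: about 624.90K expected, 27.89K unexpected.
print(unexpected_count_pct(624_900, 27_890))  # 4
```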
For more precise information, users are encouraged to make use of MPI profilers
such as mpiP. For more information on mpiP, see:
http://www.llnl.gov/CASC/mpip/
For reference on the HPCC benchmark, see:
http://icl.cs.utk.edu/hpcc/
MPIRUN: MPI Statistics Summary (min, max, median @ rank)
MPIRUN: Messages sent
MPIRUN: Eager count              (min=652.54K @ 0, max=653.39K @ 7, med= 653.15K)
MPIRUN: Eager aggregate bytes    (min= 2.08G @ 0, max= 2.08G @ 2, med= 2.08G)
MPIRUN: Rendezvous count         (None)
MPIRUN: Rendezvous agg. bytes    (None)
MPIRUN:
MPIRUN: Messages received
MPIRUN: Expected count           (min=590.48K @ 2, max=624.90K @ 6, med= 619.01K)
MPIRUN: Expected aggregate bytes (min= 2.03G @ 2, max= 2.04G @ 1, med= 2.04G)
MPIRUN: Unexpected count         (min= 27.89K @ 6, max= 62.69K @ 2, med= 39.20K)
MPIRUN: Unexpected agg. bytes    (min= 44.57M @ 1, max= 57.95M @ 2, med= 48.04M)
MPIRUN: Unexpected count %       (min= 4% @ 6, max= 9% @ 2, med= 6%)
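
For post-processing, output in the format shown above can be parsed with a short
script. The following is a minimal sketch, assuming that stderr was captured to
text and that each statistic line has the min, max, and med fields seen in this
example (runs of fewer than 3 ranks, which have no median, are not handled). The
names LINE_RE and parse_stats are hypothetical. Values are kept as strings because
this section does not state whether the K, M, and G suffixes are decimal or binary.

```python
import re

# Matches lines such as:
#   MPIRUN: Eager count (min=652.54K @ 0, max=653.39K @ 7, med= 653.15K)
LINE_RE = re.compile(
    r"^MPIRUN:\s*(?P<label>.+?)\s*\("
    r"min=\s*(?P<min>\S+)\s*@\s*(?P<min_rank>\d+),\s*"
    r"max=\s*(?P<max>\S+)\s*@\s*(?P<max_rank>\d+),\s*"
    r"med=\s*(?P<med>\S+?)\)\s*$"
)

def parse_stats(text):
    """Return {label: {min, min_rank, max, max_rank, med}} for statistic lines.

    Header lines and "(None)" entries do not match and are skipped.
    """
    stats = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            stats[m["label"]] = {
                "min": m["min"], "min_rank": int(m["min_rank"]),
                "max": m["max"], "max_rank": int(m["max_rank"]),
                "med": m["med"],
            }
    return stats

sample = """\
MPIRUN: Messages received
MPIRUN: Rendezvous count (None)
MPIRUN: Unexpected count (min= 27.89K @ 6, max= 62.69K @ 2, med= 39.20K)
MPIRUN: Unexpected count % (min= 4% @ 6, max= 9% @ 2, med= 6%)
"""
for label, values in parse_stats(sample).items():
    print(label, values)
```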