IB6054601-00 D A-1
Appendix A
Benchmark Programs
Several MPI performance measurement programs are installed from the
mpi-benchmark RPM. This Appendix describes these useful benchmarks and how
to run them. These programs are based on code from the group of Dr. Dhabaleswar
K. Panda at the Network-Based Computing Laboratory at the Ohio State University.
For more information, see:
http://nowlab.cis.ohio-state.edu/
These programs allow you to measure the MPI latency and bandwidth between two
or more nodes in your cluster. Both the executables and their source are
shipped. The executables are shipped in the mpi-benchmark RPM and installed
under /usr/bin. The source is shipped in the mpi-devel RPM and installed under
/usr/share/mpich/examples/performance.
The examples given below are intended only to show the syntax for invoking these
programs and the meaning of the output. They are NOT representations of actual
InfiniPath performance characteristics.
A.1
Benchmark 1: Measuring MPI Latency Between Two Nodes
In the MPI community, latency for a message of a given size is defined as the
time between a node program's call to MPI_Send and the return of the
corresponding MPI_Recv in the receiving node program. Latency alone, without a
qualifying message size, means the latency for a message of size zero.
This latency represents the minimum overhead for sending messages, due both to
software overhead and to delays in the electronics of the fabric. To simplify the
timing measurement, latencies are usually measured with a ping-pong method,
timing a round-trip and dividing by two.
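As a worked illustration of the ping-pong arithmetic, the following minimal
Python sketch converts an observed round-trip time into a one-way latency. The
function name and the sample timing are hypothetical and are not taken from
the benchmark source.

```python
def one_way_latency_us(round_trip_seconds, exchanges=1):
    """Estimate one-way latency in microseconds.

    round_trip_seconds: total time observed for `exchanges` round trips.
    """
    if exchanges < 1:
        raise ValueError("exchanges must be at least 1")
    # Average the round trips, then halve to get the one-way figure.
    return round_trip_seconds / exchanges / 2.0 * 1e6


# Hypothetical timing: 1000 round trips observed in 4 milliseconds.
print(one_way_latency_us(0.004, exchanges=1000))
```

With these assumed numbers the estimate is about 2 microseconds per one-way
message.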
The program osu_latency, from Ohio State University, measures the latency for a
range of message sizes from 0 to 4 megabytes. It uses a ping-pong method, in
which the rank 0 process initiates a series of sends and the rank 1 process echoes
them back, using the blocking MPI send and receive calls for all operations. Half
the time interval observed by the rank 0 process for each such exchange is a
measure of the latency for messages of that size, as defined above. The program
uses a loop, executing many such exchanges for each message size, in order to
get an average. It defers the timing until the message has been sent and received
a number of times, in order to be sure that all the caches in the pipeline have been
filled.
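The loop structure described above can be sketched as follows. This is not the
osu_latency source: it is a minimal Python illustration that assumes a local
pipe and a thread as stand-ins for the fabric and the rank 1 process. The
function names, iteration count, and warm-up (skip) count are all hypothetical.

```python
import threading
import time
from multiprocessing import Pipe


def echo_peer(conn, total_messages):
    """Stand-in for rank 1: echo each received message back unchanged."""
    for _ in range(total_messages):
        conn.send_bytes(conn.recv_bytes())


def ping_pong_latency_us(size, iterations=1000, skip=100):
    """Average one-way latency in microseconds for messages of `size` bytes."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    rank0, rank1 = Pipe()  # duplex pipe; an assumption, not an MPI transport
    peer = threading.Thread(target=echo_peer,
                            args=(rank1, skip + iterations))
    peer.start()

    payload = b"x" * size
    start = 0.0
    for i in range(skip + iterations):
        if i == skip:
            # Timing starts only after the warm-up exchanges have completed.
            start = time.perf_counter()
        rank0.send_bytes(payload)   # blocking send (the "ping")
        rank0.recv_bytes()          # blocking receive (the "pong")
    elapsed = time.perf_counter() - start

    peer.join()
    rank0.close()
    rank1.close()
    # Half of the average round-trip time is the one-way latency.
    return elapsed / iterations / 2.0 * 1e6


if __name__ == "__main__":
    for size in (0, 1, 1024):
        print(size, ping_pong_latency_us(size, iterations=200, skip=20))
```

The figures this sketch prints reflect Python and local pipe overhead only;
they say nothing about MPI or fabric performance.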