3 Parallel Programs Under RMS
3.1 Introduction
RMS provides users with tools for running parallel programs and monitoring their
execution, as described in Chapter 5 (RMS Commands). Users can determine what
resources are available to them and request allocation of the CPUs and memory required
to run their programs. This chapter describes the structure of parallel programs under
RMS and how they are run.
A parallel program consists of a controlling process, prun, and a number of application
processes distributed over one or more nodes. Each process may have multiple threads
running on one or more CPUs. prun can run on any node in the system but it normally
runs in a login partition or on an interactive node.
In a system with SMP nodes, RMS can allocate CPUs so as to use all of the CPUs on the
minimum number of nodes (a block distribution); alternatively, it can allocate a specified
number of CPUs on each node (a cyclic distribution). This flexibility lets users
trade off two competing benefits: more CPUs and more memory on each node
(generally good for multithreaded applications) against a larger number of nodes
(generally best for applications that need more total memory, memory bandwidth,
or I/O bandwidth).
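The difference between the two distributions can be illustrated with a small sketch. It is not part of RMS and does not call any RMS interface; the function names and the assumption that every node has the same number of CPUs are purely illustrative. Each function returns the number of CPUs used on each node.

```python
def block_distribution(ncpus, cpus_per_node):
    """Fill each node completely before using the next one.

    Hypothetical helper: assumes every node has cpus_per_node CPUs.
    Returns a list with the CPU count used on each node.
    """
    if ncpus <= 0 or cpus_per_node <= 0:
        raise ValueError("ncpus and cpus_per_node must be positive")
    full_nodes, remainder = divmod(ncpus, cpus_per_node)
    counts = [cpus_per_node] * full_nodes
    if remainder:
        counts.append(remainder)  # last node is only partly used
    return counts


def per_node_distribution(ncpus, cpus_each):
    """Use a fixed number of CPUs (cpus_each) on every node.

    Hypothetical helper: requires ncpus to be a multiple of cpus_each.
    """
    if ncpus <= 0 or cpus_each <= 0:
        raise ValueError("ncpus and cpus_each must be positive")
    if ncpus % cpus_each:
        raise ValueError("ncpus must be a multiple of cpus_each")
    return [cpus_each] * (ncpus // cpus_each)
```

For example, on nodes assumed to have 4 CPUs each, block_distribution(8, 4) uses two nodes with 4 CPUs apiece, while per_node_distribution(8, 2) spreads the same 8 CPUs over four nodes with 2 CPUs apiece.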
Parallel programs can be written so that they will run with varying numbers of CPUs
and varying numbers of CPUs per node. They can, for example, query the number of
processors allocated and determine their data distributions and communication
patterns accordingly (see Appendix C (RMS Kernel Module) for details).
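As a minimal sketch of how a program might adapt its data distribution to the number of processes it is given, the function below computes the block of items owned by one process. The process count and rank are passed in as plain arguments; how a real program obtains them from RMS is not shown here, and the name block_range is hypothetical.

```python
def block_range(n_items, nprocs, rank):
    """Return the half-open range [start, stop) of items owned by rank.

    Items are split into nprocs contiguous blocks whose sizes differ by
    at most one; the first (n_items % nprocs) ranks get the extra item.
    nprocs and rank are assumed to come from the runtime environment.
    """
    if nprocs <= 0 or not 0 <= rank < nprocs:
        raise ValueError("need nprocs > 0 and 0 <= rank < nprocs")
    base, extra = divmod(n_items, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop
```

With 10 items and 4 processes, ranks 0 to 3 own the ranges (0, 3), (3, 6), (6, 8) and (8, 10). Rerunning the same program with a different process count changes the ranges but not the code.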