4.3 The Machine Manager
The Machine Manager, mmanager, is responsible for detecting and reporting changes in
the state of each node in the system. It records the current state of each node and any
changes in state in the database.
When a node is functioning correctly, rmsd, a daemon which runs on each node,
periodically updates the database. However, if the node crashes, or IP traffic to and from
the node stops, then these updates stop. RMS therefore uses an external monitor, mmanager, to
periodically check the service level of each node: whether IP is functioning
and whether the RMS daemons on the node are operating.
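The two checks described above can be combined into a single service-level classification. The following is a minimal sketch only: the function name, its boolean inputs, and the returned labels are illustrative assumptions and are not RMS status values.

```python
def service_level(ip_reachable, daemons_running):
    """Classify a node from the two checks the Machine Manager makes.

    The labels returned here are hypothetical placeholders, not the
    status strings that RMS stores in its database.
    """
    # Without IP connectivity the daemons cannot be queried at all.
    if not ip_reachable:
        return "no-ip"
    # IP works, but the RMS daemons on the node are not operating.
    if not daemons_running:
        return "ip-only"
    # Both checks passed.
    return "full"
```

The IP check is evaluated first because a node that cannot be reached over IP gives no reliable information about its daemons.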
4.3.1 Interaction with the Database
The Machine Manager records the current status of nodes in the nodes table (see
Section 10.2.14) while changes to node status are entered in the events table (see
Section 10.2.6).
The interval at which the Machine Manager performs status checks is set in the
attributes table (see Section 10.2.3) with the node-status-poll-interval
attribute. If this attribute is not present, the general attribute rms-poll-interval is
used instead.
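The fallback rule for the poll interval can be sketched as a simple lookup. This is an illustration under stated assumptions: the attributes table is modelled as a Python dict, the function name is hypothetical, and the behaviour when neither attribute is present is not described in this section, so the sketch simply raises an error.

```python
def status_poll_interval(attributes):
    """Return the interval the Machine Manager would use for status checks.

    `attributes` is a hypothetical in-memory stand-in for the attributes
    table: a dict mapping attribute names to their values.
    """
    # Prefer the specific attribute when it is present.
    if "node-status-poll-interval" in attributes:
        return int(attributes["node-status-poll-interval"])
    # Otherwise fall back to the general attribute. The text does not say
    # what happens if this is also missing, so a KeyError is raised here.
    return int(attributes["rms-poll-interval"])
```

For example, with only rms-poll-interval defined, the function returns that value; once node-status-poll-interval is added, it takes precedence.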
4.4 The Partition Manager
The nodes in the RMS machine are configured into mutually exclusive sets known as
partitions (see Section 2.4). By restricting access to partitions, the system administrator
can reserve particular partitions for specific types of tasks or users. In this way, the
system administrator can ensure that resources are used most effectively; for example,
that resources intended for running parallel programs are not consumed running user
shells. The access restrictions are set up in the access_controls table (see
Section 10.2.1) of the RMS database.
Each partition is controlled by a Partition Manager, pmanager. The Partition Manager
mediates each user’s requests for resources (CPUs and memory) to run jobs in the
partition. It checks the user’s access permissions and resource limits before adding the
request to its scheduling queue. The request blocks until the resources are allocated for
the job.
When the resources requested by the user become available, the Partition Manager
instructs rmsd, a daemon that runs on each node in the partition (see Section 4.9), to
create a communications context for the user’s job. Finally, the Partition Manager
replies to the user’s request and the user’s job starts.
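The request flow described above (check access and limits, queue the request, allocate when resources become free, then start the job) can be sketched as follows. This is not the pmanager implementation: the class and field names are hypothetical, only CPUs are modelled (memory is omitted), first-in first-out ordering is an assumption because this section does not state the scheduling policy, and the instruction to rmsd is represented by appending to a list.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    user: str
    cpus: int


@dataclass
class PartitionManagerSketch:
    free_cpus: int
    allowed_users: set   # stand-in for the access_controls table
    cpu_limit: int       # hypothetical per-request resource limit
    queue: deque = field(default_factory=deque)
    started: list = field(default_factory=list)

    def submit(self, request):
        # Check access permissions and resource limits before queueing.
        if request.user not in self.allowed_users:
            raise PermissionError(f"{request.user} may not use this partition")
        if request.cpus > self.cpu_limit:
            raise ValueError("request exceeds the resource limit")
        self.queue.append(request)
        self.schedule()

    def schedule(self):
        # Assumed FIFO policy: start requests from the head of the queue
        # for as long as enough CPUs are free.
        while self.queue and self.queue[0].cpus <= self.free_cpus:
            request = self.queue.popleft()
            self.free_cpus -= request.cpus
            # In RMS this is where rmsd would be instructed to create the
            # communications context; here the start is only recorded.
            self.started.append(request.user)

    def release(self, cpus):
        # A finished job returns its CPUs, which may unblock queued requests.
        self.free_cpus += cpus
        self.schedule()
```

A request that cannot be satisfied immediately stays in the queue, which corresponds to the blocking behaviour described in the text; it starts as soon as a later release frees enough CPUs.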