0031-307 remote child: error restoring stdin.
Explanation: The previously closed stdin cannot be restored.
User Response: Probable system error. Gather information about the problem and follow
local site procedures for reporting hardware and software problems.
0031-308 Invalid value for string: string
Explanation: Indicated value is not a valid setting for the indicated environment variable or
command line option.
User Response: Set to a valid value and rerun.
0031-309 Connect failed during message passing initialization, task number, reason: string
Explanation: The Communication Subsystem was unable to connect this task to one or
more other tasks in the current partition for the reason given.
User Response: If a timeout has occurred, the MP_TIMEOUT environment variable is set
too low. (The default value is 150 seconds.) If you have not explicitly set the
MP_TIMEOUT environment variable and the program being run under POE is NFS mounted,
the default of 150 seconds may not be sufficient. In either case, set MP_TIMEOUT to a
larger value and rerun.
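As a minimal sketch of raising the timeout before rerunning the job: the value 600 is an illustrative assumption, not a figure from this manual, and the program name in the comment is hypothetical.

```shell
# Raise the message passing initialization timeout for this shell session.
# Assumption: 600 seconds is an example value; choose one suited to your site.
MP_TIMEOUT=600
export MP_TIMEOUT

# Confirm the setting that POE will inherit from the environment.
echo "MP_TIMEOUT is set to $MP_TIMEOUT seconds"

# Then rerun the job under POE from this same shell, for example:
#   poe ./my_program -procs 4     (my_program is a hypothetical name)
```

Because the variable is exported, it applies only to jobs started from this shell session; add it to your login profile if the larger value is needed regularly.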
If the reason given indicates "Permission denied", you should ensure the login name and
user ID of the user submitting the job are consistent on all nodes on which the job is running.
If the reason given indicates "Permission denied" or "Not owner" and the job was submitted
under LoadLeveler, you should ensure that the adapter requirement given to LoadLeveler is
compatible with the MP_EUILIB environment variable.
If the reason given indicates "No such device", the Communication Subsystem library
(libmpci.a) bound into the executable does not match the switch adapter for that node. This
error usually occurs when the executable was statically bound on a system that was
configured for a different switch adapter. For example, this occurs when a program compiled
on a system configured with a TB2 adapter is run on a system with a TB3 adapter. In this
case, you should recompile the program on a system configured for the same switch adapter
as the node where the executable will be run.
For any other reason, an internal error has occurred. You should gather information about
the problem and follow local site procedures for reporting hardware and software problems.
0031-310 Socket open failed during message passing initialization, task number, reason: string
Explanation: The Communication Subsystem was unable to open a socket for message
passing for the indicated task for the reason given.
User Response: If the reason given is “No buffer space available,” have the system
administrator raise the value of sb_max using the no command. The current suggested
value is 128000.
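The following sketch shows how an administrator might inspect sb_max with the no command and what the change would look like. It assumes an AIX system where no is installed and that the change is made with root authority; the variable name SUGGESTED_SB_MAX is introduced here for illustration only.

```shell
# Value suggested in the User Response above.
SUGGESTED_SB_MAX=128000

# Display the current sb_max setting if the AIX 'no' command is available.
if command -v no >/dev/null 2>&1; then
    no -o sb_max || echo "Could not query sb_max."
else
    echo "The 'no' command is not available on this system."
fi

# Print, rather than run, the command that changes the value,
# since changing network options requires root authority.
echo "To raise the limit, run as root: no -o sb_max=$SUGGESTED_SB_MAX"
```

The sketch only reports the current value and prints the change command, so it can be run safely by any user before involving the system administrator.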
For any other reason, an internal error has most likely occurred. Gather information about
the problem and follow local site procedures for reporting hardware and software problems.
0031-311 Restart of program string failed. Return code is number.
Explanation: The restart of the program indicated was unsuccessful.
User Response: Check that the program name is valid, and that it was previously
checkpointed.