FAQ:
Running jobs under Torque / PBS Pro

Table of contents:

  1. How do I run jobs under Torque / PBS Pro?
  2. Does Open MPI support Open PBS?
  3. How does Open MPI get the list of hosts from Torque / PBS Pro?
  4. What happens if $PBS_NODEFILE is modified?
  5. Can I specify a hostfile or use the --host option to mpirun when running in a Torque / PBS environment?


1. How do I run jobs under Torque / PBS Pro?

The short answer is just to use mpirun as normal.

Open MPI automatically obtains both the list of hosts and the number of processes to start on each host directly from Torque / PBS Pro. Hence, it is unnecessary to specify the --hostfile, --host, or -np options to mpirun. Open MPI uses PBS/Torque-native mechanisms to launch and kill processes (rsh and/or ssh are not required).

For example:

# Allocate a PBS job with 4 nodes
shell$ qsub -I -l nodes=4
# Now run an Open MPI job on all the nodes allocated by PBS/Torque
# (starting with Open MPI v1.2; you need to specify -np for the 1.0
# and 1.1 series).
shell$ mpirun my_mpi_application

This will run 4 MPI processes on the nodes that were allocated by PBS/Torque. Alternatively, if submitting a script:

shell$ cat my_script.sh
#!/bin/sh
mpirun my_mpi_application
shell$ qsub -l nodes=4 my_script.sh


2. Does Open MPI support Open PBS?

As of this writing, Open PBS is so ancient that we are not aware of any sites running it. As such, we have never tested Open MPI with Open PBS and do not know whether it would work.


3. How does Open MPI get the list of hosts from Torque / PBS Pro?

Open MPI has changed how it obtains hosts from Torque / PBS Pro over time:

  • v1.0 and v1.1 series: The list of hosts allocated to a Torque / PBS Pro job is obtained directly from the scheduler using the internal TM API.
  • v1.2 series: Due to scalability limitations in how the TM API was used in the v1.0 and v1.1 series, Open MPI was modified to read the $PBS_NODEFILE to obtain hostnames. Reading the file is much faster at scale than the TM API queries used by the earlier series.

It is possible that future versions of Open MPI may switch back to using the TM API in a more scalable fashion, but there isn't currently a huge demand for it (reading the $PBS_NODEFILE works just fine).

Note that the TM API is used to launch processes in all versions of Open MPI; the only thing that has changed over time is how Open MPI obtains hostnames.
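
To make the v1.2 approach more concrete, here is a minimal sketch of how a host list and a per-host process count can be derived from $PBS_NODEFILE. It is not Open MPI's actual implementation. It assumes the usual Torque / PBS Pro layout of one hostname per line, repeated once for each slot allocated on that host. The function name summarize_nodefile is hypothetical.

```shell
#!/bin/sh
# Illustrative only: summarize a PBS node file as "<host> <slots>" lines.
# Assumption: one hostname per line, repeated once per allocated slot.
# summarize_nodefile is a hypothetical helper, not part of Open MPI.
summarize_nodefile() {
    # sort groups identical hostnames, uniq -c counts each group,
    # and awk reorders the output to "<host> <count>".
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Inside a Torque / PBS Pro job, the scheduler sets $PBS_NODEFILE.
# Outside of a job the variable is unset and nothing is printed.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    summarize_nodefile "$PBS_NODEFILE"
fi
```

For a node file that lists node1 twice and node2 twice, the function prints "node1 2" and "node2 2".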


4. What happens if $PBS_NODEFILE is modified?

Bad Things will happen.

We've had reports from some sites that system administrators modify the $PBS_NODEFILE in each job according to local policies. This currently causes Open MPI to behave unpredictably. As long as no new hosts are added to the $PBS_NODEFILE, the usual result is that Open MPI maps processes to hosts incorrectly, but in some cases Open MPI can fail to launch processes altogether.

The best course of action is to not modify the $PBS_NODEFILE.


5. Can I specify a hostfile or use the --host option to mpirun when running in a Torque / PBS environment?

As of version v1.2.1, no.

Open MPI will fail to launch processes properly when a hostfile is specified on the mpirun command line, or if the mpirun --host option is used.

We're working on correcting the error. A future version of Open MPI will likely launch on the hosts specified either in the hostfile or via the --host option as long as they are a proper subset of the hosts allocated to the Torque / PBS Pro job.
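
The constraint described above (requested hosts must come from the job's allocation) can be checked by hand. The following is a rough sketch of such a check, not something Open MPI provides. It assumes both files contain bare hostnames, one per line, with no "slots=" annotations. The function name hosts_outside_allocation is hypothetical.

```shell
#!/bin/sh
# Illustrative only: print hosts from a candidate list that are NOT part
# of the PBS allocation. hosts_outside_allocation is a hypothetical helper.
#   $1: file with one candidate hostname per line
#   $2: PBS node file (one hostname per line, possibly repeated)
hosts_outside_allocation() {
    wanted=$(mktemp) || return 1
    allocated=$(mktemp) || return 1
    # comm needs sorted input; -u also removes repeated hostnames.
    sort -u "$1" > "$wanted"
    sort -u "$2" > "$allocated"
    # comm -23 prints lines that appear only in the first file.
    comm -23 "$wanted" "$allocated"
    rm -f "$wanted" "$allocated"
}
```

Inside a job, calling hosts_outside_allocation my_hostfile "$PBS_NODEFILE" (where my_hostfile is a placeholder name) prints nothing when every requested host belongs to the allocation.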