What's the relationship between Sun Grid Engine (SGE) process number and OpenMPI process number?
Understanding the Relationship Between SGE and OpenMPI Process Numbers

Explore how Sun Grid Engine (SGE) job slots map to OpenMPI process ranks, and learn how to configure your MPI jobs for optimal performance on an SGE cluster.
When running parallel applications using OpenMPI on a Sun Grid Engine (SGE) cluster, a common point of confusion arises regarding the relationship between SGE's allocated process slots and OpenMPI's internal process numbering (ranks). Understanding this mapping is crucial for correctly configuring your MPI jobs, ensuring efficient resource utilization, and debugging performance issues. This article will clarify how SGE communicates resource allocations to OpenMPI and how OpenMPI interprets these to assign ranks to your parallel processes.
SGE's Role in Resource Allocation
Sun Grid Engine (SGE), now often referred to as Oracle Grid Engine or Open Grid Engine, is a workload management system that allocates computational resources (CPU cores, memory, etc.) to jobs submitted by users. For parallel jobs, SGE uses a concept called a 'Parallel Environment' (PE). When you submit an MPI job, you request a certain number of slots from a specific PE. SGE then finds available hosts and assigns the requested slots across them. The key information SGE provides to the job is a 'machine file' or 'host file' that lists the allocated hosts and the number of slots on each.
#!/bin/bash
#$ -N MyMPIJob
#$ -pe mpi 8
#$ -cwd
# The $PE_HOSTFILE environment variable points to the machine file
# OpenMPI typically reads this automatically or can be specified with -hostfile
mpirun -np $NSLOTS ./my_mpi_program
Example SGE job script for an OpenMPI job.
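Before looking at how OpenMPI consumes the allocation, it helps to see the SGE side of the workflow. The following is a minimal sketch, assuming the script above is saved as my_mpi_job.sh (an illustrative filename) and that the parallel environment is the 'mpi' PE requested in the script; qconf -sp prints the PE definition, whose allocation_rule field determines how the 8 requested slots are spread across hosts.
# Submit the job script (saved here as my_mpi_job.sh) to SGE
qsub my_mpi_job.sh
# Show the parallel environment's configuration; allocation_rule controls
# how SGE distributes the requested slots across hosts
qconf -sp mpi
# Check the job's state and where it is running
qstat -u "$USER"
Submitting the job and inspecting the 'mpi' parallel environment.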
OpenMPI's Interpretation of Resources
OpenMPI's mpirun (or orterun) command is responsible for launching your parallel application across the allocated resources. When mpirun starts, it reads the machine file provided by SGE (usually via the $PE_HOSTFILE environment variable). This file tells mpirun which hosts to use and how many processes (slots) it can launch on each host. Based on this information, mpirun then assigns a unique rank (from 0 to N-1, where N is the total number of processes) to each MPI process it launches.
flowchart TD
    A[SGE Job Submission] --> B{Request PE and Slots}
    B --> C[SGE Allocates Resources]
    C --> D["Generates $PE_HOSTFILE"]
    D --> E[mpirun Reads $PE_HOSTFILE]
    E --> F[mpirun Launches MPI Processes]
    F --> G["MPI Processes Get Unique Ranks (0 to N-1)"]
    G --> H[MPI Application Execution]
Flow of resource allocation from SGE to OpenMPI process numbering.
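In practice this means that, if OpenMPI was built with SGE (gridengine) support, a plain mpirun inside the job script is enough; otherwise the machine file can be passed explicitly. The sketch below assumes the first two columns of $PE_HOSTFILE are the hostname and the slot count (the usual layout) and converts them into OpenMPI's "host slots=N" hostfile syntax; ompi_hosts.txt is just an illustrative temporary filename.
# With SGE integration compiled in, mpirun picks up the allocation itself
mpirun -np $NSLOTS ./my_mpi_program
# Without it, translate $PE_HOSTFILE (hostname, slot count, ...) into
# OpenMPI's "host slots=N" hostfile format and pass it explicitly
awk '{print $1" slots="$2}' "$PE_HOSTFILE" > ompi_hosts.txt
mpirun --hostfile ompi_hosts.txt -np $NSLOTS ./my_mpi_program
Launching with automatic SGE detection versus an explicit hostfile.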
Crucially, the total number of processes (the -np argument to mpirun) should match the total number of slots SGE has allocated ($NSLOTS). If these numbers don't match, you might encounter errors or inefficient resource usage. For instance, if you request 8 slots from SGE but tell mpirun to launch 16 processes, mpirun will attempt to launch more processes than SGE has allocated, potentially leading to oversubscription or job failure.
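A small guard in the job script can catch such a mismatch before anything is launched. This is a sketch: NP_REQUESTED is a hypothetical variable standing in for whatever process count you intend to pass to -np, defaulting to the SGE allocation.
# Default the process count to the SGE allocation, but allow an override
NP_REQUESTED=${NP_REQUESTED:-$NSLOTS}
# Refuse to oversubscribe the allocation
if [ "$NP_REQUESTED" -gt "$NSLOTS" ]; then
    echo "Error: asked for $NP_REQUESTED processes but SGE allocated $NSLOTS slots" >&2
    exit 1
fi
mpirun -np "$NP_REQUESTED" ./my_mpi_program
Guarding against a process count that exceeds the SGE allocation.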
Use the $NSLOTS environment variable provided by SGE directly in your mpirun command for the -np argument. This ensures that the number of MPI processes launched exactly matches the number of slots allocated by SGE, preventing resource mismatches.
Mapping SGE Slots to MPI Ranks
The relationship is direct: each SGE slot corresponds to one potential MPI process. OpenMPI takes the list of hosts and slots from the $PE_HOSTFILE and distributes the MPI ranks across them. By default, OpenMPI tries to fill up each host with processes before moving to the next host, but this behavior can be controlled with process binding and mapping options (e.g., --map-by, --bind-to).
# Example $PE_HOSTFILE content for a job requesting 8 slots
# SGE allocates 4 slots on hostA and 4 slots on hostB
hostA.example.com 4
hostB.example.com 4
Example content of an SGE PE_HOSTFILE.
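Inside the job script you can print the machine file and total its slot column to confirm that it matches $NSLOTS. This is a minimal sketch, assuming the slot count is the second column of $PE_HOSTFILE, as in the example above.
# Show the machine file SGE generated for this job
echo "Machine file: $PE_HOSTFILE"
cat "$PE_HOSTFILE"
# Sum the slot column and compare it with the total SGE reports
TOTAL_SLOTS=$(awk '{sum += $2} END {print sum}' "$PE_HOSTFILE")
echo "Slots in PE_HOSTFILE: $TOTAL_SLOTS (NSLOTS=$NSLOTS)"
Verifying the machine file against $NSLOTS from inside the job script.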
Given the above $PE_HOSTFILE, mpirun -np 8 ./my_mpi_program would typically launch ranks 0, 1, 2, 3 on hostA and ranks 4, 5, 6, 7 on hostB. The exact assignment of ranks to physical cores within a host depends on OpenMPI's binding policies and the available hardware topology.
To fine-tune how ranks are placed on cores, sockets, and hosts, experiment with OpenMPI's --map-by and --bind-to options to optimize performance.