The SHA advance of the 3rd-party/prrte submodule in PR #12101 pulled in a bug that breaks support for PRTE_MCA_ras_base_launch_orted_on_hn when it is set to 1.
This parameter is important when:
- the system is managed with SLURM or Cray PALS (native launcher), and
- the vendor-supported method for acquiring RDMA credentials for "high-speed" networks relies on mechanisms within the SLURM/PALS launch procedure to provide this information to the application processes (as is the case for HPE SS11).
The bug is reportedly fixed in PRRTE master, but the fix has not yet reached the release branches used by Open MPI.
This problem currently affects both main and 5.0.x.
I'm marking this as critical because, for sites that use HPE SS11 and do not support PMIx in SLURM or PALS, there is no alternative to PRRTE-based launch, so jobs currently cannot be launched on these platforms (at least not easily). I believe this is the case for ORNL systems.