PR 12101 broke support for PRTE_MCA_ras_base_launch_orted_on_hn #12150

@hppritcha

The SHA advance of 3rd-party/prrte in PR #12101 pulled in a bug that breaks support for PRTE_MCA_ras_base_launch_orted_on_hn when it is set to 1.

This parameter is important when

  • using a system managed with SLURM or Cray PALS (native launcher)
  • the vendor-supported method for acquiring RDMA credentials for "high speed" networks relies on mechanisms within the SLURM/PALS launch procedure to provide this information to the application processes (as is the case for HPE SS11).
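
For reference, here is a minimal sketch of how the parameter might be set for an mpirun launch from inside a SLURM or PALS allocation. The helper names, the `./a.out` binary, and the process count are hypothetical and only illustrate the environment setup. Nothing here reproduces or works around the bug.

```python
import os
import subprocess

# Parameter named in this issue; setting it to "1" is the case that regressed.
PARAM = "PRTE_MCA_ras_base_launch_orted_on_hn"


def launch_env(base=None):
    """Return a copy of the environment with the parameter set to 1."""
    env = dict(os.environ if base is None else base)
    env[PARAM] = "1"
    return env


def launch_command(app, nprocs):
    """Build an mpirun command line (hypothetical app and process count)."""
    return ["mpirun", "-np", str(nprocs), app]


def run(app, nprocs):
    """Launch the application with the parameter exported to mpirun."""
    return subprocess.run(launch_command(app, nprocs), env=launch_env(), check=False)


if __name__ == "__main__":
    # Only print what would be executed; call run() inside a real allocation.
    print(f"{PARAM}=1", " ".join(launch_command("./a.out", 4)))
```

Calling `run("./a.out", 4)` inside an allocation would exercise the launch path described in this issue.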

The bug is reportedly fixed in PRRTe master, but the fix is not yet in the release branches used by Open MPI.

This problem is in both main and 5.0.x at the moment.

I'm marking this as critical because sites that use HPE SS11 and do not support PMIx in SLURM or PALS have no alternative to a prrte-based launch, so there is currently a failure to launch (at least easily) on these platforms. I believe this is the case for ORNL systems.
