MPI Slurm interaction #1077
Replies: 13 comments 1 reply
-
Hello David, There are two main ways to launch MPI applications with Charliecloud, which we refer to as a "host" launch and a "guest" launch. A host launch is where the parallel launcher on the host is used to launch multiple containers (which usually join together into a shared namespace). A guest launch is where the parallel launcher inside the container is used to launch the application. They take the following forms:
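A minimal sketch of the two forms, assuming an image directory `./my_image` that contains an MPI program `/app` (both names are placeholders):

```sh
# Host launch: Slurm's srun starts one container per rank; the ranks
# wire up through the host-provided PMI.
srun ch-run ./my_image -- /app

# Guest launch: one container, whose internal mpirun/mpiexec starts
# the ranks (normally limited to the current node).
ch-run ./my_image -- mpirun -np 2 /app
```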
A limitation of the guest launch approach is that it is usually single-node only, because the parallel launcher within the container doesn't know how to launch containers on other nodes. As for your error, I believe what you are running into is that the MPI install within the container sees Slurm variables in your environment and therefore thinks it can use Slurm mechanisms to launch processes. To work around this we suggest hiding those Slurm variables from the container when doing a guest launch. That being said, you mentioned you want to run across several nodes, so I would recommend a host launch. Using the example you provided, this would look something like the host launch in the sketch below. Let me know if this does/doesn't help 😃
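A hedged sketch of both suggestions, using the image and executable names from the original question; `--unset-env` and the exact glob are assumptions about the installed ch-run version, so check `ch-run --help` first:

```sh
# Guest-launch workaround (single node): hide Slurm's variables so the
# container's MPI falls back to its own process starter.
ch-run --unset-env='SLURM*' -w ./test_mpi_image -- mpiexec -n 2 /ALPACA

# Host launch (multi-node): let Slurm start one container per rank.
srun -n 2 ch-run -w ./test_mpi_image -- /ALPACA
```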
-
Hi Heasterday, I am unable to execute srun correctly due to the host system's Slurm/Munge configuration. The application sort of completes, but a lot of Slurm, Munge and MPI error messages are generated, so I'm not confident the results are valid. I normally execute `mpiexec -n 2 ch-run -w ./test_mpi_image -- /executable`, but the problem is that if I ask for e.g. 2 ranks, my application is run twice with a single rank each, i.e. I get twice the output folders/data rather than one twice-as-fast application. I need MPI to spread my application across different nodes, not just for compute parallelism but also because of the RAM requirements. Any suggestions on how to distribute my containerized application across multiple nodes? David
-
Do you believe the Slurm/Munge/MPI errors are an incompatibility between the container MPI and the host MPI, or are they present for non-containerized applications as well? If it's the former, I may be able to give you some things to try. Using mpirun/mpiexec to launch a container on every node is more complex, because you won't have the PMI compatibility layer, so the host MPI install will need to be as close to the container install as possible. What outcome do you get if you launch with the following command line: […] Could you point me to a Dockerfile for how you built MPI for your image?
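Since an mpirun/mpiexec host launch needs the host and container MPI installs to match closely, one quick way to compare them is to check both versions. A hedged sketch, assuming mpirun is on the PATH both on the host and inside the image, with the image path taken from earlier in the thread:

```sh
# Compare MPI versions on the host and inside the container; for an
# mpirun-based host launch these should match as closely as possible.
mpirun --version
ch-run ./test_mpi_image -- mpirun --version
```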
-
The original Dockerfile that my colleague created, which I modified to include libpmi, starts with `FROM ubuntu:latest AS buildstage` […]. I was able to execute ch-run ... mpiexec ... successfully without Slurm, but when I tried it on a system with Slurm using salloc, I got lots of Slurm errors. David
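A reconstruction of the two situations described, as a hedged illustration rather than the exact commands used (image and executable paths are carried over from the original question):

```sh
# Works: guest launch on a machine where Slurm is not involved.
ch-run -w ./test_mpi_image -- mpiexec -n 2 /ALPACA

# Fails with Slurm errors: the same guest launch run from inside a Slurm
# allocation, where SLURM_* variables are present in the environment.
salloc -N 1          # interactive allocation
ch-run -w ./test_mpi_image -- mpiexec -n 2 /ALPACA
```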
-
Thank you for the example Dockerfile; I will build a simple MPI app with it and see what it takes to run on our systems. Something to note on our example Dockerfile: it assumes that our CentOS 8 Dockerfile is being used as its base. The big things we do in that image are installing general dependencies and adding to the linker's search path. I will let you know what I find from testing with your Dockerfile.
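For context, "adding to the linker's search path" usually amounts to something like the following inside the image build; the path shown is only an example, not necessarily the one their base image uses:

```sh
# Make libraries installed under /usr/local/lib findable at run time.
echo /usr/local/lib > /etc/ld.so.conf.d/usr-local.conf
ldconfig
```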
-
Using the provided Dockerfile I was able to build Intel's IMB benchmark and run it across two nodes on our Slurm cluster. Some things to note: […]
David, please look at the errors I provided and their workarounds and let me know if they are relevant to your environment. NOTE: I didn't evaluate performance at all, because that is likely very site-dependent.
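For reference, a two-node host launch of the IMB binary would look roughly like this; the image name, the install path of IMB-MPI1 inside it, and the srun options are all assumptions:

```sh
# One rank per node across two nodes, running the PingPong benchmark.
srun -N 2 --ntasks-per-node=1 ch-run ./imb_image -- /usr/local/bin/IMB-MPI1 PingPong
```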
-
I couldn't resolve the Slurm issues with OpenMPI, so I tried MPICH (Intel MPI) and no longer get Slurm errors. However, I am now getting errors related to the bootstrap proxies. I am using the system version of MPI via binding, and I get the same problem even if I execute `mpiexec -n 2 ch-run -w image_mpich -- /executable`.
Do I need to install the InfiniBand drivers inside the container? David
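To illustrate what "using the system version of MPI via binding" typically involves with ch-run; the host Intel MPI install prefix below is a guess and should be replaced with the path your module actually provides, and depending on the ch-run version the destination directory may need to exist in the image first:

```sh
# Bind the host's Intel MPI install into the image at the same path, so the
# bound-in mpiexec and its libraries can find each other.
ch-run -b /opt/intel/impi/2019.8.254:/opt/intel/impi/2019.8.254 \
       -w image_mpich -- /executable
```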
-
Could I get a copy of the Slurm errors you were getting so I can look into them? Also, were these errors generated using our example OpenMPI base or the Dockerfile you provided? I would be very interested in the errors from both for comparison. Typically we recommend, where possible, building the MPI install in the container with the desired communication library (UCX, libfabric, etc.) and all its dependencies. My guess is that something required by the libraries you are binding in is missing. A shot in the dark: would it be possible for me to get a guest account on some platform with a similar configuration? The thought is that I could then test whether something needs to be done differently at build/runtime for an image in your environment vs ours.
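One hedged way to test the guess that something required by the bound-in libraries is missing: bind the host library directory into the image and run ldd from inside the container. The bind destination and the library soname below are assumptions:

```sh
# Bind the host's /usr/lib64 to a scratch mount point in the image and check
# whether the bound-in MPI library has unresolved dependencies there.
ch-run -b /usr/lib64:/mnt/0 image_mpich -- ldd /mnt/0/libmpi.so.12 | grep 'not found'
```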
-
The OpenMPI version gave errors about being unable to find libpmi.so.1, which on the host is located in /usr/lib64; normally I would bind that directory into the container. I could create a new directory on the host and populate it with symlinks, but I don't want to go down that path if possible. Unfortunately, I can't provide access to the system. The system's preferred MPI is Intel MPI and OpenMPI isn't well supported, so I want to focus on MPICH (Intel MPI versions 2019.7.217 and 2019.8.254). Also, UCX isn't installed on the system; it uses the I_MPI_* environment settings, with the defaults coming from the module system. I've tried setting FI_PROVIDER=tcp and I_MPI_FABRICS=tcp and still get:
[mpiexec@i22r07c05s06] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on i22r07c05s06 (pid 20924, exit code 65280)
I will try it on another system and get back to you. David
-
For clarification: if I execute […]. As I am able to successfully execute […]. Is this correct?
-
You could bind in libpmi from the host, but I would recommend the container image having it already. On that note, the recommended way to inject a host install of libraries is […]. Re: […]
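For what it's worth, Charliecloud ships a ch-fromhost helper for copying host files such as libraries into an image; whether that is the tool being recommended above isn't stated here, and option names vary by version, so treat this as a sketch and verify against `ch-fromhost --help`:

```sh
# Copy the host's libpmi into the image (the --path option and the
# library location are assumptions; verify before use).
ch-fromhost --path /usr/lib64/libpmi.so.1 ./test_mpi_image
```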
-
We're trying out the new “Discussions” feature, so I am going to move this thread to that section. Please LMK if anything goes wrong.
-
When I try to srun on the cluster I get an error. Can anyone help me rectify the issue?
-
Hi,
I've been trying to execute an MPI program built with OpenMPI across several nodes of an HPC system running Slurm (salloc), from within a container, using the following command:
`ch-run -w ./test_mpi_image -- mpiexec -n 2 /ALPACA`
(This command executes successfully when not using Slurm)
And get the following error:
The SLURM process starter for OpenMPI was unable to locate a
usable "srun" command in its path. Please check your path
and try again.
An internal error has occurred in ORTE:
[[56714,0],0] FORCE-TERMINATE AT (null):1 - error plm_slurm_module.c(471)
This is something that should be reported to the developers.
Do you have any documentation on how to configure Slurm to avoid this error?
Would binding the system's Slurm executables and libraries into the container using the -b option resolve this issue?
As I am running on a production system, I am limited in what experimentation I can do.
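A small, hedged diagnostic for this specific error: check whether srun is visible inside the container and which SLURM_* variables leak in from the salloc environment, since Open MPI selects its Slurm launcher based on those variables:

```sh
# If SLURM_* variables are present but srun is not on the container's PATH,
# Open MPI picks its Slurm launcher and then cannot find srun.
ch-run ./test_mpi_image -- sh -c 'command -v srun; env | grep ^SLURM'
```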