
CI: run standalone tests with MPI rank 4 as well #43


Merged: 3 commits merged into main on Mar 27, 2025

Conversation

@gxyd (Collaborator) commented Mar 25, 2025

Description

Towards: #41

@certik (Collaborator) commented Mar 25, 2025

It's showing:

Running allreduce with 4 MPI ranks...
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 4
slots that were requested by the application:

  ./allreduce

Either request fewer procs for your application, or make more slots
available for use.

A "slot" is the PRRTE term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which PRRTE processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, PRRTE defaults to the number of processor cores

In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
number of available slots when deciding the number of processes to
launch.

I think we need to set some options to run more MPI ranks on fewer processors ("oversubscribe").
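For example, a minimal sketch of what that could look like for Open MPI (the binary name is taken from the log above; the exact CI invocation may differ):

  mpiexec --oversubscribe -np 4 ./allreduce
  # or, equivalently, via the mapping policy mentioned in the error message:
  mpiexec --map-by :OVERSUBSCRIBE -np 4 ./allreduce

As far as I know this flag is specific to Open MPI; MPICH's mpiexec allows starting more ranks than cores by default.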

@gxyd (Collaborator, Author) commented Mar 26, 2025

This is interesting.

I tried running a test program on my machine with ./run_with_mpi_wrappers.sh test.f90 -np 60 and it works, but anything above 60 doesn't work on my machine (MacBook Air M1 with 8 cores).

Regarding --map-by, I see documentation here: https://docs.open-mpi.org/en/v5.0.x/launching-apps/scheduling.html, which seems a little complicated.

@gxyd (Collaborator, Author) commented Mar 26, 2025

So I checked ulimit -n, which reports the maximum number of file descriptors a process can open on macOS, and it was 256. Once I raised ulimit -n to 1000, a higher value like -np 100 also works with mpirun for me now, so I guess that's a separate problem from the one noticed in this issue.
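For reference, the commands involved were roughly the following (the limit and -np values are just the ones from this experiment):

  ulimit -n        # show the current per-process file descriptor limit (256 here)
  ulimit -n 1000   # raise the soft limit for the current shell session
  ./run_with_mpi_wrappers.sh test.f90 -np 100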

@gxyd (Collaborator, Author) commented Mar 26, 2025

In the last CI run:

Run standalone tests with GFortran with Open MPI (ubuntu-20.04) took 1 minute 11 seconds

while
Run standalone tests with GFortran with MPICH (ubuntu-20.04) took just 22 seconds.

I think that might have been because of unnecessary use of --oversubscribe with Open MPI when running with MPI rank 1 and MPI rank 2, where we don't really need --oversubscribe. Let's see the timings on the new CI run, where I've now pushed a fix.

[EDIT]: I realize now that even on current main (where no --oversubscribe is used at all), the Open MPI tests run a little slower than the MPICH tests; I'm not sure why. @certik, do you think that's something to investigate?
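As a minimal sketch of the "only oversubscribe when needed" idea (the variable names and use of nproc here are illustrative, not the exact CI script):

  NPROCS=4
  MPI_FLAGS=""
  if [ "$NPROCS" -gt "$(nproc)" ]; then
      # Open MPI refuses to start more ranks than available slots unless oversubscription is allowed
      MPI_FLAGS="--oversubscribe"
  fi
  mpiexec $MPI_FLAGS -np "$NPROCS" ./allreduce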

@certik (Collaborator) commented Mar 26, 2025

There can be many different reasons for that. Create an issue for this so that we can track it, but as long as the speed is reasonable for both, Open MPI does not get slower as you keep merging PRs, and MPICH also doesn't get slower than master in a given PR, I would not worry about it.

@gxyd (Collaborator, Author) commented Mar 26, 2025

Got it. I noticed that this is the case only for the standalone tests, not for POT3D.

@gxyd gxyd requested a review from certik March 26, 2025 05:16
@gxyd (Collaborator, Author) commented Mar 26, 2025

I'll create a separate PR to run POT3D with MPI rank 4 as well.

@gxyd gxyd merged commit c46fffb into main Mar 27, 2025
16 checks passed
@gxyd gxyd deleted the run_mpi_rank4_at_CI branch May 7, 2025 08:38
gxyd referenced this pull request in gxyd/POT3D May 28, 2025
validation2 runs much faster than validation dataset, also run with MPI rank 4 as well