
CI: run standalone tests with MPI rank 4 as well #43


Merged: 3 commits merged into main on Mar 27, 2025

Conversation

@gxyd (Collaborator) commented Mar 25, 2025

Description

Towards: #41

@certik (Collaborator) commented Mar 25, 2025

It's showing:

Running allreduce with 4 MPI ranks...
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 4
slots that were requested by the application:

  ./allreduce

Either request fewer procs for your application, or make more slots
available for use.

A "slot" is the PRRTE term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which PRRTE processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, PRRTE defaults to the number of processor cores

In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
number of available slots when deciding the number of processes to
launch.

I think we need to set some options to run more MPI ranks on fewer processors ("oversubscribe").
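For example, a minimal sketch of what that could look like for Open MPI (the binary name is taken from the log above; the exact CI invocation may differ):

  mpiexec --oversubscribe -np 4 ./allreduce
  # or, equivalently, via the mapping policy mentioned in the error message:
  mpiexec --map-by :OVERSUBSCRIBE -np 4 ./allreduce

As far as I know this flag is specific to Open MPI; MPICH's mpiexec allows starting more ranks than cores by default.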

@gxyd (Collaborator, Author) commented Mar 26, 2025

This is interesting.

I tried running a test program on my machine with ./run_with_mpi_wrappers.sh test.f90 -np 60 and it works, but anything above 60 doesn't work on my machine (MacBook Air M1 with 8 cores).

Regarding --map-by, I see documentation here: https://docs.open-mpi.org/en/v5.0.x/launching-apps/scheduling.html, which seems a little complicated.

@gxyd (Collaborator, Author) commented Mar 26, 2025

So I checked ulimit -n, which reports the maximum number of file descriptors a process can open on macOS, and it was 256. Once I raised ulimit -n to 1000, a higher value like -np 100 also works with mpirun for me now, so I guess that's a separate problem from the one noticed in this issue.
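For reference, the commands involved were roughly the following (the limit and -np values are just the ones from this experiment):

  ulimit -n        # show the current per-process file descriptor limit (256 here)
  ulimit -n 1000   # raise the soft limit for the current shell session
  ./run_with_mpi_wrappers.sh test.f90 -np 100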

@gxyd (Collaborator, Author) commented Mar 26, 2025

In the last CI run:

Run standalone tests with GFortran with Open MPI (ubuntu-20.04) took 1 minute 11 seconds

while
Run standalone tests with GFortran with MPICH (ubuntu-20.04) took just 22 seconds.

I think that might have been because of unnecessary use of --oversubscribe with Open MPI when running with MPI rank 1 and MPI rank 2, where we don't really need --oversubscribe. Let's see the timings on the new CI run, where I've now pushed a fix.

[EDIT]: I realize now that even on current main (where no --oversubscribe is used at all), the Open MPI tests run a little slower than the MPICH tests; I'm not sure why. @certik, do you think that's something to investigate?
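As a minimal sketch of the "only oversubscribe when needed" idea (the variable names and use of nproc here are illustrative, not the exact CI script):

  NPROCS=4
  MPI_FLAGS=""
  if [ "$NPROCS" -gt "$(nproc)" ]; then
      # Open MPI refuses to start more ranks than available slots unless oversubscription is allowed
      MPI_FLAGS="--oversubscribe"
  fi
  mpiexec $MPI_FLAGS -np "$NPROCS" ./allreduce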

@certik (Collaborator) commented Mar 26, 2025

There can be many different reasons for that. Create an issue for this so that we can track it, but as long as the speed is reasonable for both, Open MPI does not get slower as you keep merging PRs, and MPICH also doesn't get slower than master in a given PR, I would not worry about it.

@gxyd (Collaborator, Author) commented Mar 26, 2025

Got it. I noticed that this is the case only for the standalone tests, not for POT3D.

@gxyd gxyd requested a review from certik March 26, 2025 05:16
@gxyd (Collaborator, Author) commented Mar 26, 2025

I'll create a separate PR to run POT3D with MPI rank 4 as well.

@gxyd gxyd merged commit c46fffb into main Mar 27, 2025
16 checks passed
@gxyd gxyd deleted the run_mpi_rank4_at_CI branch May 7, 2025 08:38
gxyd referenced this pull request in gxyd/POT3D May 28, 2025
validation2 runs much faster than validation dataset, also run with MPI rank 4 as well