Remove C-wrapper for MPI_Barrier #37
@gxyd I will update this PR in a while, once I have confirmed that the changes are correct. I am hitting the same old error:
aditya-trivedi tests barrier ≢ FC='gfortran' ./run_tests.sh
Compiling allreduce...
Running allreduce with 1 MPI ranks...
Allreduce test completed with 0 errors.
Test allreduce with 1 MPI ranks PASSED!
Running allreduce with 2 MPI ranks...
Allreduce test completed with 0 errors.
Allreduce test completed with 0 errors.
Test allreduce with 2 MPI ranks PASSED!
Running allreduce with 4 MPI ranks...
Allreduce test completed with 0 errors.
Allreduce test completed with 0 errors.
Allreduce test completed with 0 errors.
Allreduce test completed with 0 errors.
Test allreduce with 4 MPI ranks PASSED!
Compiling barrier_1...
Running barrier_1 with 1 MPI ranks...
Process 0 reached before the barrier.
[Observer:00000] *** An error occurred in MPI_Barrier
[Observer:00000] *** reported by process [1070530561,0]
[Observer:00000] *** on communicator MPI_COMM_WORLD
[Observer:00000] *** MPI_ERR_COMM: invalid communicator
[Observer:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[Observer:00000] *** and MPI will try to terminate your MPI job as well)
--------------------------------------------------------------------------
prterun has exited due to process rank 0 with PID 0 on node Observer calling
"abort". This may have caused other processes in the application to be
terminated by signals sent by prterun (as reported here).
--------------------------------------------------------------------------
Test barrier_1 with 1 MPI ranks FAILED!
aditya-trivedi tests barrier ≢ ?2 echo $?
1
Whereas git log says:
aditya-trivedi tests barrier ≢ ?2 git log
commit 03630368478ecf6b949cfcee9769f293ce5b2e17 (HEAD -> barrier)
Author: Aditya Trivedi <adit4443ya@gmail.com>
Date: Mon Mar 24 21:09:29 2025 +0530
Remove C-wrapper for MPI_Barrier
commit c46fffb1ca1cd380ab23e25ac6b3227d97fbf9a8 (origin/main, origin/HEAD, gaurav/main, main)
Author: Gaurav Dhingra <gauravdhingra.gxyd@gmail.com>
Date: Thu Mar 27 15:55:33 2025 +0530
CI: run standalone tests with MPI rank 4 as well (#43)
* CI: run standalone tests with MPI rank 4 as well
* use `--oversubscribe` option with Open MPI but not with MPICH
* don't use `--oversubscribe` for rank 1 or rank 2 with Open MPI
commit 5ebb90277ffce78e2bedcccc692fdee5f2490d0d
Author: Gaurav Dhingra <gauravdhingra.gxyd@gmail.com>
Date: Thu Mar 27 15:55:18 2025 +0530
CI: run POT3D validation with MPI rank 4 as well (#48)
* CI: run POT3D validation with MPI rank 4 as well
* use `--oversubscribe` with Open MPI when running with MPI rank 4
commit a9e8ef96d2b1e23f764d7092a61241a1b5536a1d
Author: Gaurav Dhingra <gauravdhingra.gxyd@gmail.com>
Date: Thu Mar 27 12:00:37 2025 +0530
tests: add test program to compute pi using Monte Carlo method (#50)
* add test program to compute pi using Monte Carlo method
* use function to get MPI_Op
* add check to make sure that the computed PI value is within range
Whereas if I run the tests on this branch with only the changes in this PR, without merging or rebasing onto main, it works for me.
OK, I found out why this happened.
I'm OK with any non-zero value, as long as the value here: https://github.com/gxyd/c_mpi/blob/main/src/mpi_wrapper.c#L6 is set to the same value. The reason I set it to non-zero is that an uninitialized variable in Fortran mostly (though not always) has a zero value. So, even if you set
…fortran for f2c conversion
I set it to 91 and used your C wrapper `get_c_comm_from_fortran`.
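The argument for a non-zero sentinel can be illustrated with a small toy model. This is only a sketch that compiles without an MPI installation: `toy_comm`, `toy_comm_from_fortran`, and `check_sentinel_mapping` are hypothetical names invented for illustration, and treating a zero handle as invalid is an assumption, not the behaviour of the project's `get_c_comm_from_fortran`. The value 91 is the one mentioned in this thread.

```c
#include <assert.h>

/* Sentinel that the Fortran side passes to mean MPI_COMM_WORLD.
 * 91 is the value mentioned in this thread; the C side must use
 * the same number. */
#define FORTRAN_MPI_COMM_WORLD 91

/* Toy stand-in for a C communicator handle (hypothetical; real code
 * would use MPI_Comm from <mpi.h>). */
typedef enum {
    TOY_COMM_INVALID = 0,
    TOY_COMM_WORLD   = 1,
    TOY_COMM_OTHER   = 2
} toy_comm;

/* Map an integer handle received from Fortran to the toy C handle. */
toy_comm toy_comm_from_fortran(int comm_f)
{
    if (comm_f == FORTRAN_MPI_COMM_WORLD)
        return TOY_COMM_WORLD;   /* the agreed sentinel */
    if (comm_f == 0)
        return TOY_COMM_INVALID; /* assumption: zero suggests an uninitialized variable */
    return TOY_COMM_OTHER;       /* real code would convert the handle here */
}

/* Because the sentinel is non-zero, a zeroed handle is not mistaken
 * for the world communicator. */
void check_sentinel_mapping(void)
{
    int zeroed_handle = 0;
    assert(toy_comm_from_fortran(FORTRAN_MPI_COMM_WORLD) == TOY_COMM_WORLD);
    assert(toy_comm_from_fortran(zeroed_handle) == TOY_COMM_INVALID);
}
```

If the sentinel were 0 instead, the first branch would also match a zeroed handle, so an uninitialized communicator variable could silently be treated as the world communicator.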
This PR would work with keeping FORTRAN_MPI_COMM_WORLD unmodified, correct? If so, let's do that, and you can change this value in a separate PR. I don't like lumping unrelated things into the same PR.
@certik Actually, no. Any negative value of Reference: MPI_Comm_split Function. In that, the new communicator, which can be derived upon the split, must be non-negative; hence, if the parent communicator was negative, it would not work. Also, I checked it locally in this PR; it doesn't work with negative values.
I would really like to understand this better, so I'll push a change restoring the
It would be helpful to see exactly where the CI fails with a negative value of `FORTRAN_MPI_COMM_WORLD`.
The reason it works with negative values, or any value as such, of The CI also seems to pass. I searched in
Looks good to me, this is great. Thanks @adit4443ya. I'm merging this PR; we can have a separate issue/PR for the MPI_COMM_WORLD issue if you still think there is something we missed.
Towards #21