
Remove C-wrapper for MPI_Barrier #37


Merged · 3 commits merged into lfortran:main on Mar 31, 2025
Conversation

@adit4443ya (Collaborator) commented Mar 24, 2025

Towards #21

@adit4443ya (Collaborator, Author) commented Mar 29, 2025

@gxyd
Can you please take the diff from this PR, test it locally, and let me know if you hit any errors?

I will update this PR once I confirm the changes are correct.
Locally, after rebasing/merging with the latest main, the barrier example was not running, and I'm not sure why yet.

I encounter this (the same old error):

 aditya-trivedi   tests    barrier ≢    FC='gfortran' ./run_tests.sh 
Compiling allreduce...
Running allreduce with 1 MPI ranks...
 Allreduce test completed with            0  errors.
Test allreduce with 1 MPI ranks PASSED!
Running allreduce with 2 MPI ranks...
 Allreduce test completed with            0  errors.
 Allreduce test completed with            0  errors.
Test allreduce with 2 MPI ranks PASSED!
Running allreduce with 4 MPI ranks...
 Allreduce test completed with            0  errors.
 Allreduce test completed with            0  errors.
 Allreduce test completed with            0  errors.
 Allreduce test completed with            0  errors.
Test allreduce with 4 MPI ranks PASSED!
Compiling barrier_1...
Running barrier_1 with 1 MPI ranks...
 Process            0  reached before the barrier.
[Observer:00000] *** An error occurred in MPI_Barrier
[Observer:00000] *** reported by process [1070530561,0]
[Observer:00000] *** on communicator MPI_COMM_WORLD
[Observer:00000] *** MPI_ERR_COMM: invalid communicator
[Observer:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[Observer:00000] ***    and MPI will try to terminate your MPI job as well)
--------------------------------------------------------------------------
prterun has exited due to process rank 0 with PID 0 on node Observer calling
"abort". This may have caused other processes in the application to be
terminated by signals sent by prterun (as reported here).
--------------------------------------------------------------------------
Test barrier_1 with 1 MPI ranks FAILED!
 aditya-trivedi   tests    barrier ≢  ?2    echo $?
1

Whereas git log says

 aditya-trivedi   tests    barrier ≢  ?2    git log
commit 03630368478ecf6b949cfcee9769f293ce5b2e17 (HEAD -> barrier)
Author: Aditya Trivedi <adit4443ya@gmail.com>
Date:   Mon Mar 24 21:09:29 2025 +0530

    Remove C-wrapper for MPI_Barrier

commit c46fffb1ca1cd380ab23e25ac6b3227d97fbf9a8 (origin/main, origin/HEAD, gaurav/main, main)
Author: Gaurav Dhingra <gauravdhingra.gxyd@gmail.com>
Date:   Thu Mar 27 15:55:33 2025 +0530

    CI: run standalone tests with MPI rank 4 as well (#43)
    
    * CI: run standalone tests with MPI rank 4 as well
    
    * use `--oversubscribe` option with Open MPI but not with MPICH
    
    * don't use `--oversubscribe` for rank 1 or rank 2 with Open MPI

commit 5ebb90277ffce78e2bedcccc692fdee5f2490d0d
Author: Gaurav Dhingra <gauravdhingra.gxyd@gmail.com>
Date:   Thu Mar 27 15:55:18 2025 +0530

    CI: run POT3D validation with MPI rank 4 as well (#48)
    
    * CI: run POT3D validation with MPI rank 4 as well
    
    * use `--oversubscribe` with Open MPI when running with MPI rank 4

commit a9e8ef96d2b1e23f764d7092a61241a1b5536a1d
Author: Gaurav Dhingra <gauravdhingra.gxyd@gmail.com>
Date:   Thu Mar 27 12:00:37 2025 +0530

    tests: add test program to compute pi using Monte Carlo method (#50)
    
    * add test program to compute pi using Monte Carlo method
    
    * use function to get MPI_Op
    
    * add check to make sure that the computed PI value is within range

Whereas if I run the tests on this branch without merging/rebasing onto main, the changes in this PR work for me.

@adit4443ya (Collaborator, Author)
OK, I found out why this happened.
The reason is the value of MPI_COMM_WORLD being -1000: in this PR it's set to 0 by default, but on the latest main it's -1000.
@gxyd, can you explain why we changed this value?

@gxyd (Collaborator) commented Mar 29, 2025

> In this PR it's set to 0 by default, but on the latest main it's -1000.
> @gxyd, can you explain why we changed this value?

I'm OK with any non-zero value, as long as the value here: https://github.com/gxyd/c_mpi/blob/main/src/mpi_wrapper.c#L6 is set to the same thing.

The reason I set it to a non-zero value is that an uninitialized variable in Fortran often (though not always) happens to hold zero, so a zero sentinel can hide the use of an uninitialized communicator.

So even if you set MPI_COMM_WORLD to -2022, as long as you set the same value here: https://github.com/gxyd/c_mpi/blob/main/src/mpi_wrapper.c#L6, I think it should be fine.
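
For context, a minimal sketch of the sentinel idea in C (the name and value follow this thread, not necessarily the exact code in src/mpi_wrapper.c):

```c
/* Sketch only, assuming the layout described in this thread: the Fortran
 * module exposes MPI_COMM_WORLD as a plain integer constant, and the C
 * wrapper must define the very same sentinel so it can recognise it.
 * If the two sides disagree, the lookup misidentifies the communicator,
 * which is consistent with the MPI_ERR_COMM failure in the barrier test
 * above. */
#define FORTRAN_MPI_COMM_WORLD (-1000)  /* must equal the Fortran-side constant */
```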

@adit4443ya (Collaborator, Author)

I set it to 91 and used your C-wrapper `get_c_comm_from_fortran`.

@adit4443ya requested a review from gxyd, March 29, 2025 12:44
@certik (Collaborator) commented Mar 29, 2025

This PR would work with keeping FORTRAN_MPI_COMM_WORLD unmodified, correct? If so, let's do that, and you can change this value in a separate PR. I don't like lumping unrelated things into the same PR.

@adit4443ya (Collaborator, Author) commented Mar 29, 2025

> This PR would work with keeping FORTRAN_MPI_COMM_WORLD unmodified, correct?

@certik Actually, no. Any negative value of COMM doesn't make sense, as the communicator is the identifier of the group of processes. In complex topologies, specific communicator values are used to perform splits and group the processes together in the mesh.

Reference: MPI_Comm_split Function

The new communicator derived from the split must be non-negative; hence, if the parent communicator were negative, it would not work.

Also, I checked it locally in this PR; it doesn't work with negative values.
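
For illustration, here is a minimal standalone C example of the MPI_Comm_split usage referenced above (standard MPI API, not code from this repository):

```c
/* Each rank is assigned to a sub-communicator by "color"; the resulting
 * handle is whatever the MPI implementation chooses, and the caller never
 * inspects its integral value. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Split ranks into two groups: even ranks and odd ranks. */
    int color = world_rank % 2;
    MPI_Comm sub_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &sub_comm);

    int sub_rank;
    MPI_Comm_rank(sub_comm, &sub_rank);
    printf("world rank %d -> color %d, sub rank %d\n",
           world_rank, color, sub_rank);

    MPI_Comm_free(&sub_comm);
    MPI_Finalize();
    return 0;
}
```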

@gxyd (Collaborator) commented Mar 31, 2025

> Also, I checked it locally in this PR; it doesn't work with negative values.

I would really like to understand this better, so I'll push a change restoring FORTRAN_MPI_COMM_WORLD to its original state (i.e. -1000) and see exactly where the CI fails with a negative value of FORTRAN_MPI_COMM_WORLD.

gxyd added a commit: "it would be helpful to see exactly where the CI fails with negative value of `FORTRAN_MPI_COMM_WORLD`"
@gxyd (Collaborator) commented Mar 31, 2025

> Any negative value of COMM doesn't make sense, as the communicator is the identifier of the group of processes. In complex topologies, specific communicator values are used to perform splits and group the processes together in the mesh.

The reason it works with negative values, or any value of FORTRAN_MPI_COMM_WORLD for that matter, is that we only use it as an identifier in mpi_wrapper.c to select the corresponding MPI_Comm instance (MPI_COMM_WORLD or some other communicator); we don't use its integral value for any other purpose.

The CI also seems to pass.

I searched the vapaa repository, and it seems to set a negative value as well: https://github.com/jeffhammond/vapaa/blob/3a5ba0b2ae8e1e75b3ab63fe3da7923eabfdd366/source/vapaa_constants.h#L14, along with its use here: https://github.com/jeffhammond/vapaa/blob/3a5ba0b2ae8e1e75b3ab63fe3da7923eabfdd366/source/convert_handles.h#L69-L71
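
As a rough sketch of that identifier-to-handle translation (the real helper, `get_c_comm_from_fortran` in src/mpi_wrapper.c, may differ in name and signature):

```c
#include <mpi.h>

/* Hedged sketch, not the repository's exact code: the Fortran-side integer is
 * only compared against known sentinels and then mapped to the corresponding
 * C handle, so its numeric value (0, 91, -1000, ...) carries no meaning of
 * its own. */
#define FORTRAN_MPI_COMM_WORLD (-1000)  /* must match the Fortran constant */

static MPI_Comm comm_from_fortran(int fortran_comm)
{
    if (fortran_comm == FORTRAN_MPI_COMM_WORLD)
        return MPI_COMM_WORLD;
    /* user-created communicators would be looked up in a table here */
    return MPI_COMM_NULL;
}
```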

@gxyd (Collaborator) left a review:


Looks good to me, this is great. Thanks @adit4443ya. I'm merging this PR; we can open a separate issue/PR for the MPI_COMM_WORLD question if you still think there is something we missed.

@gxyd merged commit b701519 into lfortran:main on Mar 31, 2025
16 checks passed