Skip to content

Conversation

MarcelKoch
Copy link
Member

@MarcelKoch MarcelKoch commented Apr 4, 2024

This PR adds a distributed row gatherer. This operator essentially provides the communication required in our matrix apply.

Besides the normal apply (which is blocking), it also provides two asynchronous calls. One version has an additional workspace parameter which is used as send buffer. This version can be called multiple times without restrictions, if different workspaces are used for each call. The other version doesn't have a workspace parameter, and instead uses an internal buffer. As a consequence, this function can only be called a second time, if the request of the previous call has been waited on. Otherwise, this function will throw.

This is the second part of splitting up #1546.

It also introduces some intermediate changes, which could be extracted out beforehand:

PR Stack:

@MarcelKoch MarcelKoch self-assigned this Apr 4, 2024
@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. reg:testing This is related to testing. mod:core This is related to the core module. type:matrix-format This is related to the Matrix formats labels Apr 4, 2024
@MarcelKoch MarcelKoch requested a review from pratikvn April 4, 2024 10:49
@MarcelKoch MarcelKoch force-pushed the distributed-row-gatherer branch from 6b4521b to ae60198 Compare April 4, 2024 11:00
@MarcelKoch MarcelKoch force-pushed the neighborhood-communicator branch from 6acf7c4 to 8aa6ab9 Compare April 4, 2024 11:00
@MarcelKoch MarcelKoch force-pushed the distributed-row-gatherer branch 2 times, most recently from 49557f1 to 4a79442 Compare April 5, 2024 08:18
@MarcelKoch MarcelKoch modified the milestone: Ginkgo 1.8.0 Apr 5, 2024
@MarcelKoch MarcelKoch force-pushed the neighborhood-communicator branch from 8aa6ab9 to 77398bd Compare April 17, 2024 16:28
@MarcelKoch MarcelKoch force-pushed the distributed-row-gatherer branch from 4a79442 to 172eb7d Compare April 17, 2024 16:28
@MarcelKoch MarcelKoch requested a review from upsj April 19, 2024 09:20
@MarcelKoch MarcelKoch mentioned this pull request Apr 19, 2024
7 tasks
@MarcelKoch MarcelKoch force-pushed the neighborhood-communicator branch from 77398bd to d278cad Compare April 19, 2024 14:39
@MarcelKoch MarcelKoch force-pushed the distributed-row-gatherer branch 2 times, most recently from 98fa10a to 79de4c3 Compare April 19, 2024 16:19
@MarcelKoch
Copy link
Member Author

One issue that I have is the constructor. It takes a collective_communicator and an index_map. The index_map already defines the communication pattern, so the collective_communicator has to match that.
One option might be to have a virtual function like

std::unique_ptr<collective_communicator> create_with_same_type(communicator, index_map);

If I can't come up with anything better, I guess I will use that.

@pratikvn
Copy link
Member

Do we need to have the std::future setup for the release ? Can we remove that for now and just use a normal synchronous approach ? I think that is a significant change that maybe needs more thought and probably a separate PR.

Copy link

codecov bot commented May 6, 2025

Codecov Report

Attention: Patch coverage is 93.83754% with 22 lines in your changes missing coverage. Please review.

Please upload report for BASE (develop@2deab4b). Learn more about missing BASE report.
Report is 26 commits behind head on develop.

Files with missing lines Patch % Lines
core/distributed/row_gatherer.cpp 92.04% 7 Missing ⚠️
core/distributed/matrix.cpp 92.59% 4 Missing ⚠️
test/mpi/distributed/row_gatherer.cpp 96.58% 4 Missing ⚠️
core/distributed/neighborhood_communicator.cpp 0.00% 3 Missing ⚠️
include/ginkgo/core/base/segmented_array.hpp 25.00% 3 Missing ⚠️
test/mpi/distributed/matrix.cpp 96.15% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #1589   +/-   ##
==========================================
  Coverage           ?   88.64%           
==========================================
  Files              ?      849           
  Lines              ?    71101           
  Branches           ?        0           
==========================================
  Hits               ?    63029           
  Misses             ?     8072           
  Partials           ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

MarcelKoch and others added 25 commits May 12, 2025 10:07
Signed-off-by: Marcel Koch <marcel.koch@kit.edu>
Signed-off-by: Marcel Koch <marcel.koch@kit.edu>
- only allocate if necessary
- synchronize correct executor

Co-authored-by: Pratik Nayak <pratik.nayak@kit.edu>
- split tests into core and backend part
- fix formatting
- fix openmpi pre 4.1.x macro

Co-authored-by: Pratik Nayak <pratik.nayak4@gmail.com>
Co-authored-by: Yu-Hsiang M. Tsai <yhmtsai@gmail.com>
Signed-off-by: Marcel Koch <marcel.koch@kit.edu>
- add copy/move tests
- undo using MPI_Init_thread
- add extra host_recv_buffer_
- create row-gatherer as unique_ptr

Co-authored-by: Yu-Hsiang M. Tsai <yhmtsai@gmail.com>
The `LinOp::apply` function creates temporary clones to match the operators executor, but this will lead to wrong behavior, if MPI doesn't support GPU buffers.
right now the RG doesn't support (blocking) apply, so it doesn't make much sense to keep it as a LinOp
- documentation
- format
- unused code

Co-authored-by: Yu-Hsiang M. Tsai <yhmtsai@gmail.com>
@MarcelKoch MarcelKoch force-pushed the distributed-row-gatherer branch from dafb3e6 to e7178d5 Compare May 12, 2025 10:36
Copy link

sonarqubecloud bot commented May 13, 2025

@MarcelKoch MarcelKoch merged commit 20feb0f into develop May 13, 2025
17 of 18 checks passed
@MarcelKoch MarcelKoch deleted the distributed-row-gatherer branch May 13, 2025 10:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

1:ST:ready-to-merge This PR is ready to merge. mod:core This is related to the core module. reg:build This is related to the build system. reg:testing This is related to testing. type:matrix-format This is related to the Matrix formats

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants