Skip to content

Conversation

upsj
Copy link
Member

@upsj upsj commented Sep 30, 2025

This adds a new SpGEMM/SpGEAM interface that works without abusing the LinOp::apply interface, and a new SpGEMM kernel that operates on an existing output structure (maybe it also makes sense to expose this as a masked SpGEMM operation? The difference is similar to LU vs. ILU kernels)

@upsj upsj requested a review from a team September 30, 2025 14:48
@upsj upsj self-assigned this Sep 30, 2025
@upsj upsj added the 1:ST:ready-for-review This PR is ready for review label Sep 30, 2025
@ginkgo-bot ginkgo-bot added reg:testing This is related to testing. type:matrix-format This is related to the Matrix formats mod:all This touches all Ginkgo modules. labels Sep 30, 2025
Copy link
Member

@MarcelKoch MarcelKoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly lgtm, only some smaller remarks.

upsj and others added 4 commits October 1, 2025 15:35
- add documentation to multiply_reuse_info
- more strict input dimension checks
- fix test comments

Co-authored-by: Marcel Koch <marcel.koch@kit.edu>
@upsj upsj requested a review from MarcelKoch October 1, 2025 14:58
@upsj
Copy link
Member Author

upsj commented Oct 1, 2025

@MarcelKoch I added the functionality to a benchmark, a very brief check on a 1Mx1M stencil matrix on an A2 GPU shows a roughly 1.4x speedup over normal SpGEMM

Copy link
Member

@MarcelKoch MarcelKoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main question about how to construct the info object.

@upsj upsj changed the title Add new SpGEMM interface with reuse capabilities Add new SpGEMM/SpGEAM interface with reuse capabilities Oct 7, 2025
@upsj upsj requested a review from MarcelKoch October 7, 2025 18:38
@upsj
Copy link
Member Author

upsj commented Oct 7, 2025

I added advanced_spgemm (multiply_add) and spgeam (add_scale) interfaces, as well as reusable versions thereof with extensive tests of all the size checks and cross-executor functionality.

Copy link
Member

@MarcelKoch MarcelKoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only smaller nits left, mostly regarding the testing.


multiply_reuse_info(const multiply_reuse_info&) = delete;

multiply_reuse_info(multiply_reuse_info&&);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and all other move constructors/assignments) can be made noexcept

* add_reuse to recompute a sparse matrix-matrix sum
* with updated values.
*/
class add_scale_reuse_info {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the other reuse_info structs are defined before the corresponding non-reuse function.



template <typename ValueType, typename IndexType>
Csr<ValueType, IndexType>::multiply_reuse_info::multiply_reuse_info()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

=default ?



template <typename ValueType, typename IndexType>
Csr<ValueType, IndexType>::multiply_add_reuse_info::multiply_add_reuse_info()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

=default ?

size_type nnz1;
size_type nnz2;
size_type nnz_out;
// no other data yet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
// no other data yet

}


TEST_F(Csr, MultiplyAddReuseCrossExecutor)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AAA pattern is broken

#endif


TEST_F(Csr, MultiplyReuseCrossExecutor)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again

set_up_apply_data<Mtx::classical>();
auto trans = gko::as<Mtx>(mtx->transpose());

auto [dresult, reuse] = dmtx->multiply_reuse(trans);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
auto [dresult, reuse] = dmtx->multiply_reuse(trans);
auto [dresult, dreuse] = dmtx->multiply_reuse(trans);

I think this can help to make it clearer in the act section that the device reuse is applied to the host matrix.

mtx2 = gen_mtx<Mtx>(mtx_size[0], mtx_size[1], 0);
dmtx = gko::clone(exec, mtx);

auto [dresult, reuse] = dmtx->add_scale_reuse(alpha, beta, mtx2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

}


TEST_F(Csr, AddScaleReuseCrossExecutor)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again AAA

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

1:ST:ready-for-review This PR is ready for review mod:all This touches all Ginkgo modules. reg:testing This is related to testing. type:matrix-format This is related to the Matrix formats

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants