-
Notifications
You must be signed in to change notification settings - Fork 99
Add new SpGEMM/SpGEAM interface with reuse capabilities #1934
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly lgtm, only some smaller remarks.
- add documentation to multiply_reuse_info - more strict input dimension checks - fix test comments Co-authored-by: Marcel Koch <marcel.koch@kit.edu>
@MarcelKoch I added the functionality to a benchmark, a very brief check on a 1Mx1M stencil matrix on an A2 GPU shows a roughly 1.4x speedup over normal SpGEMM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main question about how to construct the info object.
I added advanced_spgemm (multiply_add) and spgeam (add_scale) interfaces, as well as reusable versions thereof with extensive tests of all the size checks and cross-executor functionality. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only smaller nits left, mostly regarding the testing.
|
||
multiply_reuse_info(const multiply_reuse_info&) = delete; | ||
|
||
multiply_reuse_info(multiply_reuse_info&&); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This (and all other move constructors/assignments) can be made noexcept
* add_reuse to recompute a sparse matrix-matrix sum | ||
* with updated values. | ||
*/ | ||
class add_scale_reuse_info { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the other reuse_info structs are defined before the corresponding non-reuse function.
|
||
|
||
template <typename ValueType, typename IndexType> | ||
Csr<ValueType, IndexType>::multiply_reuse_info::multiply_reuse_info() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
=default ?
|
||
|
||
template <typename ValueType, typename IndexType> | ||
Csr<ValueType, IndexType>::multiply_add_reuse_info::multiply_add_reuse_info() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
=default ?
size_type nnz1; | ||
size_type nnz2; | ||
size_type nnz_out; | ||
// no other data yet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
// no other data yet |
} | ||
|
||
|
||
TEST_F(Csr, MultiplyAddReuseCrossExecutor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AAA pattern is broken
#endif | ||
|
||
|
||
TEST_F(Csr, MultiplyReuseCrossExecutor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again
set_up_apply_data<Mtx::classical>(); | ||
auto trans = gko::as<Mtx>(mtx->transpose()); | ||
|
||
auto [dresult, reuse] = dmtx->multiply_reuse(trans); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
auto [dresult, reuse] = dmtx->multiply_reuse(trans); | |
auto [dresult, dreuse] = dmtx->multiply_reuse(trans); |
I think this can help to make it clearer in the act section that the device reuse is applied to the host matrix.
mtx2 = gen_mtx<Mtx>(mtx_size[0], mtx_size[1], 0); | ||
dmtx = gko::clone(exec, mtx); | ||
|
||
auto [dresult, reuse] = dmtx->add_scale_reuse(alpha, beta, mtx2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
} | ||
|
||
|
||
TEST_F(Csr, AddScaleReuseCrossExecutor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again AAA
This adds a new SpGEMM/SpGEAM interface that works without abusing the
LinOp::apply
interface, and a new SpGEMM kernel that operates on an existing output structure (maybe it also makes sense to expose this as a masked SpGEMM operation? The difference is similar to LU vs. ILU kernels)