Implementation of MSpM on CPU #1911

CoxyMielle · 2025-08-13T13:45:09Z

Implements simple_mspm (C = AB) and mspm (C = aAB + bC) for the reference and omp executors and adds the corresponding tests.
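For readers unfamiliar with the operation, here is a minimal sequential sketch of both variants, assuming A and C are dense row-major arrays and B is stored in CSR format. The `CsrMatrix` struct and the free functions are hypothetical stand-ins for illustration; they are not Ginkgo's `matrix::Csr`, `matrix::Dense`, or the kernels in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR container for illustration; not Ginkgo's matrix::Csr.
struct CsrMatrix {
    std::size_t num_rows;
    std::size_t num_cols;
    std::vector<std::size_t> row_ptrs;  // num_rows + 1 entries
    std::vector<std::size_t> col_idxs;  // one entry per stored value
    std::vector<double> values;
};

// C = alpha * A * B + beta * C, where A (m x k) and C (m x n) are dense
// row-major arrays and B (k x n) is stored in CSR format.
void mspm(double alpha, const std::vector<double>& a, std::size_t m,
          const CsrMatrix& b, double beta, std::vector<double>& c)
{
    const std::size_t inner = b.num_rows;
    const std::size_t n = b.num_cols;
    assert(a.size() == m * inner);
    assert(c.size() == m * n);
    for (std::size_t row = 0; row < m; ++row) {
        double* c_row = c.data() + row * n;
        // Scale the existing output row by beta first.
        for (std::size_t col = 0; col < n; ++col) {
            c_row[col] *= beta;
        }
        // Scatter alpha * a(row, k) * b(k, :) into the output row.
        for (std::size_t k = 0; k < inner; ++k) {
            const double scaled_a = alpha * a[row * inner + k];
            for (std::size_t idx = b.row_ptrs[k]; idx < b.row_ptrs[k + 1];
                 ++idx) {
                c_row[b.col_idxs[idx]] += scaled_a * b.values[idx];
            }
        }
    }
}

// C = A * B: start from a zeroed C so that stale values cannot leak in.
void simple_mspm(const std::vector<double>& a, std::size_t m,
                 const CsrMatrix& b, std::vector<double>& c)
{
    c.assign(m * b.num_cols, 0.0);
    mspm(1.0, a, m, b, 0.0, c);
}
```

The loop order (row of A, then k, then the nonzeros of row k of B) mirrors the structure visible in the diff snippets under review, so each row of B is traversed contiguously.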


ginkgo-bot · 2025-08-13T13:45:38Z

Error: The following files need to be formatted:

omp/matrix/dense_kernels.cpp
reference/matrix/dense_kernels.cpp
reference/test/matrix/dense_kernels.cpp
test/matrix/dense_kernels.cpp

You can find a formatting patch under Artifacts here or run format! if you have write access to Ginkgo

yhmtsai

To make the code formatting meet Ginkgo's requirements, please install pre-commit and run pre-commit install to set up the Ginkgo config.
Once it is installed, it will format your code whenever you commit.
To apply it to the code already in the branch, you can use pre-commit run --from-ref origin/develop --to-ref HEAD

yhmtsai · 2025-08-13T13:50:03Z

common/cuda_hip/matrix/dense_kernels.cpp

+          const matrix::Csr<ValueType, IndexType>* b,
+          const matrix::Dense<ValueType>* beta, matrix::Dense<ValueType>* c)
+{
+    // TODO: implement c = alpha * a * b + beta * c with single thread


Suggested change

// TODO: implement c = alpha * a * b + beta * c with single thread

You can delete this, too.

yhmtsai · 2025-08-13T13:50:22Z

common/cuda_hip/matrix/dense_kernels.cpp

+                 const matrix::Csr<ValueType, IndexType>* b,
+                 matrix::Dense<ValueType>* c)
+{
+    // TODO: implement c = a * b with single thread


Suggested change

// TODO: implement c = a * b with single thread

Sorry, I copied this over by mistake. You can delete this comment here.

yhmtsai · 2025-08-13T13:50:48Z

dpcpp/matrix/dense_kernels.dp.cpp

+                 const matrix::Csr<ValueType, IndexType>* b,
+                 matrix::Dense<ValueType>* c)
+{
+    // TODO: implement c = a * b with single thread


Suggested change

// TODO: implement c = a * b with single thread

yhmtsai · 2025-08-13T13:50:55Z

dpcpp/matrix/dense_kernels.dp.cpp

+          const matrix::Csr<ValueType, IndexType>* b,
+          const matrix::Dense<ValueType>* beta, matrix::Dense<ValueType>* c)
+{
+    // TODO: implement c = alpha * a * b + beta * c with single thread


Suggested change

// TODO: implement c = alpha * a * b + beta * c with single thread

yhmtsai · 2025-08-13T13:52:18Z

omp/matrix/dense_kernels.cpp

+    const auto a_vals = acc::helper::build_const_rrm_accessor<ValueType>(a);
+    const auto b_vals = acc::helper::build_const_rrm_accessor<ValueType>(b);


This is unnecessary because the kernel only works with uniform precision for now.
Does it affect your performance?

yhmtsai · 2025-08-13T13:57:05Z

omp/matrix/dense_kernels.cpp

+            for(IndexType k=zero<IndexType>(); k<b->get_size()[0]; k++){
+                const auto val_A = define_multiplication_operand(row, k);
+                //iterate over the non-zero values of a row
+                for(IndexType idx_B=b_rowptrs[k]; idx_B<b_rowptrs[k+1]; idx_B++){


Suggested change

-                for(IndexType idx_B=b_rowptrs[k]; idx_B<b_rowptrs[k+1]; idx_B++){
+                for(auto idx_b=b_rowptrs[k]; idx_b<b_rowptrs[k+1]; idx_b++){

The same goes for the variable name (idx_B becomes idx_b).

yhmtsai · 2025-08-13T13:57:58Z

omp/matrix/dense_kernels.cpp

+            initialize_accumulator(th_acc_begin_ptr, sub_acc_size, row);
+            //iterate over the whole matrix b
+            for(IndexType k=zero<IndexType>(); k<b->get_size()[0]; k++){
+                const auto val_A = define_multiplication_operand(row, k);


Our variable names should be snake_case. Ref: https://github.com/ginkgo-project/ginkgo/wiki/Contributing-guidelines#variables

yhmtsai · 2025-08-13T14:01:15Z

omp/matrix/dense_kernels.cpp

+            auto out_ptr = c_vals_ptr + row*c->get_stride();
+            std::copy(th_acc_begin_ptr, th_acc_end_ptr, out_ptr);


Question: Maybe I am missing something. Could you remind me why you need the accumulator array?
From the implementation, one thread handles one row, so you can operate directly on the output data without an accumulator, right?
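To illustrate the question, here is a sketch of the direct-write variant for C = A * B, assuming a row-major dense A, a CSR B, and an output C with a row stride. The function name, parameter list, and raw `std::vector` storage are hypothetical and are not the PR's code or Ginkgo's kernel interface. Because each outer-loop iteration owns exactly one output row, threads never write to the same memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical free function, not Ginkgo code: C = A * B with A dense
// (row-major, num_rows x inner) and B in CSR form (inner x num_cols).
// Each outer iteration owns one row of C, so it accumulates straight into
// the output without a per-thread scratch buffer.
void row_parallel_simple_mspm(const std::vector<double>& a_vals,
                              std::ptrdiff_t num_rows, std::ptrdiff_t inner,
                              const std::vector<std::ptrdiff_t>& b_row_ptrs,
                              const std::vector<std::ptrdiff_t>& b_col_idxs,
                              const std::vector<double>& b_vals,
                              std::ptrdiff_t num_cols, std::ptrdiff_t c_stride,
                              std::vector<double>& c_vals)
{
    assert(c_stride >= num_cols);
    assert(static_cast<std::ptrdiff_t>(a_vals.size()) == num_rows * inner);
    assert(static_cast<std::ptrdiff_t>(c_vals.size()) >= num_rows * c_stride);
    double* c_data = c_vals.data();
// The pragma is only active when the compiler enables OpenMP.
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (std::ptrdiff_t row = 0; row < num_rows; ++row) {
        double* c_row = c_data + row * c_stride;
        // Zero only the logical columns; padding up to the stride is untouched.
        for (std::ptrdiff_t col = 0; col < num_cols; ++col) {
            c_row[col] = 0.0;
        }
        for (std::ptrdiff_t k = 0; k < inner; ++k) {
            const double a_val = a_vals[row * inner + k];
            for (std::ptrdiff_t idx = b_row_ptrs[k]; idx < b_row_ptrs[k + 1];
                 ++idx) {
                c_row[b_col_idxs[idx]] += a_val * b_vals[idx];
            }
        }
    }
}
```

Whether a separate accumulator still pays off (for example through different cache behavior) is something only a measurement on the real kernel can show; the sketch only demonstrates that correctness does not require one.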

yhmtsai · 2025-08-13T14:05:38Z

omp/matrix/dense_kernels.cpp

+        std::transform( //initialize the accumulator with c + beta
+            begin_row_c_vals_ptr, begin_row_c_vals_ptr + acc_size,


Suggested change

-        std::transform( //initialize the accumulator with c + beta
-            begin_row_c_vals_ptr, begin_row_c_vals_ptr + acc_size,
+        //initialize the accumulator with c + beta
+        std::transform(
+            begin_row_c_vals_ptr, begin_row_c_vals_ptr + acc_size,

I would move the comment out of the function call so that clang-format can format it, unless there is a good reason not to.

yhmtsai · 2025-08-13T14:10:45Z

reference/matrix/dense_kernels.cpp

+    auto advanced_def_mult_operand = [a, alpha](IndexType row, IndexType k){
+        return alpha->at(0, 0) * a->at(row, k); //multiply a(row,k) by alpha
+    };


Suggested change

-    auto advanced_def_mult_operand = [a, alpha](IndexType row, IndexType k){
-        return alpha->at(0, 0) * a->at(row, k); //multiply a(row,k) by alpha
-    };
+    auto advanced_def_mult_operand = [alpha](auto a_val, auto b_val){
+        return alpha->at(0, 0) * a_val * b_val; //multiply a(row,k) by alpha
+    };

From the name, I was expecting this form. It would also avoid the unclear data access inside the function call, but I do not have a strong opinion on this yet.
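As a small self-contained illustration of the value-based form, the sketch below applies such a lambda in a scaled dot product. The function `scaled_dot` and its arguments are hypothetical and exist only to show the calling pattern; they are not part of the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example, not Ginkgo code: computes alpha * sum_i a[i] * b[i].
double scaled_dot(double alpha, const std::vector<double>& a_vals,
                  const std::vector<double>& b_vals)
{
    assert(a_vals.size() == b_vals.size());
    // Value-based form: the lambda only combines the operands it receives,
    // so every memory access stays visible at the call site.
    auto scaled_product = [alpha](auto a_val, auto b_val) {
        return alpha * a_val * b_val;
    };
    double sum = 0.0;
    for (std::size_t i = 0; i < a_vals.size(); ++i) {
        sum += scaled_product(a_vals[i], b_vals[i]);
    }
    return sum;
}
```

One trade-off to keep in mind: the index-based form computes alpha * a(row, k) once per k and reuses it across the nonzeros of that row of B, whereas the value-based form multiplies by alpha in every product unless the compiler hoists it.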

yhmtsai and others added 2 commits August 8, 2025 13:37

initialize MSpM kernel

7f6c166

implementation and tests for simple and advanced MSpM on reference an…

4000577

…d omp executors

ginkgo-bot added the reg:testing (This is related to testing.), type:matrix-format (This is related to the Matrix formats), and mod:all (This touches all Ginkgo modules.) labels Aug 13, 2025

yhmtsai requested review from upsj and yhmtsai August 13, 2025 13:48

yhmtsai assigned CoxyMielle Aug 13, 2025

yhmtsai added the 1:ST:ready-for-review This PR is ready for review label Aug 13, 2025

yhmtsai requested changes Aug 13, 2025


3 participants