-
Notifications
You must be signed in to change notification settings - Fork 99
Implementation of MSpM on CPU #1911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
Error: The following files need to be formatted:
You can find a formatting patch under Artifacts here or run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make the code format fitting into Ginkgo requirement, please install pre-commit
and run pre-commit install
to install the ginkgo config.
after installing that, it should format your code when you try to commit something.
To apply it to the codes already in branch, you can use re-commit run --from-ref origin/develop --to-ref HEAD
const matrix::Csr<ValueType, IndexType>* b, | ||
const matrix::Dense<ValueType>* beta, matrix::Dense<ValueType>* c) | ||
{ | ||
// TODO: implement c = alpha * a * b + beta * c with single thread |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO: implement c = alpha * a * b + beta * c with single thread |
you can delete this, too
const matrix::Csr<ValueType, IndexType>* b, | ||
matrix::Dense<ValueType>* c) | ||
{ | ||
// TODO: implement c = a * b with single thread |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO: implement c = a * b with single thread |
sorry for my wrong copy. you can delete this comment here
const matrix::Csr<ValueType, IndexType>* b, | ||
matrix::Dense<ValueType>* c) | ||
{ | ||
// TODO: implement c = a * b with single thread |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO: implement c = a * b with single thread |
const matrix::Csr<ValueType, IndexType>* b, | ||
const matrix::Dense<ValueType>* beta, matrix::Dense<ValueType>* c) | ||
{ | ||
// TODO: implement c = alpha * a * b + beta * c with single thread |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO: implement c = alpha * a * b + beta * c with single thread |
const auto a_vals = acc::helper::build_const_rrm_accessor<ValueType>(a); | ||
const auto b_vals = acc::helper::build_const_rrm_accessor<ValueType>(b); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is unnecessary because the kernel is working for uniform precision now.
Does it affect your performance?
for(IndexType k=zero<IndexType>(); k<b->get_size()[0]; k++){ | ||
const auto val_A = define_multiplication_operand(row, k); | ||
//iterate over the non-zero values of a row | ||
for(IndexType idx_B=b_rowptrs[k]; idx_B<b_rowptrs[k+1]; idx_B++){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for(IndexType idx_B=b_rowptrs[k]; idx_B<b_rowptrs[k+1]; idx_B++){ | |
for(auto idx_b=b_rowptrs[k]; idx_b<b_rowptrs[k+1]; idx_b++){ |
also for the name
initialize_accumulator(th_acc_begin_ptr, sub_acc_size, row); | ||
//iterate over the whole matrix b | ||
for(IndexType k=zero<IndexType>(); k<b->get_size()[0]; k++){ | ||
const auto val_A = define_multiplication_operand(row, k); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
our variable name should be snake_case
ref: https://github.com/ginkgo-project/ginkgo/wiki/Contributing-guidelines#variables
auto out_ptr = c_vals_ptr + row*c->get_stride(); | ||
std::copy(th_acc_begin_ptr, th_acc_end_ptr, out_ptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Maybe I missing that. Could you remind me why you need the accumulator array?
From the implementation, one thread handle a row, so you can directly operate on the output data without accumulator, right?
std::transform( //initialize the accumulator with c + beta | ||
begin_row_c_vals_ptr, begin_row_c_vals_ptr + acc_size, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::transform( //initialize the accumulator with c + beta | |
begin_row_c_vals_ptr, begin_row_c_vals_ptr + acc_size, | |
//initialize the accumulator with c + beta | |
std::transform( | |
begin_row_c_vals_ptr, begin_row_c_vals_ptr + acc_size, |
I would move comment out of function call to let clang-format do format unless there is a good reason.
auto advanced_def_mult_operand = [a, alpha](IndexType row, IndexType k){ | ||
return alpha->at(0, 0) * a->at(row, k); //multiply a(row,k) by alpha | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto advanced_def_mult_operand = [a, alpha](IndexType row, IndexType k){ | |
return alpha->at(0, 0) * a->at(row, k); //multiply a(row,k) by alpha | |
}; | |
auto advanced_def_mult_operand = [alpha](auto a_val, auto b_val){ | |
return alpha->at(0, 0) * a_val * b_val; //multiply a(row,k) by alpha | |
}; |
from the name, I was expecting this form. also this will reduce the unclear data access from the function call, but I do not have strong opinion on this yet.
Implementation of simple_mspm (AB = C) and mspm (aAB+bC = C) in the reference and omp executor and adding corresponding tests.