-
Notifications
You must be signed in to change notification settings - Fork 99
Add PIPECG reference implementation #1824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I've put the "Ready for review" label, but I'm not sure if I should've put the "Requires feedback" one instead, since this is my first PR here and it's hard for me to assess the state of it. any feedback would be considered of course :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Mostly looks good to me. Only the documentation needs to be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Thanks for making the changes. One last thing before we merge it is that you add yourself to the contributors list here after Marcel Koch: https://github.com/ginkgo-project/ginkgo/blob/develop/contributors.txt#L20 with the following info. LastName FirstName <email> Affiliation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you for your contribution!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to add small section into test/test_install/test_install.cpp
core/test/solver/pipe_cg.cpp
Outdated
| #include "core/test/utils.hpp" | ||
|
|
||
|
|
||
| namespace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| namespace { |
we do not need the anonymous namespace for test now
core/test/solver/pipe_cg.cpp
Outdated
| } | ||
|
|
||
|
|
||
| } // namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
| this->small_n.get(), this->small_prev_rho.get(), this->small_rho.get(), | ||
| this->small_delta.get(), &this->small_stop); | ||
|
|
||
| GKO_ASSERT_MTX_NEAR(this->small_beta, this->small_delta, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also need to test the other vector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still miss the other vector
| this->small_x->fill(1); | ||
| this->small_r->fill(2); | ||
| this->small_z->fill(3); | ||
| this->small_w->fill(4); | ||
|
|
||
| this->small_p->fill(4); | ||
| this->small_q->fill(3); | ||
| this->small_f->fill(2); | ||
| this->small_g->fill(1); | ||
|
|
||
| this->small_rho->at(0) = 2; | ||
| this->small_rho->at(1) = 3; | ||
| this->small_beta->at(0) = 8; | ||
| this->small_beta->at(1) = 3; | ||
| this->small_stop.get_data()[1] = this->stopped; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
follow AAA rules, but you can use comment to distinguish the group
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for the rest of test in this file
| gko::stop::ResidualNorm<value_type>::build() | ||
| .with_reduction_factor(r<value_type>::value)) | ||
| .on(exec)), | ||
| mtx_big(gko::initialize<Mtx>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the actual initiation follows the member declaration.
here you need to make it to follow the same order of member.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not fixed yet. you need to move pipe_cg_factory after mtx_big
| solver->apply(b, x); | ||
|
|
||
| GKO_ASSERT_MTX_NEAR(x, l({33.0, -56.0, 81.0, -30.0, 21.0, 40.0}), | ||
| r<value_type>::value * 2 * 1e5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is 2000 times of CG test. it sounds like we need to check something again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a weird thing that I don't get. the numbers that my PIPECG implementation outputs here are the same as the tested values, and yet it fails this check. I explained this to Pratik and he fixed it by adjusting these tolerances. I've got no other idea what that could be. the output numbers are literally the same here. any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean x is [33.0, -56.0, 81.0, -30.0, 21.0, 40.0]' on output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah nevermind, I misinterpreted some things in my test example, it's out of the tolerance bounds if we don't change them
| } | ||
|
|
||
|
|
||
| } // namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
| &stop_status)); | ||
| // r = r - Ax | ||
| this->get_system_matrix()->apply(neg_one_op, dense_x, one_op, r); | ||
| // z = preconditioner * r |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there is no specific reason, I will suggest the variable follows the paper notation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you're talking about variable naming, Ginkgo I chose names compatible with the CG implementation in Ginkgo. the naming in the paper contradicts the naming in CG in Ginkgo
| // delta = dot(w, z) | ||
| w->compute_conj_dot(z, delta, reduction_tmp); | ||
| // check | ||
| ++iter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the criterion check should be earlier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in my understanding, the check depends on the value of rho, which is computed right before the delta calculation. since the computation of delta and rho will be merged in upcoming PRs, the check is placed after we compute both of them. or I don't get why should it be placed earlier and where? the results seem to be correct with this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the first rho is also happened in the line 139 and 141.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so are you suggesting that I put the check before computing rho in the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one additional check after the first rho computation
|
It seems nvhpc compilers still have some issues with the pipecg tolerances, so it might need to be increased even more: https://gitlab.com/ginkgo-project/ginkgo-public-ci/-/jobs/9693465904 |
I think @yhmtsai suggests that we find another solution. I don't know what can be done |
|
The tolerance problem is not limited to PipeCG, but is prevalent across all solvers in Ginkgo. We have arbitrarily increased the tolerances for different solvers to ensure that they pass the tests. The matrices we use to test the solvers are also for the most part arbitrary (we have some control: hpd-ness, sparsity, but not over condition number for example.) A proper solution IMO would be to have full control over the matrix and vector inputs:
Then we can correlate solvers with inputs and tailor the inputs rather than tailor the output tolerances. This approach requires a more sophisticated sparse matrix generation, which we currently do not have. Therefore, IMO, the current approach should be okay, and we should revisit the controlled inputs approach at a later point (with a new PR) and update all the solvers and their tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not fully agree Pratik's comments.
Pratik's comments should only for the general testing case for different executor.
In the reference test, everything is initialized manually, so everything is under control.
I have checked the implementation with the paper and I do not find any difference between them. Because the paper also showed less convergence than CG, I do not hold this PR due to the relaxed error check on big system.
The change request is for the rest of my comments.
| gko::stop::ResidualNorm<value_type>::build() | ||
| .with_reduction_factor(r<value_type>::value)) | ||
| .on(exec)), | ||
| mtx_big(gko::initialize<Mtx>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not fixed yet. you need to move pipe_cg_factory after mtx_big
| stopped{}, | ||
| non_stopped{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stopped and non_stopped need to between matrix and factory according to the member declaration
| solver->apply(b, x); | ||
|
|
||
| GKO_ASSERT_MTX_NEAR(x, l({81.0, 55.0, 45.0, 5.0, 85.0, -10.0}), | ||
| r<value_type>::value * 5 * 1e3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the condition number of mtx_big is around 315.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so what should I put as the tolerance then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I just put it here for the condition reference. from this, we can expect around 315 * r.
| this->small_rho->at(1) = 3; | ||
| this->small_beta->at(0) = 8; | ||
| this->small_beta->at(1) = 3; | ||
| this->small_stop.get_data()[1] = this->stopped; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also need to initialize small_stop.get_data()[0]
| this->small_w->fill(8); | ||
| this->small_m->fill(8); | ||
| this->small_n->fill(24); | ||
| this->small_rho->fill(8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused
| this->small_beta->at(1) = 3; | ||
| this->small_delta->at(0) = 5; | ||
| this->small_delta->at(1) = 6; | ||
| this->small_stop.get_data()[1] = this->stopped; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to initialize the small_stop.get_data()[0]
| this->small_n.get(), this->small_prev_rho.get(), this->small_rho.get(), | ||
| this->small_delta.get(), &this->small_stop); | ||
|
|
||
| GKO_ASSERT_MTX_NEAR(this->small_beta, this->small_delta, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still miss the other vector
|
|
||
| solver->apply(b, x); | ||
|
|
||
| GKO_ASSERT_MTX_NEAR(x, l({33.0, -56.0, 81.0, -30.0, 21.0, 40.0}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to add a TODO here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so like this or about what?
// TODO: the tolerance is too big. We might need to design better tests by generating matrices with specific condition numbers and eigenvalue distributions and defining vectors with controlled vector norms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the sentence until ...design better tests should be good enough
|
still miss the first check when having the first rho and the test_install |
sorry, I don't think I understood what you mean here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it still misses the part in test/test_install/test_install.cpp.
It needs a section like
// core/solver/pipe_cg.hpp
{
using Solver = gko::solver::PipeCg<>;
check_solver<Solver>(exec, A_raw, b, x);
}
I also add the place which should contains the check.
you can use only one check but you need to rearrange the algorithm like CG
Otherwise, use one out of the loop and another in the loop after you get the updated residual norm.
core/solver/pipe_cg.cpp
Outdated
| bool all_stopped = | ||
| stop_criterion->update() | ||
| .num_iterations(iter) | ||
| .residual(r) | ||
| .implicit_sq_residual_norm(rho) | ||
| .solution(dense_x) | ||
| .check(RelativeStoppingId, true, &stop_status, &one_changed); | ||
| this->template log<log::Logger::iteration_complete>( | ||
| this, dense_b, dense_x, iter, r, nullptr, rho, &stop_status, | ||
| all_stopped); | ||
| if (all_stopped) { | ||
| break; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the original place + additional check is better if you do not rearrange the check
@yhmtsai As I've mentioned in my comment before: "I think another check is failing now, I suspect that it's because of me adding PipeCG to test_install like Mike suggested, while the unified kernels are not implemented in this PR. I think I'll just add it to the test_install in my next PR with kernel implementation" meaning that I've tried adding exactly this block of code but a new check started failing after this. I'll try this again now, but I can just do this in my next PR where I implement the kernels. (I've now seen your comment under the next PR suggesting this as well, so I think that's what we are going for 👍 ) as for the check (checking twice), I'm adding this now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. sorry for missing your comment and confusing suggestion.
Yes, test_install indeed also tests for different backend, so it will be failed when the other backend are enabled.
Only one thing left is the iteration count needs to be moved a little.
Thanks for implementing it!
@yhmtsai thanks! I've rebased the PR and moved the iter counter before the second check, so that we don't check with iter==0 twice |
|
Unfortunately, it seems nvhpc might be enabling fastmath flags and hence can produce results with lower accuracy: https://gitlab.com/ginkgo-project/ginkgo-public-ci/-/jobs/9940245846 We probably need to further increase the tolerance. |
|
Thank you @gojakuch for your contribution! |
|



This PR adds the reference implementation for the pipelined CG method.