Add PIPECG reference implementation #1824

gojakuch · 2025-04-09T21:14:57Z

This PR adds the reference implementation for the pipelined CG method.

gojakuch · 2025-04-09T21:17:33Z

I've added the "Ready for review" label, but I'm not sure if I should've used "Requires feedback" instead, since this is my first PR here and it's hard for me to assess its state. I will of course consider any feedback :)

core/solver/pipe_cg.cpp

core/solver/pipe_cg_kernels.hpp

reference/solver/pipe_cg_kernels.cpp

common/unified/solver/pipe_cg_kernels.cpp

pratikvn

Nice work! Mostly looks good to me. Only the documentation needs to be updated.

common/unified/solver/pipe_cg_kernels.cpp

include/ginkgo/core/solver/pipe_cg.hpp

reference/solver/pipe_cg_kernels.cpp

core/solver/pipe_cg.cpp

include/ginkgo/core/solver/pipe_cg.hpp

reference/solver/pipe_cg_kernels.cpp

pratikvn

Great! Thanks for making the changes. One last thing before we merge: please add yourself to the contributors list here, after Marcel Koch: https://github.com/ginkgo-project/ginkgo/blob/develop/contributors.txt#L20 with the following info: LastName FirstName <email> Affiliation

pratikvn

LGTM! Thank you for your contribution!

yhmtsai

This needs a small section added to test/test_install/test_install.cpp.

yhmtsai · 2025-04-11T08:05:26Z

core/test/solver/pipe_cg.cpp

+#include "core/test/utils.hpp"
+
+
+namespace {


Suggested change

namespace {

We no longer need the anonymous namespace for tests.

yhmtsai · 2025-04-11T08:14:01Z

core/test/solver/pipe_cg.cpp

+}
+
+
+}  // namespace


yhmtsai · 2025-04-11T08:22:42Z

reference/test/solver/pipe_cg_kernels.cpp

+        this->small_n.get(), this->small_prev_rho.get(), this->small_rho.get(),
+        this->small_delta.get(), &this->small_stop);
+
+    GKO_ASSERT_MTX_NEAR(this->small_beta, this->small_delta, 0);


This also needs to test the other vector.

The other vector is still not tested.

yhmtsai · 2025-04-11T08:23:15Z

reference/test/solver/pipe_cg_kernels.cpp

+    this->small_x->fill(1);
+    this->small_r->fill(2);
+    this->small_z->fill(3);
+    this->small_w->fill(4);
+
+    this->small_p->fill(4);
+    this->small_q->fill(3);
+    this->small_f->fill(2);
+    this->small_g->fill(1);
+
+    this->small_rho->at(0) = 2;
+    this->small_rho->at(1) = 3;
+    this->small_beta->at(0) = 8;
+    this->small_beta->at(1) = 3;
+    this->small_stop.get_data()[1] = this->stopped;


Follow the AAA (Arrange-Act-Assert) rules; you can use comments to mark the groups.

The same applies to the rest of the tests in this file.
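As an illustration of the Arrange-Act-Assert grouping with comment markers, here is a minimal sketch in plain C++ (no Ginkgo or GoogleTest). The function `scale` and the test name are hypothetical and only stand in for a kernel under test.

```cpp
#include <vector>

// Hypothetical function under test: multiplies every entry by alpha.
void scale(std::vector<double>& v, double alpha)
{
    for (auto& e : v) {
        e *= alpha;
    }
}

// Returns true when the check passes. The three phases are kept apart
// and labelled with comments.
bool scale_doubles_every_entry()
{
    // Arrange: set up all inputs and the expected result first
    std::vector<double> x{1.0, 2.0, 3.0};
    const std::vector<double> expected{2.0, 4.0, 6.0};

    // Act: call the function under test exactly once
    scale(x, 2.0);

    // Assert: compare against the expected values
    return x == expected;
}
```

Keeping all `fill(...)` and `at(...) = ...` setup lines in the first group, followed by a single kernel call and then the assertions, would give the same shape in the tests discussed here.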

yhmtsai · 2025-04-11T08:24:57Z

reference/test/solver/pipe_cg_kernels.cpp

+                      gko::stop::ResidualNorm<value_type>::build()
+                          .with_reduction_factor(r<value_type>::value))
+                  .on(exec)),
+          mtx_big(gko::initialize<Mtx>(


The actual initialization follows the order of the member declarations.
Here the initializer list needs to follow the same order as the members.

Not fixed yet: you need to move pipe_cg_factory after mtx_big.
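The C++ rule behind this comment is that non-static members are constructed in declaration order, regardless of the order written in the constructor's initializer list. The sketch below shows this with hypothetical stand-in types; `Tracked`, `Fixture`, and `init_log` are illustrative names, not part of the test file.

```cpp
#include <string>
#include <vector>

// Records the order in which members are constructed.
std::vector<std::string>& init_log()
{
    static std::vector<std::string> log;
    return log;
}

// Hypothetical member type that logs its own construction.
struct Tracked {
    explicit Tracked(const std::string& name) { init_log().push_back(name); }
};

struct Fixture {
    // The initializer list names `factory` first, but `mtx_big` is declared
    // first, so `mtx_big` is still constructed first.
    Fixture() : factory("factory"), mtx_big("mtx_big") {}

    Tracked mtx_big;
    Tracked factory;
};
```

Compilers typically warn about such a mismatch (for example `-Wreorder` in GCC and Clang). Matching the initializer list to the declaration order avoids the warning and avoids a member being initialized from another member that does not exist yet.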

yhmtsai · 2025-04-11T08:27:36Z

reference/test/solver/pipe_cg_kernels.cpp

+    solver->apply(b, x);
+
+    GKO_ASSERT_MTX_NEAR(x, l({33.0, -56.0, 81.0, -30.0, 21.0, 40.0}),
+                        r<value_type>::value * 2 * 1e5);


This is 2000 times the tolerance of the CG test. It sounds like we need to check something again.

this is a weird thing that I don't get. the numbers that my PIPECG implementation outputs here are the same as the tested values, and yet it fails this check. I explained this to Pratik and he fixed it by adjusting these tolerances. I've got no other idea what that could be. the output numbers are literally the same here. any suggestions?

do you mean x is [33.0, -56.0, 81.0, -30.0, 21.0, 40.0]' on output?

yeah nevermind, I misinterpreted some things in my test example, it's out of the tolerance bounds if we don't change them

yhmtsai · 2025-04-11T08:28:19Z

reference/test/solver/pipe_cg_kernels.cpp

+}
+
+
+}  // namespace


yhmtsai · 2025-04-11T08:34:23Z

core/solver/pipe_cg.cpp

+                                         &stop_status));
+    // r = r - Ax
+    this->get_system_matrix()->apply(neg_one_op, dense_x, one_op, r);
+    // z = preconditioner * r


If there is no specific reason against it, I would suggest that the variables follow the paper's notation.

If you're talking about variable naming, I chose names compatible with the CG implementation in Ginkgo. The naming in the paper contradicts the naming in Ginkgo's CG.

yhmtsai · 2025-04-11T08:36:44Z

core/solver/pipe_cg.cpp

+        // delta = dot(w, z)
+        w->compute_conj_dot(z, delta, reduction_tmp);
+        // check
+        ++iter;


I think the criterion check should be earlier?

In my understanding, the check depends on the value of rho, which is computed right before the delta calculation. Since the computation of delta and rho will be merged in upcoming PRs, the check is placed after both are computed. Otherwise, I don't see why it should be placed earlier, or where. The results seem to be correct as it is.

I think the first rho is also computed at lines 139 and 141.

so are you suggesting that I put the check before computing rho in the loop?

one additional check after the first rho computation

pratikvn · 2025-04-11T14:17:13Z

It seems nvhpc compilers still have some issues with the pipecg tolerances, so it might need to be increased even more: https://gitlab.com/ginkgo-project/ginkgo-public-ci/-/jobs/9693465904

gojakuch · 2025-04-11T19:12:34Z

It seems nvhpc compilers still have some issues with the pipecg tolerances, so it might need to be increased even more: https://gitlab.com/ginkgo-project/ginkgo-public-ci/-/jobs/9693465904

I think @yhmtsai suggests that we find another solution. I don't know what can be done

pratikvn · 2025-04-12T12:56:29Z

The tolerance problem is not limited to PipeCG, but is prevalent across all solvers in Ginkgo. We have arbitrarily increased the tolerances for different solvers to ensure that they pass the tests. The matrices we use to test the solvers are also for the most part arbitrary (we have some control: hpd-ness, sparsity, but not over condition number for example.)

A proper solution IMO would be to have full control over the matrix and vector inputs:

Generate matrices with specific condition numbers and eigenvalue distributions.
Define vectors with controlled vector norms.

Then we can correlate solvers with inputs and tailor the inputs rather than tailor the output tolerances. This approach requires a more sophisticated sparse matrix generation, which we currently do not have.

Therefore, IMO, the current approach should be okay, and we should revisit the controlled inputs approach at a later point (with a new PR) and update all the solvers and their tests.
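To make the "controlled inputs" idea concrete, here is a minimal sketch of building a small dense symmetric positive definite matrix with a prescribed 2-norm condition number. It forms A = Q D Q^T with a Householder reflector Q and geometrically spaced eigenvalues in [1, kappa]. The function name, the `Matrix` alias, and the choice of reflector vector are all assumptions for illustration; this is plain C++, not Ginkgo code, and it does not address the sparse case mentioned above.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Builds a dense n x n matrix A = Q * D * Q^T where D holds eigenvalues
// spaced geometrically between 1 and kappa, and Q = I - 2 v v^T / (v^T v)
// is an orthogonal Householder reflector. For n > 1 the eigenvalues of A
// are exactly those of D, so cond_2(A) = kappa up to rounding.
Matrix make_spd_with_condition(std::size_t n, double kappa)
{
    // Eigenvalues d_k = kappa^(k / (n - 1)), k = 0, ..., n - 1
    std::vector<double> d(n, 1.0);
    for (std::size_t k = 0; k < n; ++k) {
        if (n > 1) {
            d[k] = std::pow(kappa, static_cast<double>(k) /
                                       static_cast<double>(n - 1));
        }
    }
    // Reflector vector v = (1, 2, ..., n); any nonzero vector would do
    std::vector<double> v(n);
    double vtv = 0.0;
    for (std::size_t k = 0; k < n; ++k) {
        v[k] = static_cast<double>(k + 1);
        vtv += v[k] * v[k];
    }
    Matrix q(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            q[i][j] = (i == j ? 1.0 : 0.0) - 2.0 * v[i] * v[j] / vtv;
        }
    }
    // A_ij = sum_k Q_ik * d_k * Q_jk
    Matrix a(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            for (std::size_t k = 0; k < n; ++k) {
                a[i][j] += q[i][k] * d[k] * q[j][k];
            }
        }
    }
    return a;
}
```

With inputs generated this way, a test could derive its tolerance from the known condition number instead of tuning it by hand.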

gojakuch · 2025-04-18T17:34:56Z

@yhmtsai please consider @pratikvn's comment when reviewing. I rebased this and tried to address all the other things you've mentioned except for the tolerances

yhmtsai

I do not fully agree with Pratik's comments.
They should only apply to the general tests across different executors.
In the reference test, everything is initialized manually, so everything is under control.
I have checked the implementation against the paper and did not find any difference between them. Because the paper also shows weaker convergence than CG, I will not hold this PR over the relaxed error check on the big system.

The change request is for the rest of my comments.

yhmtsai · 2025-04-22T11:37:30Z

reference/test/solver/pipe_cg_kernels.cpp

+                      gko::stop::ResidualNorm<value_type>::build()
+                          .with_reduction_factor(r<value_type>::value))
+                  .on(exec)),
+          mtx_big(gko::initialize<Mtx>(


Not fixed yet: you need to move pipe_cg_factory after mtx_big.

yhmtsai · 2025-04-22T11:38:56Z

reference/test/solver/pipe_cg_kernels.cpp

+          stopped{},
+          non_stopped{},


stopped and non_stopped need to go between the matrix and the factory, following the order of the member declarations.

yhmtsai · 2025-04-22T11:46:46Z

reference/test/solver/pipe_cg_kernels.cpp

+    solver->apply(b, x);
+
+    GKO_ASSERT_MTX_NEAR(x, l({81.0, 55.0, 45.0, 5.0, 85.0, -10.0}),
+                        r<value_type>::value * 5 * 1e3);


the condition number of mtx_big is around 315.

ok, so what should I put as the tolerance then?

Sorry, I just put it here as a reference for the condition number. From this, we can expect around 315 * r.
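The figure of roughly 315 * r is consistent with the standard bound relating the forward error to the relative residual. As a sketch, write \hat{x} for the computed solution, \kappa(A) for the condition number, and \varepsilon for the residual reduction factor (`r<value_type>::value` in the test). The symbol \varepsilon, and the assumption that the stopping criterion measures the true residual relative to \|b\|, are assumptions made here for illustration.

```latex
\frac{\|x - \hat{x}\|}{\|x\|}
  \;\le\; \kappa(A)\,\frac{\|b - A\hat{x}\|}{\|b\|}
  \;\le\; \kappa(A)\,\varepsilon
  \;\approx\; 315\,\varepsilon .
```

This is only an upper bound, and the recursively updated residual in a pipelined method may differ from the true residual b - A\hat{x}, so it gives a reference point for the tolerance rather than an exact value.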

yhmtsai · 2025-04-22T11:50:30Z

reference/test/solver/pipe_cg_kernels.cpp

+    this->small_rho->at(1) = 3;
+    this->small_beta->at(0) = 8;
+    this->small_beta->at(1) = 3;
+    this->small_stop.get_data()[1] = this->stopped;


also need to initialize small_stop.get_data()[0]

yhmtsai · 2025-04-22T11:51:14Z

reference/test/solver/pipe_cg_kernels.cpp

+    this->small_w->fill(8);
+    this->small_m->fill(8);
+    this->small_n->fill(24);
+    this->small_rho->fill(8);


yhmtsai · 2025-04-22T11:55:26Z

reference/test/solver/pipe_cg_kernels.cpp

+    this->small_beta->at(1) = 3;
+    this->small_delta->at(0) = 5;
+    this->small_delta->at(1) = 6;
+    this->small_stop.get_data()[1] = this->stopped;


need to initialize the small_stop.get_data()[0]

yhmtsai · 2025-04-22T11:56:54Z

reference/test/solver/pipe_cg_kernels.cpp

+        this->small_n.get(), this->small_prev_rho.get(), this->small_rho.get(),
+        this->small_delta.get(), &this->small_stop);
+
+    GKO_ASSERT_MTX_NEAR(this->small_beta, this->small_delta, 0);


still miss the other vector

yhmtsai · 2025-04-22T12:11:39Z

reference/test/solver/pipe_cg_kernels.cpp

+
+    solver->apply(b, x);
+
+    GKO_ASSERT_MTX_NEAR(x, l({33.0, -56.0, 81.0, -30.0, 21.0, 40.0}),


I suggest adding a TODO here.

so like this or about what?

// TODO: the tolerance is too big. We might need to design better tests by generating matrices with specific condition numbers and eigenvalue distributions and defining vectors with controlled vector norms.

I think the sentence until ...design better tests should be good enough

yhmtsai · 2025-04-22T12:26:49Z

Still missing: the first check right after the first rho is computed, and the test_install section.

gojakuch · 2025-04-22T16:52:17Z

Still missing: the first check right after the first rho is computed, and the test_install section.

sorry, I don't think I understood what you mean here

yhmtsai

I think it still misses the part in test/test_install/test_install.cpp.
It needs a section like

// core/solver/pipe_cg.hpp
{
    using Solver = gko::solver::PipeCg<>;
    check_solver<Solver>(exec, A_raw, b, x);
}

I also marked the place that should contain the check.
You can use only one check, but then you need to rearrange the algorithm like CG.
Otherwise, use one outside the loop and another inside the loop after you get the updated residual norm.

yhmtsai · 2025-05-02T07:34:07Z

core/solver/pipe_cg.cpp

+        bool all_stopped =
+            stop_criterion->update()
+                .num_iterations(iter)
+                .residual(r)
+                .implicit_sq_residual_norm(rho)
+                .solution(dense_x)
+                .check(RelativeStoppingId, true, &stop_status, &one_changed);
+        this->template log<log::Logger::iteration_complete>(
+            this, dense_b, dense_x, iter, r, nullptr, rho, &stop_status,
+            all_stopped);
+        if (all_stopped) {
+            break;
+        }


I think the original place + additional check is better if you do not rearrange the check

gojakuch · 2025-05-05T21:24:21Z

I think it still misses the part in test/test_install/test_install.cpp. It needs a section like
// core/solver/pipe_cg.hpp
{
    using Solver = gko::solver::PipeCg<>;
    check_solver<Solver>(exec, A_raw, b, x);
}

@yhmtsai As I've mentioned in my comment before: "I think another check is failing now, I suspect that it's because of me adding PipeCG to test_install like Mike suggested, while the unified kernels are not implemented in this PR. I think I'll just add it to the test_install in my next PR with kernel implementation"

meaning that I've tried adding exactly this block of code but a new check started failing after this. I'll try this again now, but I can just do this in my next PR where I implement the kernels. (I've now seen your comment under the next PR suggesting this as well, so I think that's what we are going for 👍 )

as for the check (checking twice), I'm adding this now

yhmtsai

LGTM. Sorry for missing your comment and for the confusing suggestion.
Yes, test_install indeed also tests the different backends, so it will fail when the other backends are enabled.
The only thing left is that the iteration count needs to be moved a little.
Thanks for implementing it!

gojakuch · 2025-05-05T22:14:54Z

LGTM. Sorry for missing your comment and for the confusing suggestion. Yes, test_install indeed also tests the different backends, so it will fail when the other backends are enabled. The only thing left is that the iteration count needs to be moved a little. Thanks for implementing it!

@yhmtsai thanks! I've rebased the PR and moved the iter counter before the second check, so that we don't check with iter==0 twice

pratikvn · 2025-05-06T14:15:54Z

Unfortunately, it seems nvhpc might be enabling fastmath flags and hence can produce results with lower accuracy: https://gitlab.com/ginkgo-project/ginkgo-public-ci/-/jobs/9940245846

We probably need to further increase the tolerance.

pratikvn · 2025-05-08T09:21:15Z

Thank you @gojakuch for your contribution!

sonarqubecloud · 2025-05-08T23:48:43Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code


gojakuch added mod:reference This is related to the reference module. type:solver This is related to the solvers 1:ST:ready-for-review This PR is ready for review labels Apr 9, 2025

pratikvn self-requested a review April 10, 2025 08:47

pratikvn assigned pratikvn and gojakuch Apr 10, 2025

yhmtsai reviewed Apr 10, 2025

pratikvn requested changes Apr 10, 2025

pratikvn reviewed Apr 10, 2025

gojakuch force-pushed the feat/pipe-cg branch from e01b933 to 4aca494 Compare April 10, 2025 21:33

gojakuch requested review from pratikvn and yhmtsai April 10, 2025 21:35

pratikvn requested changes Apr 10, 2025

pratikvn approved these changes Apr 11, 2025

pratikvn force-pushed the feat/pipe-cg branch from 902110b to 35f18f5 Compare April 11, 2025 07:43

pratikvn added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Apr 11, 2025

yhmtsai requested changes Apr 11, 2025

yhmtsai added 1:ST:ready-for-review This PR is ready for review and removed 1:ST:ready-to-merge This PR is ready to merge. labels Apr 11, 2025

gojakuch force-pushed the feat/pipe-cg branch from 35f18f5 to 73ab495 Compare April 18, 2025 16:11

gojakuch requested a review from yhmtsai April 18, 2025 17:32

yhmtsai requested changes Apr 22, 2025

gojakuch force-pushed the feat/pipe-cg branch from 496294d to 28eacf8 Compare April 29, 2025 17:34

gojakuch mentioned this pull request Apr 30, 2025

Add PIPECG unified kernels #1838

Merged

yhmtsai requested changes May 2, 2025

yhmtsai approved these changes May 5, 2025

gojakuch force-pushed the feat/pipe-cg branch from 00f5084 to cfba575 Compare May 5, 2025 22:10

MarcelKoch added this to the Ginkgo 1.10.0 milestone May 6, 2025

gojakuch added the 1:ST:run-full-test label May 6, 2025

pratikvn added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels May 6, 2025

gojakuch force-pushed the feat/pipe-cg branch from 8baad8d to 96820e3 Compare May 7, 2025 13:35

pratikvn and others added 12 commits May 8, 2025 11:04

copy skeleton from cg

3811631

Add PIPECG reference implemetnation

7e40f0a

[test] update solver tolerances

cea0138

Update PIPECG comments and docs

9690e03

Address potential infinite loop in PIPECG

400ad75

Add Atell Krasnopolsky to contributors.txt

d284b9a

Address PR review suggestions

f99a554

Address the new PR suggestions

fdbdac2

Move the convergence check in PIPECG

5c4463d

Add another convergence check in PIPECG

31d6796

Move the iteration counter in PIPECG

242516a

Increase the tolerance in the PipeCg::SolvesBigDenseSystem1 test

a1ea585

gojakuch force-pushed the feat/pipe-cg branch from 558b275 to a1ea585 Compare May 8, 2025 09:05

pratikvn merged commit 4015c54 into develop May 8, 2025
13 of 15 checks passed

pratikvn deleted the feat/pipe-cg branch May 8, 2025 09:20

