Merge dot products in PIPECG #1908

gojakuch · 2025-08-06T16:25:35Z

This PR continues the work on PIPECG by making it fulfill its original purpose: reducing the number of dot products in the algorithm by merging them into a single dot operation.
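Here is a minimal C++ sketch of the idea. It is not Ginkgo's kernel. For illustration, assume the two dot products share one vector z, and that r and w are stored side by side as columns of one row-major block. Then the products (r, z) and (w, z) become the two column results of a single column-wise dot against [z | z]. In a distributed run, that would mean one reduction instead of two. DenseView and column_dots are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major dense block: `rows` x `cols` values, with
// consecutive rows `stride` elements apart (stride >= cols).
struct DenseView {
    const double* data;
    std::size_t rows;
    std::size_t cols;
    std::size_t stride;
    double at(std::size_t i, std::size_t j) const { return data[i * stride + j]; }
};

// Column-wise dot products of two equally shaped blocks, computed in one
// sweep: result[j] = sum_i a(i, j) * b(i, j).
// If a = [r | w] and b = [z | z], one call yields (r, z) and (w, z) together.
std::vector<double> column_dots(const DenseView& a, const DenseView& b)
{
    assert(a.rows == b.rows && a.cols == b.cols);
    std::vector<double> result(a.cols, 0.0);
    for (std::size_t i = 0; i < a.rows; ++i) {
        for (std::size_t j = 0; j < a.cols; ++j) {
            result[j] += a.at(i, j) * b.at(i, j);
        }
    }
    return result;
}
```

For example, with r = (1, 3, 5), w = (2, 4, 6) and z = (1, 2, 3) interleaved row by row, one call returns {22, 28}.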

pratikvn

Part 1/2. I still need to look at the algorithm itself.

core/solver/pipe_cg.cpp

gojakuch · 2025-08-09T19:27:28Z

Addressed the suggestions yesterday, rebased now.
It fails the test comparing the reference implementation to OMP on my machine, which is very odd, as they seem to do the same thing. The reference implementation is now correct, though: it passes the tests. Something is wrong with step_1, and I'm still debugging it.

codecov · 2025-09-02T05:59:37Z

Codecov Report

❌ Patch coverage is 96.87500% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.11%. Comparing base (6e06b0a) to head (734b3fb).
⚠️ Report is 5 commits behind head on develop.

Files with missing lines	Patch %	Lines
core/solver/pipe_cg.cpp	93.87%	3 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #1908      +/-   ##
===========================================
+ Coverage    88.74%   89.11%   +0.36%     
===========================================
  Files          857      857              
  Lines        71844    71839       -5     
===========================================
+ Hits         63761    64020     +259     
+ Misses        8083     7819     -264

yhmtsai

LGTM in general.

Use submatrix creation.
Performance concerns: z1 and z2 could become a single vector once we have a conjugate SpMV, and some accesses will be strided.

yhmtsai · 2025-09-02T13:58:56Z

core/solver/pipe_cg.cpp

+
+    LocalVector* rw = this->template create_workspace_op<LocalVector>(
+        GKO_SOLVER_TRAITS::rw, conjoined_size);
+    auto r_unique = LocalVector::create(


Use create_submatrix to set this up correctly.
The second one creates an array view that includes memory that does not exist.
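A small bounds check illustrates the concern. Assume rw is one row-major buffer of n rows with row stride 2 * b_stride, and that w starts b_stride elements in. A view covering only w's rows and columns ends inside the buffer. A flat view of n * 2 * b_stride elements starting at that offset would run b_stride elements past the end. view_span and fits are hypothetical helpers, not Ginkgo API; the suggested create_submatrix avoids doing this arithmetic by hand.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of elements a row-major view touches from its
// first element, given `rows` rows of `cols` values spaced `stride` apart.
std::size_t view_span(std::size_t rows, std::size_t cols, std::size_t stride)
{
    assert(rows > 0 && cols > 0 && cols <= stride);
    // The last row starts at (rows - 1) * stride and holds `cols` values.
    return (rows - 1) * stride + cols;
}

// True if a view starting `offset` elements into a buffer of `buffer_size`
// elements and spanning `span` elements stays inside the buffer.
bool fits(std::size_t buffer_size, std::size_t offset, std::size_t span)
{
    return offset + span <= buffer_size;
}
```

With n = 4, b_stride = 3 and two columns, the buffer holds 24 values. The tight view of w spans 20 values from offset 3 and fits. A 24-value flat view from offset 3 does not.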

yhmtsai · 2025-09-02T13:59:21Z

core/solver/pipe_cg.cpp

+                        rw->get_values()),
+        b_stride * 2);
+    auto* r = r_unique.get();
+    auto w_unique = LocalVector::create(


Same here and in the following lines.

yhmtsai · 2025-09-02T14:08:51Z

core/solver/pipe_cg.cpp

            beta, gko::detail::get_local(p), gko::detail::get_local(q),
            gko::detail::get_local(f), gko::detail::get_local(g),
-            gko::detail::get_local(z), gko::detail::get_local(w),
+            gko::detail::get_local(z1), gko::detail::get_local(w),


Note: z1 is accessed with a stride, which might lower your performance.

Do you think there's a way to avoid this?

Not with the current storage layout. You would need to change how z is stored, and that would also require changing the other vectors to fit.

yhmtsai · 2025-09-02T14:10:16Z

reference/solver/pipe_cg_kernels.cpp

+                z1->at(i, j) -= tmp * f->at(i, j);
+                z2->at(i, j) -= tmp * f->at(i, j);


Because z2 is always in sync with z1, z2 = z1 should be better.

It still uses z2 -= tmp * f.
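Here is a simplified single-column sketch of the suggestion. It uses plain std::vector instead of Ginkgo's Dense and at(i, j), and update_and_mirror is a hypothetical name. z1 is updated once and z2 is assigned from it. This skips the second multiply-subtract and keeps the two copies exactly equal.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Update z1 once, then mirror it into z2 instead of repeating the same
// multiply-subtract on z2. The copy guarantees z2 matches z1 exactly.
void update_and_mirror(std::vector<double>& z1, std::vector<double>& z2,
                       const std::vector<double>& f, double tmp)
{
    assert(z1.size() == f.size() && z2.size() == f.size());
    for (std::size_t i = 0; i < z1.size(); ++i) {
        z1[i] -= tmp * f[i];
        z2[i] = z1[i];  // instead of: z2[i] -= tmp * f[i];
    }
}
```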

common/unified/solver/pipe_cg_kernels.cpp

test/solver/pipe_cg_kernels.cpp

stale

pratikvn

Nice work! LGTM!

gojakuch · 2025-09-23T22:25:58Z

> Nice work! LGTM!

@pratikvn Thanks, and thanks for helping me too! I rebased the PR.

yhmtsai · 2025-09-24T08:14:17Z

Is it ready for review? And do you have a performance comparison?

yhmtsai · 2025-10-09T13:13:08Z

core/solver/pipe_cg.cpp

+    // GKO_SOLVER_VECTOR(r, dense_b);
+    // GKO_SOLVER_VECTOR(w, dense_b);
+    // into rw that we later slice for efficient dot product computation
+    auto b_stride = dense_b->get_stride();


Do you need to use the stride from b?
The rw vectors actually have a stride of 2 * b_stride, not b_stride.

Yes, b_stride is used, for example, here:

    auto* r = r_unique.get();
    auto w_unique = rw->create_submatrix(
        local_span{0, local_original_size[0]},
        local_span{b_stride, b_stride + local_original_size[1]},
        global_original_size);
    auto* w = w_unique.get();

We multiply by 2 when needed.

Yes, I know where b_stride comes from.
My question is more about whether to use b_stride or the number of vectors as the stride.

As Pratik told me today, b_stride is fine because of padding, and it is probably equal to the number of vectors most of the time. I used b_stride because dense_b is a vector, so mathematically it made sense to me to use the dimensions of the original vector as the reference when sizing the other vectors.
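To make the stride question concrete, here is a sketch of the index arithmetic. It assumes rw is row-major, with each row holding r's columns from offset 0 and w's columns from offset col_stride. When b_stride equals the number of vectors, both choices give the same layout. When b_stride is padded, every row carries 2 * (b_stride - ncols) unused slots. ConjoinedLayout is a hypothetical type.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout of the conjoined rw block: each row holds r's columns
// starting at 0 and w's columns starting at `col_stride`, so a row is
// 2 * col_stride elements long. Requires col_stride >= ncols.
struct ConjoinedLayout {
    std::size_t ncols;       // number of right-hand sides
    std::size_t col_stride;  // b_stride, or simply ncols for a tight layout

    std::size_t row_stride() const { return 2 * col_stride; }
    std::size_t r_index(std::size_t i, std::size_t j) const
    {
        assert(j < ncols && ncols <= col_stride);
        return i * row_stride() + j;
    }
    std::size_t w_index(std::size_t i, std::size_t j) const
    {
        assert(j < ncols && ncols <= col_stride);
        return i * row_stride() + col_stride + j;
    }
    std::size_t total_size(std::size_t nrows) const
    {
        return nrows * row_stride();
    }
};
```

With 3 vectors, a padded col_stride of 4 uses 40 values for 5 rows, compared with 30 for the tight layout.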

yhmtsai · 2025-10-09T13:20:14Z

reference/solver/pipe_cg_kernels.cpp

+                z1->at(i, j) -= tmp * f->at(i, j);
+                z2->at(i, j) -= tmp * f->at(i, j);


It still uses z2 -= tmp * f.

yhmtsai · 2025-10-09T13:25:10Z

test/solver/solver.cpp

 struct PipeCg : SimpleSolverTest<gko::solver::PipeCg<solver_value_type>> {
    static double tolerance() { return 1e7 * r<value_type>::value; }
+
+    static constexpr bool will_not_allocate() { return false; }


Why would it allocate anything in the second round?

What do you mean by "the second round"?

When will_not_allocate is false, ApplyDoesntAllocateRepeatedly is skipped.
That test checks whether there are additional allocations in the second apply.
The first apply usually allocates the workspace, but we try to avoid any allocation or free in the second apply.

I'll remove this so that we can see, and if possible debug, any unwanted behaviour here.
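Here is a toy model of what ApplyDoesntAllocateRepeatedly checks. It uses a hypothetical Workspace class rather than Ginkgo's workspace. Scratch memory grows only when a larger buffer is requested, so a second apply of the same size reuses the first apply's allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical workspace that only grows when a larger buffer is requested,
// counting how many times it had to grow.
class Workspace {
public:
    double* get(std::size_t size)
    {
        if (size > buffer_.size()) {
            buffer_.resize(size);  // may reallocate
            ++allocations_;
        }
        return buffer_.data();
    }
    int allocations() const { return allocations_; }

private:
    std::vector<double> buffer_;
    int allocations_ = 0;
};

// Stand-in for a solver apply that needs `n` scratch values.
void fake_apply(Workspace& ws, std::size_t n)
{
    assert(n > 0);
    double* scratch = ws.get(n);
    for (std::size_t i = 0; i < n; ++i) {
        scratch[i] = static_cast<double>(i);
    }
}
```

Calling fake_apply(ws, 100) twice leaves ws.allocations() at 1.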

common/unified/solver/pipe_cg_kernels.cpp

yhmtsai · 2025-10-14T07:45:08Z

core/solver/pipe_cg.cpp

+    auto& reduction_tmp = this->template create_workspace_array<char>(
+        GKO_SOLVER_TRAITS::tmp, 2 * global_original_size[1]);


Suggested change

-    auto& reduction_tmp = this->template create_workspace_array<char>(
-        GKO_SOLVER_TRAITS::tmp, 2 * global_original_size[1]);
+    auto& reduction_tmp = this->template create_workspace_array<char>(
+        GKO_SOLVER_TRAITS::tmp);

reduction_tmp only gets its proper size when it goes through the reduction step, so we do not need to initialize it with a size.
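Here is a simplified CPU sketch of why the caller need not size the buffer up front. It assumes a blocked two-stage column reduction. The number of partial results depends on a block count chosen inside the kernel, so the kernel grows tmp itself. blocked_column_sums and its blocking scheme are hypothetical. Ginkgo's reduction kernels and its char-typed workspace array are not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical two-stage column reduction over a row-major rows x cols block.
// Stage 1: each block of `block_size` rows writes one partial sum per column
// into `tmp`. Stage 2: the partials are summed. Only this function knows the
// block count, so it sizes `tmp` itself and callers may pass an empty buffer.
std::vector<double> blocked_column_sums(const std::vector<double>& data,
                                        std::size_t rows, std::size_t cols,
                                        std::size_t block_size,
                                        std::vector<double>& tmp)
{
    assert(data.size() == rows * cols && block_size > 0);
    const std::size_t num_blocks = (rows + block_size - 1) / block_size;
    if (tmp.size() < num_blocks * cols) {
        tmp.resize(num_blocks * cols);
    }
    for (std::size_t b = 0; b < num_blocks; ++b) {
        for (std::size_t j = 0; j < cols; ++j) {
            double partial = 0.0;
            for (std::size_t i = b * block_size;
                 i < rows && i < (b + 1) * block_size; ++i) {
                partial += data[i * cols + j];
            }
            tmp[b * cols + j] = partial;
        }
    }
    std::vector<double> result(cols, 0.0);
    for (std::size_t b = 0; b < num_blocks; ++b) {
        for (std::size_t j = 0; j < cols; ++j) {
            result[j] += tmp[b * cols + j];
        }
    }
    return result;
}
```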

yhmtsai

Good work! The code looks good to me.
I am still not sure whether this version improves performance in practice, given the trade-off between strided access and communication cost.

sonarqubecloud · 2025-10-15T23:03:23Z

Quality Gate passed

Issues
5 New issues
0 Accepted issues

Measures
0 Security Hotspots
96.4% Coverage on New Code
2.1% Duplication on New Code

gojakuch · 2025-10-17T17:23:54Z

@yhmtsai @pratikvn I'm flagging this as ready to merge, then.

gojakuch · 2025-10-17T17:25:59Z

The typo check fails, but I don't think it has anything to do with this PR, so I'm not sure whether it needs to be fixed here.

yhmtsai · 2025-10-20T07:34:52Z

I think you can ignore the typo check for now. I will create another PR to fix it.

gojakuch added the 1:ST:WIP label (This PR is a work in progress. Not ready for review.) Aug 6, 2025

pratikvn assigned gojakuch Aug 7, 2025

pratikvn self-requested a review August 7, 2025 07:53

pratikvn reviewed Aug 7, 2025

core/solver/pipe_cg.cpp: 4 review threads (outdated, resolved)

gojakuch force-pushed the feat/pipe-cg-dotmerge branch 2 times, most recently from f24265c to 23a35a3 August 9, 2025 19:23

gojakuch force-pushed the feat/pipe-cg-dotmerge branch from 23a35a3 to 4c7a3d5 August 15, 2025 18:34

gojakuch requested a review from pratikvn August 15, 2025 18:36

gojakuch added the 1:ST:ready-for-review (This PR is ready for review) and 1:ST:run-full-test labels and removed the 1:ST:WIP label (This PR is a work in progress. Not ready for review.) Aug 18, 2025

gojakuch changed the title from "Merge the dot products in PIPECG" to "Merge dot products in PIPECG" Aug 18, 2025

gojakuch force-pushed the feat/pipe-cg-dotmerge branch from b3aee64 to c2be7cb August 30, 2025 20:10

yhmtsai requested changes Sep 2, 2025

yhmtsai previously requested changes Sep 3, 2025

test/solver/pipe_cg_kernels.cpp: review thread (outdated, resolved)

gojakuch force-pushed the feat/pipe-cg-dotmerge branch from ab40d16 to 995969d September 3, 2025 10:29

ginkgo-project deleted a comment from ginkgo-bot Sep 3, 2025

gojakuch requested a review from yhmtsai September 4, 2025 08:55

gojakuch force-pushed the feat/pipe-cg-dotmerge branch from 6a73634 to 66f4ebe September 10, 2025 13:10

gojakuch force-pushed the feat/pipe-cg-dotmerge branch from 66f4ebe to 1da0064 September 12, 2025 19:52

pratikvn approved these changes Sep 22, 2025

gojakuch force-pushed the feat/pipe-cg-dotmerge branch from 7c24aea to 781631d September 23, 2025 22:24

gojakuch and others added 13 commits October 6, 2025 09:29

Merge two dot product calls in PIPECG (d0c8f13)
Wrong prototype approach (93869f2)
Use another approach to array merging (a756cc6)
Debug (f7a2001)
Remove commented code (c74bbba)
suggestions part 1 (e0a5644)
Apply the diff to fix the errors (8b19a53)
Fix copying z1 to z2 (2487dea)
skip the repeated allocation check for PIPECG (164b999)
use assignment and fix the test (f7935f0)
use (262cc7e)
Add PIPECG to distributed benchmarks (0a6052e)
enable distributed support for PipeCg (c510304)

gojakuch force-pushed the feat/pipe-cg-dotmerge branch from 781631d to c510304 October 6, 2025 07:29

yhmtsai requested changes Oct 9, 2025

gojakuch added 2 commits October 12, 2025 21:53

use assignment (ea9fc04)
use default stride on p q f g (9cebf65)

yhmtsai reviewed Oct 14, 2025

gojakuch added 2 commits October 14, 2025 20:54

remove unnecessary argument (5adbaf7)
remove the will_not_allocate=false (734b3fb)

yhmtsai approved these changes Oct 15, 2025

gojakuch added the 1:ST:ready-to-merge label (This PR is ready to merge.) and removed the 1:ST:ready-for-review label (This PR is ready for review) Oct 17, 2025
