@upsj upsj commented Feb 17, 2025

As a starting point and example for adding work estimates to kernels, this adds the necessary operations to all non-trivial kernels in a simple unpreconditioned Cg solve.
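To make the idea concrete, here is a minimal sketch of how a memory-bound work estimate for a CSR SpMV could be turned into a throughput figure like the ones in the performance column of the output. Everything in it is an assumption for illustration: `memory_bound_estimate`, `estimate_csr_spmv` and `throughput_mb_per_s` are hypothetical names, not the API added by this PR, the traffic model (each matrix and vector entry moved exactly once) is a simplification, and 1 MB is taken to be 10^6 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type (not Ginkgo's API): the number of bytes a memory-bound
// kernel is expected to read and write.
struct memory_bound_estimate {
    std::size_t bytes;
};

// Assumed traffic model for y = A * x with A stored in CSR format:
// values and column indices are read once per nonzero, row pointers once per
// row (plus one), x is read once and y is written once.
template <typename ValueType, typename IndexType>
memory_bound_estimate estimate_csr_spmv(std::size_t num_rows,
                                        std::size_t num_cols,
                                        std::size_t num_nonzeros)
{
    const std::size_t matrix_bytes =
        num_nonzeros * (sizeof(ValueType) + sizeof(IndexType)) +
        (num_rows + 1) * sizeof(IndexType);
    const std::size_t vector_bytes =
        (num_cols + num_rows) * sizeof(ValueType);
    return {matrix_bytes + vector_bytes};
}

// Converts an estimate and a measured runtime into MB/s,
// assuming 1 MB = 1e6 bytes.
double throughput_mb_per_s(memory_bound_estimate estimate, double seconds)
{
    assert(seconds > 0.0);
    return static_cast<double>(estimate.bytes) / seconds / 1e6;
}
```

A profiler could divide such an estimate by the measured self time of the kernel; kernels without an estimate would simply leave the performance column empty.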

Example output for the simple-solver example in a debug build

Runtime summary

| name | total | total (self) | count | avg | avg (self) | performance |
|---|---|---|---|---|---|---|
| total | 2.5 ms | 832.3 us | 1 | 2.5 ms | 832.3 us | |
| apply(gko::solver::Cg) | 1.6 ms | 17.9 us | 1 | 1.6 ms | 17.9 us | |
| iteration | 1.5 ms | 508.1 us | 19 | 81.2 us | 26.7 us | |
| check(gko::stop::Combined) | 305.9 us | 91.0 us | 20 | 15.3 us | 4.6 us | |
| apply(gko::matrix::Identity) | 268.0 us | 101.2 us | 20 | 13.4 us | 5.1 us | |
| apply(gko::matrix::Csr<double, int>) | 235.1 us | 122.9 us | 19 | 12.4 us | 6.5 us | |
| check(gko::stop::ResidualNorm) | 193.1 us | 148.3 us | 20 | 9.7 us | 7.4 us | |
| copy(gko::matrix::Dense,gko::matrix::Dense) | 166.8 us | 142.7 us | 20 | 8.3 us | 7.1 us | |
| csr::spmv | 112.2 us | 112.2 us | 19 | 5.9 us | 5.9 us | 363.2 MB/s |
| advanced_apply(gko::matrix::Csr<double, int>) | 67.1 us | 38.8 us | 2 | 33.5 us | 19.4 us | |
| dense::compute_conj_dot_dispatch | 58.1 us | 58.1 us | 39 | 1.5 us | 1.5 us | 204.1 MB/s |
| generate(gko::solver::Cg::Factory) | 50.3 us | 50.3 us | 1 | 50.3 us | 50.3 us | |
| cg::step_2 | 46.8 us | 46.8 us | 19 | 2.5 us | 2.5 us | 370.7 MB/s |
| cg::step_1 | 43.6 us | 43.6 us | 19 | 2.3 us | 2.3 us | 198.6 MB/s |
| dense::compute_norm2_dispatch | 36.7 us | 36.7 us | 22 | 1.7 us | 1.7 us | 91.1 MB/s |
| csr::advanced_spmv | 28.3 us | 28.3 us | 2 | 14.1 us | 14.1 us | 162.3 MB/s |
| dense::copy | 24.1 us | 24.1 us | 20 | 1.2 us | 1.2 us | 252.0 MB/s |
| check(gko::stop::Iteration) | 21.7 us | 21.7 us | 20 | 1.1 us | 1.1 us | |
| allocate | 18.5 us | 18.5 us | 31 | 598.0 ns | 598.0 ns | |
| residual_norm::residual_norm | 17.2 us | 17.2 us | 20 | 858.0 ns | 858.0 ns | |
| components::aos_to_soa | 16.9 us | 16.9 us | 3 | 5.6 us | 5.6 us | |
| cg::initialize | 15.0 us | 15.0 us | 1 | 15.0 us | 15.0 us | 70.7 MB/s |
| components::convert_idxs_to_ptrs | 13.5 us | 13.5 us | 1 | 13.5 us | 13.5 us | |
| free | 13.3 us | 13.3 us | 31 | 430.0 ns | 430.0 ns | |
| dense::fill | 13.2 us | 13.2 us | 4 | 3.3 us | 3.3 us | 24.3 MB/s |
| components::fill_array | 6.3 us | 6.3 us | 1 | 6.3 us | 6.3 us | |
| dense::fill_in_matrix_data | 3.9 us | 3.9 us | 2 | 1.9 us | 1.9 us | |

Overhead estimate 482.5 us

Work estimates available for 14.9 % of runtime
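One plausible reading of this coverage figure, which the output itself does not spell out, is the summed self time of all entries that carry a work estimate, divided by the total runtime. The sketch below illustrates that reading only; `kernel_entry` and `estimate_coverage_percent` are hypothetical names, not part of the profiler.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary row: self time in microseconds and whether the
// entry has a work estimate attached.
struct kernel_entry {
    double self_time_us;
    bool has_work_estimate;
};

// Assumed definition of coverage: share of the total runtime spent in
// entries that have a work estimate, in percent.
double estimate_coverage_percent(const std::vector<kernel_entry>& entries,
                                 double total_us)
{
    assert(total_us > 0.0);
    double covered_us = 0.0;
    for (const auto& entry : entries) {
        if (entry.has_work_estimate) {
            covered_us += entry.self_time_us;
        }
    }
    return 100.0 * covered_us / total_us;
}
```

Under this assumption, the nine rows above that list a performance value add up to 378.0 us of self time. Against the rounded total of 2.5 ms that gives roughly 15.1 %, close to the reported 14.9 %; the gap would be consistent with the total being rounded for display.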

upsj added the 1:ST:ready-for-review (This PR is ready for review) label Feb 17, 2025
upsj requested a review from a team February 17, 2025 17:28
upsj self-assigned this Feb 17, 2025
ginkgo-bot added the mod:core (This is related to the core module.), type:solver (This is related to the solvers) and type:matrix-format (This is related to the Matrix formats) labels Feb 17, 2025
upsj force-pushed the benchmark_work_estimate_logger branch from eca15d0 to 9ac9b8f February 20, 2025 11:34
upsj force-pushed the benchmark_work_estimate_cg_csr_spmv branch from 48ab867 to c24ccca February 20, 2025 11:35