Issues with CUDA #1822

Fabian188 · 2025-04-04T11:03:57Z

I'm quite happy that I found Ginkgo, thanks for it!

I had not used a GPU before and have now bought a large workstation with an RTX 5080, mainly for Ginkgo.

The examples run, and mixed-precision-ir works with the CUDA executor.

However, my own Ginkgo code, which worked fine with the reference and OMP executors, fails with a segfault inside Ginkgo when I use the CUDA executor.

The mixed-precision-ir example has code like

    // Read data
    auto A = share(gko::read<mtx>(std::ifstream("data/A.mtx"), exec));
    // Create RHS and initial guess as 1
    gko::size_type size = A->get_size()[0];
    auto host_x = vec::create(exec->get_master(), gko::dim<2>(size, 1));
    for (auto i = 0; i < size; i++) {
        host_x->at(i, 0) = 1.;
    }
    auto x = gko::clone(exec, host_x);
    auto b = gko::clone(exec, host_x);

I suspect that I also need the host_x approach, which the examples I based my implementation on did not use. Is this the case? Where can I find background information? I find it difficult to locate the right place in the code documentation.

I can't find the information in the tutorials, only a summary of topics whose details I can't find: https://github.com/ginkgo-project/ginkgo/wiki/Tutorial-8:-Optimize:-Using-GPUs


Fabian188 · 2025-04-04T11:36:05Z

Fabian188
Apr 4, 2025
Author

With a debug build, I confirmed that the issue is indeed in the matrix and vector handling. Unlike the simple-solver example, I already have my data locally as CSR arrays and vectors,

e.g. like

  // vdp is a data pointer
  auto vv = gko::make_const_array_view(exec, nnz, vdp);
  auto cv = gko::make_const_array_view(exec, nnz, (int*) m->GetColPointer());
  auto rv = gko::make_const_array_view(exec, nrow + 1, (int*) m->GetRowPointer());

  // the std::move makes sure we don't call a copy constructor
  auto csr = gko::share(gko::matrix::Csr<GK_T, int>::create_const(exec, gko::dim<2>(nrow,ncol), std::move(vv), std::move(cv), std::move(rv)));

and

  // make a ginkgo right hand side view from rhs_tmp which is std::vector<>
  auto rsv = gko::make_array_view(exec, (int) rhs_tmp.size(), rhs_tmp.data()); // again, ginkgo has signed indices
  // make the ginkgo rhs based on the view
  auto b = gko::matrix::Dense<GK_T>::create(exec, gko::dim<2>((int) rhs.GetSize(), 1), rsv, 1);

7 replies

Fabian188 Apr 4, 2025
Author

Thank you very much for your quick response!

It seems that it is sufficient to use

  auto x = gko::matrix::Dense<GK_T>::create(exec->get_master(), gko::dim<2>(rhs.GetSize(), 1));
  for(unsigned int i = 0; i < rhs.GetSize(); i++)
  {
     x->at(i) = ...
  }

And some automatic mechanism in the background probably does the rest?!

I have a remaining issue with the logger:

logger = gko::share(gko::log::Convergence<double>::create());
solver->add_logger(logger);
....
auto grn = gko::as<gko::matrix::Dense<double>>(logger->get_residual_norm());
double res_norm = grn->at(0,0);

segfaults in

0x00005555576a3b6a in gko::matrix::Dense<double>::at (this=0x55556d868950, row=0, col=0) at /home/fwein/code/cfs/debug/include/ginkgo/core/matrix/dense.hpp:902
        return values_.get_const_data()[linearize_index(row, col)];

I assume I need to somehow clone or copy to host first?!

Fabian188 Apr 4, 2025
Author

I resolved the issue with the norm:

  auto grn = gko::as<gko::matrix::Dense<double>>(logger->get_residual_norm());
  auto grn_host = gko::clone(exec->get_master(), grn);
  double res_norm =  grn_host->at(0,0);
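
The fix above follows a general pattern: data owned by a device executor cannot be dereferenced from host code, so it has to be copied to the host before calling at(). The following is a minimal, self-contained sketch of that pattern in plain C++. It is a toy model and does not use Ginkgo: `Space`, `Scalar`, `clone_to_host`, `read_on_host`, and `direct_access_fails` are hypothetical names invented for illustration, and the model throws an exception where the real code segfaults.

```cpp
#include <cassert>
#include <stdexcept>

// Toy stand-in for a memory space; NOT Ginkgo's Executor class.
enum class Space { host, device };

// Toy 1x1 "dense matrix" that tracks where its value lives.
struct Scalar {
    Space where;
    double value;

    // Element access is only meaningful for host data. This model throws
    // for device data so that the mistake is visible instead of crashing.
    double at() const
    {
        if (where != Space::host) {
            throw std::logic_error("at() called on device data");
        }
        return value;
    }
};

// Simulated copy into host memory (hypothetical helper).
Scalar clone_to_host(const Scalar& s) { return Scalar{Space::host, s.value}; }

// Read a scalar safely, copying to the host only when needed.
double read_on_host(const Scalar& s)
{
    return s.where == Space::host ? s.at() : clone_to_host(s).at();
}

// Returns true if direct element access on the given scalar is rejected.
bool direct_access_fails(const Scalar& s)
{
    try {
        (void)s.at();
        return false;
    } catch (const std::logic_error&) {
        return true;
    }
}
```

In this model, `read_on_host` returns a device-resident value through a host copy, while calling `at()` on it directly is rejected. With Ginkgo itself, the corresponding step is the gko::clone(exec->get_master(), grn) call shown above.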

yhmtsai Apr 4, 2025
Collaborator

For x, I thought you had done the same thing as with host_x, i.e. cloned host_x to x on the device.
Or is this different code from the one at the top?

Fabian188 Apr 4, 2025
Author

The code in my first post was from the mixed-precision-ir.cpp example.

I now have

    auto x = gko::matrix::Dense<GK_T>::create(exec->get_master(), gko::dim<2>(rhs.GetSize(), 1));
    for(unsigned int i = 0; i < rhs.GetSize(); i++)
        x->at(i) = ...
    solver->apply(b, x);

where only x is created on exec->get_master(); csr and b live purely on exec, which is CUDA in this case. It seems that x is transferred from host to CUDA implicitly.

With your help and further investigation I now understand the concept and the meaning of exec->get_master() better, thanks a lot!

Answer selected by Fabian188

yhmtsai Apr 7, 2025
Collaborator

I see.
If you need x several times, I suggest cloning x to exec.
During apply, Ginkgo first allocates device memory and copies x from host to device, launches the solver apply, then copies the data back to the host and frees the device allocation.
These memory operations hurt performance.
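
To make the cost concrete, here is a minimal sketch in plain C++ that counts transfers for both strategies. It is a toy model under stated assumptions, not Ginkgo code: `Space`, `Vec`, `TransferCounter`, `copy_to`, `apply_on_device`, `run_with_host_x`, and `run_with_device_x` are hypothetical names, and the "solver" just halves each entry.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a memory space; NOT Ginkgo's Executor class.
enum class Space { host, device };

// Toy vector that remembers where its data lives.
struct Vec {
    Space where;
    std::vector<double> data;
};

// Counts simulated host<->device transfers.
struct TransferCounter {
    int host_to_device = 0;
    int device_to_host = 0;
};

// Copy a vector into another memory space and record the transfer.
Vec copy_to(Space target, const Vec& src, TransferCounter& counter)
{
    if (src.where == Space::host && target == Space::device) {
        ++counter.host_to_device;
    } else if (src.where == Space::device && target == Space::host) {
        ++counter.device_to_host;
    }
    return Vec{target, src.data};
}

// Simulated device solver. A host-resident x is staged on the device,
// processed there, and copied back, as described in the reply above.
void apply_on_device(Vec& x, TransferCounter& counter)
{
    if (x.where == Space::device) {
        for (auto& v : x.data) v *= 0.5;  // stand-in for the solver work
        return;
    }
    Vec staged = copy_to(Space::device, x, counter);  // temporary upload
    for (auto& v : staged.data) v *= 0.5;
    x = copy_to(Space::host, staged, counter);  // download; staged is freed on return
}

// Strategy 1: keep x on the host and let every apply do the staging.
TransferCounter run_with_host_x(int n_applies)
{
    TransferCounter counter;
    Vec x{Space::host, {1.0, 1.0, 1.0}};
    for (int i = 0; i < n_applies; ++i) {
        apply_on_device(x, counter);
    }
    return counter;
}

// Strategy 2: clone x to the device once and copy back once at the end.
TransferCounter run_with_device_x(int n_applies)
{
    TransferCounter counter;
    Vec host_x{Space::host, {1.0, 1.0, 1.0}};
    Vec x = copy_to(Space::device, host_x, counter);
    for (int i = 0; i < n_applies; ++i) {
        apply_on_device(x, counter);
    }
    host_x = copy_to(Space::host, x, counter);
    return counter;
}
```

In this model, ten applies with a host-resident x cause ten uploads and ten downloads, while cloning x to the device once needs one of each, regardless of the number of applies.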

Fabian188 Apr 11, 2025
Author

Sorry for the late reply, I first read the answer on my phone.

Thanks for the clarification, I'm starting to understand the concept better and better (I hope :)).
