int overflow when using cuda backend

Hello,

I'm currently using AMGCL with cuda backend to solve Poisson's equation (results in a 7-point stencil, so approximately (N^3)*7 nnz). I have tested the problem with "small" matrix sizes (about N=100) and it works perfectly. However, when I get to sizes of about N=800 (nnz > 3 billion) the solver breaks because of int overflow. Creating the sparse matrix with the std::tuple method works just fine, since col and ptr are of type "ptrdiff_t". However, when doing the initial solve step (```Solver solve(A,prm,bprm);```) the code breaks because of indexing error. I have noticed that the cuda backend is set with ```CUSPARSE_INDEX_32I``` but I know cuda supports ```CUSPARSE_INDEX_64I```. 

This is my current setup:

```cpp
// Initializes and gets name of GPU hardware...
int device;
cudaDeviceProp prop;
cudaGetDevice(&device);
cudaGetDeviceProperties(&prop, device);
std::cout<<"\nUsing "<<prop.name<<" to solve..."<<std::endl;

// Initializing the profiler (helps to get information about solver steps)...
amgcl::profiler<> prof;

// Initializing AMGCL solver (AMG preconditioner and BiCGStab solver)...
typedef amgcl::backend::cuda<double> Backend;
typedef amgcl::make_solver<
  amgcl::amg<
      Backend,
      amgcl::coarsening::smoothed_aggregation,
      amgcl::relaxation::spai0
      >,
  amgcl::solver::bicgstab<Backend>
  > Solver;

// Initializing the CUSPARSE library and pass the handle to AMGCL in backend parameters (currently using default values)...
Backend::params bprm;
cusparseCreate(&bprm.cusparse_handle);

// Initializing solver parameters (to control tolerance, iters, etc.)... (currently using default values)   
Solver::params prm;

// Initializing sparse matrix (CSR format)...
std::vector<std::ptrdiff_t> ptr, col;
std::vector<double> val, rhs, x;
int iters;
double error;

ptr.clear(); ptr.reserve(M + 1); ptr.push_back(0);
col.clear(); col.reserve(M * 7); // We get 7-point stencil from "del(D*del(C))=0" using FVM,
val.clear(); val.reserve(M * 7); // so the matrix will have at most (Mx*My*Mz) * 7 nonzero elements.

rhs.resize(M);
x.resize(M);
```

and then I solve the system as:

```cpp
auto A = std::tie(M,ptr,col,val);

Solver solve(A,prm,bprm);
prof.toc("setup");
std::cout << solve << std::endl;

thrust::device_vector<double> f(rhs);
thrust::device_vector<double> xSol(x);	

prof.tic("solve");
std::tie(iters, error) = solve(f, xSol);
prof.toc("solve");
```

Another thing I'm worried about is that the problem size is too large for the GPU to handle (GPU has a global memory size of 80GB). Is there a better way to handle these types of problems (maybe through VexCL). Any help is greatly appreciated. Apologies in advance as this is my first time posting on GitHub.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

int overflow when using cuda backend #287

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

int overflow when using cuda backend #287

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions