Skip to content

int overflow when using cuda backend #287

@amiguel256

Description

@amiguel256

Hello,

I'm currently using AMGCL with cuda backend to solve Poisson's equation (results in a 7-point stencil, so approximately (N^3)*7 nnz). I have tested the problem with "small" matrix sizes (about N=100) and it works perfectly. However, when I get to sizes of about N=800 (nnz > 3 billion) the solver breaks because of int overflow. Creating the sparse matrix with the std::tuple method works just fine, since col and ptr are of type "ptrdiff_t". However, when doing the initial solve step (Solver solve(A,prm,bprm);) the code breaks because of indexing error. I have noticed that the cuda backend is set with CUSPARSE_INDEX_32I but I know cuda supports CUSPARSE_INDEX_64I.

This is my current setup:

// Initializes and gets name of GPU hardware...
int device;
cudaDeviceProp prop;
cudaGetDevice(&device);
cudaGetDeviceProperties(&prop, device);
std::cout<<"\nUsing "<<prop.name<<" to solve..."<<std::endl;

// Initializing the profiler (helps to get information about solver steps)...
amgcl::profiler<> prof;

// Initializing AMGCL solver (AMG preconditioner and BiCGStab solver)...
typedef amgcl::backend::cuda<double> Backend;
typedef amgcl::make_solver<
  amgcl::amg<
      Backend,
      amgcl::coarsening::smoothed_aggregation,
      amgcl::relaxation::spai0
      >,
  amgcl::solver::bicgstab<Backend>
  > Solver;

// Initializing the CUSPARSE library and pass the handle to AMGCL in backend parameters (currently using default values)...
Backend::params bprm;
cusparseCreate(&bprm.cusparse_handle);

// Initializing solver parameters (to control tolerance, iters, etc.)... (currently using default values)   
Solver::params prm;

// Initializing sparse matrix (CSR format)...
std::vector<std::ptrdiff_t> ptr, col;
std::vector<double> val, rhs, x;
int iters;
double error;

ptr.clear(); ptr.reserve(M + 1); ptr.push_back(0);
col.clear(); col.reserve(M * 7); // We get 7-point stencil from "del(D*del(C))=0" using FVM,
val.clear(); val.reserve(M * 7); // so the matrix will have at most (Mx*My*Mz) * 7 nonzero elements.

rhs.resize(M);
x.resize(M);

and then I solve the system as:

auto A = std::tie(M,ptr,col,val);

Solver solve(A,prm,bprm);
prof.toc("setup");
std::cout << solve << std::endl;

thrust::device_vector<double> f(rhs);
thrust::device_vector<double> xSol(x);	

prof.tic("solve");
std::tie(iters, error) = solve(f, xSol);
prof.toc("solve");

Another thing I'm worried about is that the problem size is too large for the GPU to handle (GPU has a global memory size of 80GB). Is there a better way to handle these types of problems (maybe through VexCL). Any help is greatly appreciated. Apologies in advance as this is my first time posting on GitHub.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions