-
Notifications
You must be signed in to change notification settings - Fork 127
Description
Hello,
I'm currently using AMGCL with cuda backend to solve Poisson's equation (results in a 7-point stencil, so approximately (N^3)*7 nnz). I have tested the problem with "small" matrix sizes (about N=100) and it works perfectly. However, when I get to sizes of about N=800 (nnz > 3 billion) the solver breaks because of int overflow. Creating the sparse matrix with the std::tuple method works just fine, since col and ptr are of type "ptrdiff_t". However, when doing the initial solve step (Solver solve(A,prm,bprm);
) the code breaks because of indexing error. I have noticed that the cuda backend is set with CUSPARSE_INDEX_32I
but I know cuda supports CUSPARSE_INDEX_64I
.
This is my current setup:
// Initializes and gets name of GPU hardware...
int device;
cudaDeviceProp prop;
cudaGetDevice(&device);
cudaGetDeviceProperties(&prop, device);
std::cout<<"\nUsing "<<prop.name<<" to solve..."<<std::endl;
// Initializing the profiler (helps to get information about solver steps)...
amgcl::profiler<> prof;
// Initializing AMGCL solver (AMG preconditioner and BiCGStab solver)...
typedef amgcl::backend::cuda<double> Backend;
typedef amgcl::make_solver<
amgcl::amg<
Backend,
amgcl::coarsening::smoothed_aggregation,
amgcl::relaxation::spai0
>,
amgcl::solver::bicgstab<Backend>
> Solver;
// Initializing the CUSPARSE library and pass the handle to AMGCL in backend parameters (currently using default values)...
Backend::params bprm;
cusparseCreate(&bprm.cusparse_handle);
// Initializing solver parameters (to control tolerance, iters, etc.)... (currently using default values)
Solver::params prm;
// Initializing sparse matrix (CSR format)...
std::vector<std::ptrdiff_t> ptr, col;
std::vector<double> val, rhs, x;
int iters;
double error;
ptr.clear(); ptr.reserve(M + 1); ptr.push_back(0);
col.clear(); col.reserve(M * 7); // We get 7-point stencil from "del(D*del(C))=0" using FVM,
val.clear(); val.reserve(M * 7); // so the matrix will have at most (Mx*My*Mz) * 7 nonzero elements.
rhs.resize(M);
x.resize(M);
and then I solve the system as:
auto A = std::tie(M,ptr,col,val);
Solver solve(A,prm,bprm);
prof.toc("setup");
std::cout << solve << std::endl;
thrust::device_vector<double> f(rhs);
thrust::device_vector<double> xSol(x);
prof.tic("solve");
std::tie(iters, error) = solve(f, xSol);
prof.toc("solve");
Another thing I'm worried about is that the problem size is too large for the GPU to handle (GPU has a global memory size of 80GB). Is there a better way to handle these types of problems (maybe through VexCL). Any help is greatly appreciated. Apologies in advance as this is my first time posting on GitHub.