Commit 7a47238

Initial commit
0 parents  commit 7a47238

60 files changed: 42525 additions, 0 deletions

INSTALL

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
Installation Instructions
*************************

Copyright 2025 Koç University and Simula Research Laboratory

Copying and distribution of this file, with or without
modification, are permitted provided the copyright notice and this
notice are preserved.


Software dependencies
=====================

To install aCG, the following minimum software dependencies must be
satisfied:

* CMake 3.12 or newer

* A C/C++ compiler compatible with C++17

* For NVIDIA GPUs: NVIDIA CUDA Compiler (NVCC), cuBLAS and cuSPARSE
  from CUDA Toolkit 11.6 or newer

* For AMD GPUs: HIP C++ compiler, hipBLAS and hipSPARSE from ROCm
  6.0.0 or newer

* A GPU-aware MPI library (e.g., HPC-X from NVIDIA HPC SDK, or Cray
  MPICH)

The following optional software packages may be needed to enable some
features:

* METIS 5.1.0 is needed to partition matrices when using multiple GPUs

* NVIDIA Collective Communications Library (NCCL) version 2.18.5 or
  newer is needed to use NCCL-based communication for NVIDIA GPUs

* ROCm Collective Communications Library (RCCL) version 2.18.3 or
  newer is needed to use RCCL-based communication for AMD GPUs

* NVSHMEM version 2.10.0 or newer is needed to use CPU- or
  GPU-initiated one-sided communication for NVIDIA GPUs

* PETSc 3.17 or newer with CUDA or HIP support enabled is needed to
  use PETSc's CG/pipelined CG solvers

* zlib is needed to use gzip-compressed Matrix Market files as input

* If the compiler supports OpenMP, then it can be enabled to use
  multiple threads to speed up some preprocessing steps.
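
The required tools listed above can be checked before configuring.
The following is a minimal sketch, not part of aCG: the helper names
version_ge and check_tool are made up for illustration, the version
comparison assumes a 'sort' that supports the -V option, and 'mpicc'
is assumed to be the name of the MPI compiler wrapper (some systems,
such as Cray, use different wrapper names).

```shell
#!/bin/sh
# Hypothetical pre-flight check; helper names are illustrative only.

# version_ge A B succeeds when version A >= version B.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME reports whether NAME is found on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

if command -v cmake >/dev/null 2>&1; then
    # `cmake --version` prints "cmake version X.Y.Z" on its first line.
    cmake_version=$(cmake --version | head -n 1 | awk '{print $3}')
    if version_ge "$cmake_version" 3.12; then
        echo "cmake $cmake_version is new enough"
    else
        echo "cmake $cmake_version is older than 3.12"
    fi
else
    echo "missing: cmake"
fi

# Only one of nvcc/hipcc is needed, depending on the GPU vendor.
for tool in nvcc hipcc mpicc; do
    check_tool "$tool"
done
```

A "missing" line for nvcc or hipcc is expected on systems that target
only the other GPU vendor.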


Basic Installation
==================

The CMake build system is used to install aCG. A basic installation
can be performed by first creating a build directory, e.g., 'build/':

    $ mkdir build
    $ cd build

From the build directory, the command

    $ cmake ../cuda

configures the CUDA application for NVIDIA GPUs, or

    $ cmake ../hip

configures the HIP application for AMD GPUs.

Once the CMake configuration is finished, use 'make' to compile the
application:

    $ make
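
The steps above can also be collected in a small script. This is a
sketch under stated assumptions, not a script shipped with aCG: the
function name pick_backend is made up, the choice of 'cuda' or 'hip'
is guessed from whether nvcc or hipcc is on PATH, and the script is
assumed to be run from the top of the aCG source tree.

```shell
#!/bin/sh
# Hypothetical wrapper around the configure and compile steps.

# pick_backend prints "cuda" if nvcc is found, "hip" if hipcc is
# found, and "none" otherwise.
pick_backend() {
    if command -v nvcc >/dev/null 2>&1; then
        echo cuda
    elif command -v hipcc >/dev/null 2>&1; then
        echo hip
    else
        echo none
    fi
}

backend=$(pick_backend)
if [ "$backend" = none ]; then
    echo "no GPU compiler (nvcc or hipcc) found on PATH"
elif [ ! -f "$backend/CMakeLists.txt" ]; then
    echo "run this script from the top of the aCG source tree"
else
    # Same steps as above; `cmake --build .` invokes the generated
    # build tool (make, with the default Makefile generator).
    mkdir -p build &&
    (cd build && cmake "../$backend" && cmake --build .)
fi
```

If both compilers are installed, this sketch prefers CUDA; in that
case it is simpler to run the cmake command for the wanted directory
by hand.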


Installation options
====================

The usual options used by CMake to manage installation are supported.
In addition, the following options are useful for aCG:

* CMAKE_BUILD_TYPE should be set to Release to enable optimisations
  when conducting performance benchmarks.

* CMAKE_CUDA_ARCHITECTURES can be used to set the CUDA architecture
  when compiling CUDA kernels. By default, the following architectures
  are included: 70 (Volta), 75 (Turing), 80 (Ampere) and 90 (Hopper).

* CMAKE_HIP_FLAGS can be used to set the HIP architecture that is used
  to compile HIP kernels. For example, for AMD Instinct MI250X, it is
  recommended to set -DCMAKE_HIP_FLAGS="--offload-arch=gfx90a".

* ACG_ENABLE_PROFILING can be set to enable detailed CUDA/HIP-based
  event profiling. This can be used to print detailed information
  about time spent in different GPU kernels for some of the CG
  solvers.

* IDXSIZE can be set to 64 to enable the use of 64-bit integers to
  index matrix rows and columns. This may be needed for matrices with
  more than 2 billion rows/columns.
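
As an illustration of how these options combine, the sketch below
prints (but does not run) a configure command for a given matrix
size. The function names idxsize_for and cmake_command are made up
for this example, and the threshold of 2^31 - 1 assumes that the
default index type is a signed 32-bit integer, which the text above
suggests but does not state.

```shell
#!/bin/sh
# Hypothetical helper that assembles a cmake command line.

# Largest signed 32-bit integer (2^31 - 1); assumed to be the limit
# of the default index type.
INT32_MAX=2147483647

# idxsize_for N prints 64 if N exceeds INT32_MAX, otherwise 32.
idxsize_for() {
    if [ "$1" -gt "$INT32_MAX" ]; then
        echo 64
    else
        echo 32
    fi
}

# cmake_command SRC ROWS prints a configure command for source
# directory SRC and a matrix with ROWS rows. Nothing is executed.
cmake_command() {
    cmd="cmake -DCMAKE_BUILD_TYPE=Release"
    if [ "$(idxsize_for "$2")" = 64 ]; then
        cmd="$cmd -DIDXSIZE=64"
    fi
    echo "$cmd $1"
}

cmake_command ../cuda 1000000
cmake_command ../cuda 3000000000
```

The first call prints a command without IDXSIZE, the second adds
-DIDXSIZE=64. Other options from the list, for example
-DCMAKE_CUDA_ARCHITECTURES=80 for an Ampere GPU, can be appended to
the printed command in the same way.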

Other options are used to specify locations of third-party libraries:

* METIS_DIR is used to specify the location of the METIS library.
  Alternatively, METIS_INCLUDE_DIR and METIS_LIB_DIR can be set to
  directories containing the METIS header files and library,
  respectively.

* NCCL_HOME or NCCL_ROOT is used to specify the location of the NCCL
  installation. These may also be set as environment variables.

* NVSHMEM_DIR is used to specify the location of the NVSHMEM
  installation. The environment variables NVSHMEM_HOME or
  NVSHMEM_PREFIX may also be used.
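
Before running cmake, it can help to see which of the environment
variables named above are set in the current shell. The following is
a sketch only: the function name report_location is made up, and the
paths in the comments are placeholders, not real installation
prefixes.

```shell
#!/bin/sh
# Hypothetical helper; paths below are placeholders.
#
# Example of setting the variables (uncomment and adjust):
#   export NCCL_HOME=/path/to/nccl
#   export NVSHMEM_HOME=/path/to/nvshmem

# report_location VAR prints the value of the environment variable
# named VAR, or a note that it is not set.
report_location() {
    eval "value=\${$1:-}"
    if [ -n "$value" ]; then
        echo "$1=$value"
    else
        echo "$1 is not set"
    fi
}

for var in NCCL_HOME NCCL_ROOT NVSHMEM_HOME NVSHMEM_PREFIX; do
    report_location "$var"
done
```

METIS_DIR, METIS_INCLUDE_DIR, METIS_LIB_DIR and NVSHMEM_DIR are
described above only as build options, so this sketch does not look
for them in the environment.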

LICENSE

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
Copyright 2025 Koç University and Simula Research Laboratory

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

NEWS

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
* Noteworthy changes in release 1.0.0 (2025-04-26)

This is the initial release of aCG, which provides GPU-accelerated
iterative linear solvers based on the conjugate gradient (CG) method.
The solvers support NVIDIA and AMD GPUs, and multi-GPU systems with
GPU-aware MPI, NCCL/RCCL or NVSHMEM.

For NVIDIA GPUs, CUDA implementations of CG and pipelined CG are
provided. Communication is performed by the host CPU using GPU-aware
MPI, NCCL or NVSHMEM, or by the GPU using NVSHMEM device-initiated
communication.

For AMD GPUs, HIP implementations of CG and pipelined CG are provided.
Communication is performed by the host CPU using GPU-aware MPI or
RCCL. A single-GPU HIP version of a monolithic, device-side CG solver
is also provided.

--
Copyright 2025 Koç University and Simula Research Laboratory

Copying and distribution of this file, with or without modification,
are permitted provided the copyright notice and this notice are
preserved.

README

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
This is the README file for aCG, a suite of GPU-accelerated iterative
linear solvers based on the conjugate gradient (CG) method, supporting
NVIDIA and AMD GPUs, as well as multi-GPU systems with GPU-aware MPI,
NCCL, RCCL or NVSHMEM.

For NVIDIA GPUs, CUDA implementations of CG and pipelined CG are
provided. Communication is performed by the host CPU using GPU-aware
MPI, NCCL or NVSHMEM, or by the GPU using NVSHMEM device-initiated
communication.

For AMD GPUs, HIP implementations of CG and pipelined CG are provided.
Communication is performed by the host CPU with GPU-aware MPI or RCCL.
A single-GPU HIP version of a monolithic, device-side CG solver is
also provided.

See the file INSTALL for instructions on how to build and install.

aCG is free software, available under a permissive software license.
See the file LICENSE for copying conditions.

--
Copyright 2025 Koç University and Simula Research Laboratory

Copying and distribution of this file, with or without modification,
are permitted provided the copyright notice and this notice are
preserved.
