Enabling Generic HIP Builds #933

Spaarsh · 2025-07-13T06:27:12Z

Hello team,

I am Spaarsh¹, and I am working on enabling HIP/ROCm in the Debian package for dbcsr as part of my GSoC'25 work as a contributor at Debian³. Cordell Bloor (@cgmb⁴) is my GSoC mentor.

The package can easily be built for a single GPU, but the current build process requires us to run the entire build once per GPU architecture (for AMD, these are currently gfx906, gfx908 and gfx90a). To avoid this, CMake provides a variable called CMAKE_HIP_ARCHITECTURES⁵ that lets us create a fat binary (a generic build, loosely speaking) that is compatible with multiple architectures at once. However, it has been manually disabled in the build process, as seen here⁶. As a result, we need to build a separate binary for each supported architecture (libdbcsr-rocm-dev-gfxXXXX), which bloats our source package, not to mention the increased build time.
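For reference, this is roughly how a fat binary is requested in a plain CMake project, independent of dbcsr. It is a minimal sketch, assuming CMake 3.21 or newer and a working ROCm toolchain; the project name, the target name and the source file kernels.hip are hypothetical and only serve as illustration:

```cmake
cmake_minimum_required(VERSION 3.21)  # first CMake release with HIP as a language
project(fat_binary_sketch LANGUAGES HIP)

# Hypothetical static library built from a hypothetical HIP source file.
add_library(kernels STATIC kernels.hip)

# Request device code for all three architectures in a single build.
# CMAKE_HIP_ARCHITECTURES initializes this per-target property.
set_target_properties(kernels PROPERTIES
  HIP_ARCHITECTURES "gfx906;gfx908;gfx90a")
```

The same list can usually be passed at configure time as -DCMAKE_HIP_ARCHITECTURES="gfx906;gfx908;gfx90a" instead of being hard-coded, provided the project does not override the variable.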

Upon further investigation, I realized that the variable was disabled because of the custom kernel building⁷ that dbcsr does. This cannot be controlled via CMake, since those parts are built via Python scripts. A non-trivial amount of changes would be required to achieve a fully generic build, if it is possible at all.

However, after analyzing the binaries built for different packages, I understand that only libdbcsr.a is dependent on the architecture (please correct me if I am wrong).

With this understanding, I propose the following change, which would result in a single binary package that supports multiple GPU architectures with less bloat in our Debian package:
Currently, the build process uses the WITH_GPU flag to select the target GPU. Assuming I am not mistaken about only libdbcsr.a being architecture-dependent, compiling a libdbcsr.arch.a for each architecture would enable support for all architectures via a single binary package. This can be achieved by introducing a new optional flag, say HIP_MULTI_ARCHITECTURE, in the main CMakeLists.txt⁸ file that accepts multiple architectures/target GPUs as input. In CMakeLists.txt, we would iterate over the values given in the flag and repeat the libdbcsr.a compilation, changing WITH_GPU for each iteration. Each iteration would produce one libdbcsr.arch.a file. Finally, all these files can be shipped in the libdbcsr-rocm-dev package (you can find the MR here⁹). This would also require making the WITH_GPU flag optional and adding a new check that ensures that at least one of WITH_GPU or HIP_MULTI_ARCHITECTURE is defined.
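The control flow of this proposal could look roughly like the sketch below. It is not taken from the MR or from dbcsr's actual CMakeLists.txt: HIP_MULTI_ARCHITECTURE is the proposed (not yet existing) option, the helper variable _gpu_targets is hypothetical, and the loop only prints what it would build instead of defining real library targets.

```cmake
cmake_minimum_required(VERSION 3.21)
project(multi_arch_sketch LANGUAGES NONE)  # no compiler needed for this sketch

# Existing-style single-target option, made optional in this sketch.
set(WITH_GPU "" CACHE STRING "Single GPU target")
# Proposed option: semicolon-separated list of GPU targets.
set(HIP_MULTI_ARCHITECTURE "" CACHE STRING "List of GPU targets (proposed)")

# New check: at least one of the two options must be given.
if("${WITH_GPU}" STREQUAL "" AND "${HIP_MULTI_ARCHITECTURE}" STREQUAL "")
  message(FATAL_ERROR "Set WITH_GPU or HIP_MULTI_ARCHITECTURE")
endif()

# Prefer the multi-target list; otherwise fall back to the single target.
if(NOT "${HIP_MULTI_ARCHITECTURE}" STREQUAL "")
  set(_gpu_targets ${HIP_MULTI_ARCHITECTURE})
else()
  set(_gpu_targets ${WITH_GPU})
endif()

# One pass per target; a real build would create libdbcsr.<target>.a here,
# including the per-target kernel generation done by the Python scripts.
foreach(_gpu IN LISTS _gpu_targets)
  message(STATUS "Would build libdbcsr.${_gpu}.a with WITH_GPU=${_gpu}")
endforeach()
```

Configuring this with -DHIP_MULTI_ARCHITECTURE="gfx906;gfx908;gfx90a" runs the loop three times. The values are placeholders taken from the architecture list above; the names that WITH_GPU actually accepts may differ.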

The proposed change would allow us to ship a single generic HIP/ROCm build in the Debian dbcsr package.

Regards,
Spaarsh

hfp · 2025-07-14T09:34:17Z

Maintainer

Thank you for sharing; we/I have yet to digest your proposal. Regarding a generic package, we have an OpenCL-based flavor which works just fine on all major GPU architectures as a single binary (the same goes for CP2K). The reasons for non-generic builds with CUDA/HIP are specialized kernels and, in the past, the lack of a fallback to generic/untuned parameters. This is now solved for CUDA/HIP but may need some extra work to expose a generic GPU package. The worst case for the latter is to always fall back or, even worse, to have already done data transfers before entering the fallback. For OpenCL, this was never a problem since tuned parameters are entirely optional; in addition, one binary can include all available tuned parameters and has a GPU-based fallback for parameters. To question your goal in general: DBCSR's GPU support relies on reasonable support for FP64 (double precision), which may be poorly supported by client GPUs (a very unfavorable FLOPS ratio versus single precision, let alone low precision). By unfavorable I mean the GPU may yield fewer FLOPS than the aggregated CPU package (all cores). Even with FP64, GPU acceleration can fall short, just to mention it.

1 reply

hfp Jul 14, 2025
Maintainer

For example, cmake -DUSE_ACCEL=opencl just works and may only depend on the Khronos ICD loader, which then uses whatever driver is available. If you have CUDA, it would not even need the Khronos ICD loader but would then depend on the CUDA installation. The OpenCL (C) headers and the ICD loader are standard packages on Debian.
