Thank you for sharing; we/I have yet to digest your proposal.

Regarding a generic package: we have an OpenCL-based flavor which works just fine on all major GPU architectures as a single binary (the same goes for CP2K). The reasons for the non-generic build with CUDA/HIP are the specialized kernels and, in the past, the lack of a fallback to generic/untuned parameters. This is now solved for CUDA/HIP, but exposing a generic GPU package may need some extra work. The worst case for such a package is that it always falls back or, even worse, that data transfers have already been done before entering the fallback. For OpenCL this was never a problem, since tuned parameters are entirely optional; in addition, one binary can include all available tuned parameters and has a GPU-based fallback where no tuned parameters exist.

Questioning your goal more generally: DBCSR's GPU support relies on reasonable support for FP64 (double precision), which may be poorly supported by client GPUs (a very unfavorable FLOPS ratio versus single precision, let alone lower precisions). By unfavorable I mean the GPU may yield fewer FLOPS than the aggregated CPU package (all cores). Just to mention it: even with FP64, GPU acceleration can fall short.
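To make the FP64 point concrete, here is a rough back-of-the-envelope sketch in Python. All numbers are hypothetical placeholders chosen for illustration (a 16-core CPU at 3.0 GHz doing 16 FP64 FLOPs per core per cycle, and a client GPU with 10 TFLOPS single precision and a 1:32 FP64:FP32 ratio); they are not measurements of any particular device, and peak numbers say nothing about achieved DBCSR performance.

```python
def peak_gflops(units, clock_ghz, flops_per_unit_per_cycle):
    """Theoretical peak in GFLOPS: units x clock (GHz) x FLOPs per unit per cycle."""
    return units * clock_ghz * flops_per_unit_per_cycle


def gpu_fp64_gflops(fp32_gflops, fp64_to_fp32_ratio):
    """Derive the FP64 peak from the FP32 peak and the FP64:FP32 throughput ratio."""
    return fp32_gflops * fp64_to_fp32_ratio


# Hypothetical CPU package: 16 cores, 3.0 GHz, 16 FP64 FLOPs per core per cycle.
cpu_fp64 = peak_gflops(units=16, clock_ghz=3.0, flops_per_unit_per_cycle=16)

# Hypothetical client GPU: 10 TFLOPS FP32, FP64 running at 1/32 of the FP32 rate.
gpu_fp64 = gpu_fp64_gflops(fp32_gflops=10_000.0, fp64_to_fp32_ratio=1 / 32)

# With these assumed numbers the GPU's FP64 peak is below the CPU's.
gpu_is_slower = gpu_fp64 < cpu_fp64
```

With these assumed inputs the CPU package has the higher FP64 peak, which is the situation described above; with a data-center GPU the ratio (and the conclusion) can be very different.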
-
Hello team,
I am Spaarsh and I am trying to enable HIP/ROCm in the Debian package for `dbcsr` as part of my GSoC'25 work as a contributor at Debian. Cordell Bloor (@cgmb) is my GSoC mentor.

The package can easily be built for a GPU, but the current build process requires us to run the entire build once for each GPU architecture (for AMD, these are currently `gfx906`, `gfx908` and `gfx90a`). To prevent this, `cmake` offers a variable called `CMAKE_HIP_ARCHITECTURES` that allows us to create a fat binary (a generic build, loosely speaking) that is compatible with multiple architectures simultaneously. But it has been manually disabled in the build process, as seen here. For this reason we need to build a separate binary for each supported architecture (`libdbcsr-rocm-dev-gfxXXXX`), which bloats our source package (not to mention the increased build time).

Upon further investigation, I realized that the reason it was disabled is the custom kernel building that `dbcsr` does. This cannot be controlled via cmake since those sections are built via Python scripts. A non-trivial amount of changes would be required to achieve a fully generic build, if it is possible at all.

However, after analyzing the binaries built for different packages, I understand that only `libdbcsr.a` is dependent on the architecture (please correct me if I am wrong).

With this understanding, I propose the following change, which would result in a single binary that can support multiple GPU architectures with less bloat in our Debian package:
Currently the build process uses the `WITH_GPU` flag to build the package. Assuming I am not mistaken about only `libdbcsr.a` being architecture-dependent, a way to compile a `libdbcsr.arch.a` for each architecture would enable support for all architectures via a single binary. This can be achieved by introducing a new optional flag, say `HIP_MULTI_ARCHITECTURE`, in the main CMakeLists.txt file that accepts multiple architectures/target GPUs as input. In the CMakeLists.txt we parse the values given in the flag and loop over the `libdbcsr.a` compilation, changing `WITH_GPU` for each iteration. This results in a `libdbcsr.arch.a` file after each iteration. Finally, all these files can be part of the `libdbcsr-rocm-dev` package (you can find the MR here). This would also require making the `WITH_GPU` flag optional. A new check that ensures that at least one of `WITH_GPU` or `HIP_MULTI_ARCHITECTURE` is defined would be added.

The proposed change would allow us to build a generic HIP/ROCm package for the dbcsr Debian package.
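To illustrate the idea, here is a minimal Python sketch of the per-architecture loop and the "at least one flag" check. Note that it is not the CMakeLists.txt change itself: it drives the loop from outside CMake and only plans the commands without running them. The function names, the semicolon-separated format of `HIP_MULTI_ARCHITECTURE`, the precedence when both flags are set, and the `build/<arch>` directory layout are all assumptions made for illustration, and a real configure step would need further options that are omitted here. The sketch also does not check which values `WITH_GPU` actually accepts.

```python
def resolve_architectures(with_gpu=None, hip_multi_architecture=None):
    """Return the list of GPU targets to build for.

    Mirrors the proposed check: at least one of the two flags must be set.
    Assumption: HIP_MULTI_ARCHITECTURE is a semicolon-separated list and,
    if both flags are given, it takes precedence over WITH_GPU.
    """
    if hip_multi_architecture:
        archs = [a.strip() for a in hip_multi_architecture.split(";") if a.strip()]
        if archs:
            return archs
    if with_gpu:
        return [with_gpu]
    raise ValueError("set at least one of WITH_GPU or HIP_MULTI_ARCHITECTURE")


def plan_builds(source_dir, archs, build_root="build"):
    """Plan one configure/build step per architecture (nothing is executed).

    Each entry also names the per-architecture library the build output
    would be renamed to, following the libdbcsr.arch.a scheme above.
    """
    plan = []
    for arch in archs:
        build_dir = f"{build_root}/{arch}"  # hypothetical layout
        plan.append({
            "configure": ["cmake", "-S", source_dir, "-B", build_dir,
                          f"-DWITH_GPU={arch}"],
            "build": ["cmake", "--build", build_dir],
            "output": f"libdbcsr.{arch}.a",
        })
    return plan


archs = resolve_architectures(hip_multi_architecture="gfx906;gfx908;gfx90a")
plan = plan_builds(".", archs)
```

Each entry in `plan` could then be passed to `subprocess.run` by a packaging script; doing the same loop inside CMakeLists.txt, as proposed, would avoid the need for such an external driver.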
Regards,
Spaarsh