Hi Xiuqi, the main objective of GPU acceleration with HYPRE is to reduce the computational time of the iterative solver (CG+AMG) it uses, and at the same time gain exposure to the new HPC systems we have access to. We have seen good Poisson solver speedup with UGLMAT HYPRE in outdoor flows and single-zone cases, and this also applies to very fine calculations. The main characteristic of this solver is that there are no velocity errors at internal or inter-mesh boundaries.
Since the rest of the code still runs on CPUs, you want several meshes associated with the piece of work (the Poisson matrix) that goes to a GPU. This is also because a node has more CPU cores than GPUs. Usually we bundle the consecutive cores in a NUMA domain with the GPU closest to them in the architecture. For example, Polaris has 32 cores and 4 GPUs per node, so you can bundle 8 MPI processes (and meshes) with 8 cores and associate them with their GPU in a resource set. The Poisson solve work for that GPU then covers all the cells of the 8 meshes in the resource set. You control how many MPI processes are associated with a GPU with the environment variable FDS_RANKS_PER_GPU.
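The consecutive-core bundling described above can be sketched as a simple block mapping: ranks 0..7 share GPU 0, ranks 8..15 share GPU 1, and so on. This is a minimal illustration of the idea, not FDS source code; the helper `gpu_for_rank` and the exact assignment rule are assumptions for clarity.

```python
import os

def gpu_for_rank(rank: int, ranks_per_gpu: int) -> int:
    """Map an MPI rank to a GPU index by bundling consecutive ranks,
    mirroring the NUMA-domain grouping described above (assumed mapping)."""
    return rank // ranks_per_gpu

# Polaris-like node: 32 ranks (one mesh each), 4 GPUs -> 8 ranks per GPU.
ranks_per_gpu = int(os.environ.get("FDS_RANKS_PER_GPU", "8"))
mapping = [gpu_for_rank(r, ranks_per_gpu) for r in range(32)]
# With ranks_per_gpu = 8, ranks 0-7 map to GPU 0, 8-15 to GPU 1, etc.
```

With this grouping, the Poisson matrix handed to each GPU aggregates the cells of all 8 meshes in its resource set, which is why you still want multiple meshes per GPU even though each mesh has its own MPI process.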
-
Hi community,
Thank you for the fantastic work maintaining FireX and integrating GPU acceleration into FDS.
I’m interested in understanding:
What types of simulations benefit the most from FireX’s GPU-accelerated pressure solver?
For example:
Large multi-mesh domains with many pressure zones?
Fine mesh resolutions (e.g., < 5 cm cell size)?
Cases with complex geometry or high temporal resolution?
Long-duration simulations for wildland fire spread?
What is the recommended setup for running such GPU-intensive cases?
Any mesh or domain decomposition guidelines?
Best practices for MPI ranks per GPU or CPU-GPU balancing?
Does the test_gpu.fds verification case reflect a good benchmark for expected performance? I have run some cases on my side, and the computation time seems sensitive to both the number of mesh blocks and the CPU/GPU combination.
I’d also be curious if you have cluster-specific tuning tips, especially for hybrid CPU-GPU environments (e.g., Frontier, Polaris, or Stampede).
Looking forward to your insights and any examples you’re willing to share!
Best,
Xiuqi