Hi Xiuqi, the main objective of GPU acceleration with HYPRE is to reduce the computational time of the iterative solver (CG+AMG) it uses, and at the same time gain exposure to the new HPC systems we have access to. We have seen good Poisson solver speedup with UGLMAT HYPRE in outdoor flows and single-zone cases, and this also applies to very fine calculations. The main characteristic of this solver is that there are no velocity errors at internal or inter-mesh boundaries.
Since the rest of the code still runs on CPUs, you want several meshes associated with the piece of work (the Poisson matrix) that goes to a GPU. This is also because a node has more CPU cores than GPUs. Usually we bundle the consecutive cores in a NUMA domain with the GPU closest to them in the architecture. For example, Polaris has 32 cores and 4 GPUs per node, so you can bundle 8 MPI processes (and meshes) with 8 cores and associate them with their GPU in a resource set. The Poisson solve work for that GPU then covers all the cells of the 8 meshes in the resource set. You control how many MPI processes are associated with a GPU with the environment variable FDS_RANKS_PER_GPU.
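The consecutive-core bundling described above can be sketched as a simple block mapping: ranks 0..7 share GPU 0, ranks 8..15 share GPU 1, and so on. This is a minimal illustration of the idea, not FDS source code; the helper `gpu_for_rank` and the exact assignment rule are assumptions for clarity.

```python
import os

def gpu_for_rank(rank: int, ranks_per_gpu: int) -> int:
    """Map an MPI rank to a GPU index by bundling consecutive ranks,
    mirroring the NUMA-domain grouping described above (assumed mapping)."""
    return rank // ranks_per_gpu

# Polaris-like node: 32 ranks (one mesh each), 4 GPUs -> 8 ranks per GPU.
ranks_per_gpu = int(os.environ.get("FDS_RANKS_PER_GPU", "8"))
mapping = [gpu_for_rank(r, ranks_per_gpu) for r in range(32)]
# With ranks_per_gpu = 8, ranks 0-7 map to GPU 0, 8-15 to GPU 1, etc.
```

With this grouping, the Poisson matrix handed to each GPU aggregates the cells of all 8 meshes in its resource set, which is why you still want multiple meshes per GPU even though each mesh has its own MPI process.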
-
Hi community,
Thank you for the fantastic work maintaining FireX and integrating GPU acceleration into FDS.
I’m interested in understanding:
What types of simulations benefit the most from FireX’s GPU-accelerated pressure solver?
For example:
Large multi-mesh domains with many pressure zones?
Fine mesh resolutions (e.g., < 5 cm cell size)?
Cases with complex geometry or high temporal resolution?
Long-duration simulations for wildland fire spread?
What is the recommended setup for running such GPU-intensive cases?
Any mesh or domain decomposition guidelines?
Best practices for MPI ranks per GPU or CPU-GPU balancing?
Does the test_gpu.fds verification case reflect a good benchmark for expected performance? I have run some cases on my side, and the computation time seems sensitive to both the number of mesh blocks and the CPU/GPU combination.
I’d also be curious if you have cluster-specific tuning tips, especially for hybrid CPU-GPU environments (e.g., Frontier, Polaris, or Stampede).
Looking forward to your insights and any examples you’re willing to share!
Best,
Xiuqi