PeleLMeX vs PeleC GPU Memory Usage #514
3 comments · 5 replies
-
I haven't previously done a direct comparison, but PeleLMeX is definitely a bit of a memory hog, and I'm not surprised that it would require significantly more memory than PeleC for the same size grid, given the algorithmic and code-structure differences. That said, it seems like you're seeing ~5x the memory usage for LMeX vs. C, which is quite substantial. PeleLMeX stores a lot of intermediates (transport coefficients, various transport terms, etc.) as vectors of MultiFabs containing the full domain data across all levels (see how many vectors of MultiFabs with `NUM_SPECIES` components are defined in PeleLMeX_Data.cpp). In PeleC, a lot of this is computed at the FAB level and discarded rather than being stored. You may just have to spread your simulation over a larger number of GPUs with PeleLMeX. Take a look at the …
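To make that concrete, here is a minimal sketch of the two storage patterns (illustrative names only, not the actual Pele data structures):

```cpp
#include <AMReX.H>
#include <AMReX_MultiFab.H>
#include <AMReX_Vector.H>

// Persistent, level-wide storage (PeleLMeX-style pattern): each container holds
// full-domain data on its level and stays allocated for the whole time step.
// Struct and member names are placeholders for illustration.
struct LevelScratch
{
    amrex::MultiFab diff_coeffs;   // NUM_SPECIES components, lives on the GPU arena
    amrex::MultiFab species_flux;  // NUM_SPECIES components, ditto
};
// One entry per AMR level -> memory scales with total cell count times
// the number of species-sized intermediates kept resident.
amrex::Vector<LevelScratch> scratch_per_level;

// Transient, per-box storage (PeleC-style pattern): the same quantities exist
// only as FAB-sized temporaries inside the MFIter loop and are released
// as soon as each box has been processed.
void per_box_work (amrex::MultiFab& state, int num_species)
{
    for (amrex::MFIter mfi(state, amrex::TilingIfNotGPU()); mfi.isValid(); ++mfi)
    {
        const amrex::Box& bx = mfi.tilebox();
        amrex::FArrayBox tmp(bx, num_species, amrex::The_Async_Arena()); // freed after use
        amrex::ignore_unused(tmp); // ... fill and consume tmp here ...
    }
}
```

As a rough order of magnitude: with ~26M cells across both levels and the ~21 species of drm19, one species-sized set of MultiFabs is about 26e6 × 21 × 8 B ≈ 4.4 GB, so keeping even a handful of such intermediates resident adds up quickly.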
-
I only skimmed all this, but it looks like you are aware that AMReX uses a memory pool; by default it allocates something like 80% of GPU memory at initialization. It looks like you're using …
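For reference, the arena-related runtime parameters look something like this (a sketch only; verify names and defaults against the AMReX GPU documentation for your version):

```
# Sketch only -- check names/defaults for your AMReX version.
amrex.the_arena_init_size = 0          # start the device arena small and let it grow on demand,
                                       # instead of reserving most of GPU memory at startup
amrex.the_arena_is_managed = 0         # plain device memory rather than managed/unified memory
amrex.abort_on_out_of_gpu_memory = 1   # fail with a clear message when the arena cannot grow
```

Also note that an external monitor such as nvitop reports the arena reservation, not what the solver is actually using at that moment, so a reported number can reflect the pool size as much as the true working set.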
-
On the memory requirements, 2.5x is not surprising to me either. There are a couple of structural reasons for this:
On PeleC vs. PeleLMeX, we did some comparisons on the ECP challenge problem a few years back, and PeleLMeX was about 10x faster. But this is highly dependent on your CFL number and the stiffness of the chemistry.
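Roughly speaking, if both codes run at their explicit CFL limits and the per-step chemistry cost is comparable, the time-step ratio scales like the inverse Mach number (a back-of-the-envelope sketch; $c$ is the sound speed):

```math
\Delta t_{\text{PeleC}} \lesssim \mathrm{CFL}\,\frac{\Delta x}{|u|+c},
\qquad
\Delta t_{\text{PeleLMeX}} \lesssim \mathrm{CFL}\,\frac{\Delta x}{|u|},
\qquad
\frac{\Delta t_{\text{PeleLMeX}}}{\Delta t_{\text{PeleC}}}
 \sim \frac{|u|+c}{|u|} \approx \frac{1}{\mathrm{Ma}}
 \quad (\mathrm{Ma} \ll 1).
```

That is broadly where a ~10x figure can come from, and the advantage shrinks as the Mach number grows or when stiff chemistry dominates the per-step cost.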
-
Hello,
I have recently been testing whether PeleLMeX would offer performance improvements over PeleC for some of my simulations. I was successfully able to restart from a plot file and get LMeX running, but I am noticing that the memory usage of PeleLMeX appears to be ~twice that of PeleC in my tests.
I have configured the input files for PeleC and PeleLMeX such that the cases are as identical as possible. I have compiled both with the same compilation flags and libraries, and have set `TINY_PROFILE = TRUE` and `MEM_PROFILE = TRUE` for both. The case is a bluff-body-stabilized methane-air flame, using EB for the geometry and drm19 for the chemistry; it runs on a base grid of `432 x 144 x 72` with one level of AMR (refining on temperature, vorticity magnitude, and Y_CO). I have been testing the codes on 8x NVIDIA A100-40GB cards, compiled with CUDA 12.6. With AMR, both codes have approximately 26M cells (~4.5M on level 0, ~21M on level 1) and share the same AMR settings.
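For illustration, the kind of AMReX grid parameters I mean are the standard inputs shown below (values other than the base grid are placeholders, not my actual inputs):

```
# Illustrative only -- not the actual inputs from this case.
amr.n_cell          = 432 144 72   # base grid quoted above
amr.max_level       = 1            # one level of refinement
amr.ref_ratio       = 2
amr.blocking_factor = 16           # placeholder value
amr.max_grid_size   = 64           # placeholder value
```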
I did have to disable regridding in PeleLMeX due to running out of memory. I have also run other tests varying a few of the parameters that come up in my GitHub searches, but changing these did not appear to make much of a difference (besides `amrex.the_arena_is_managed=0`, which led to an immediate OOM crash in LMeX).

From my tests, I have found that PeleLMeX uses around twice the memory of PeleC. While the PeleC runs use around 160GB of GPU memory, PeleLMeX uses the full 280GB that I have allocated (which I confirmed with `nvitop`), and it still doesn't appear to be enough. I did another test of LMeX with `amr.max_level=0`, again restarting from a plot file, and even then the utilization was hovering close to what PeleC uses with one level of AMR.

I am wondering if this is expected behavior caused by the differences in numerical schemes between the codes, or if there is something wrong with how I have built and run PeleLMeX. I have attached the run outputs (including TinyProfiler output at the bottom) from the three tests: one for PeleC, one for LMeX, and one for LMeX with AMR disabled.
PeleC.log
PeleLMeX_noAMR.log
PeleLMeX_withAMR.log
Thank you for your help!