Memory allocations currently cause performance issues because they trigger the garbage collector. This is worse when using MPI, as the GC can be triggered at different times in different processes, causing the communication to get out of sync. For example, in this profile:

You can see that the GC pause (in red) on the top process means that the `MPI_Waitall` on the bottom process takes longer (as it is waiting on data to be sent from the top process). This effect will get worse at higher core counts, hurting scaling.
## Potential solutions

### Minimize memory allocations
The memory allocations seem to be caused by a few things:
- LU factorization in the linear solver (memory allocations in tridiagonal LU factorization, #648)
- Use of `Ref`s inside `bycolumn` closures. For some reason these aren't being elided. The easiest fix seems to be to use an immutable object as a scalar container for broadcasting (will require some changes to ClimaCore).
### Synchronize calls to the garbage collector
- This gets a little bit tricky, but we can disable Julia's GC (`GC.enable(false)`), and then periodically call `GC.gc()` manually. We would need to experiment with calling full vs. incremental GC (see https://bkamins.github.io/julialang/2021/06/11/vecvec.html)
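The disable-then-collect-at-a-barrier pattern could look roughly like the sketch below. This is an assumption about how it might be wired into a time loop, not the project's actual implementation: `run_steps`, the allocating work, and `gc_interval` are all illustrative. Note that `GC.gc()` is a no-op while the GC is disabled, so automatic collection has to be briefly re-enabled around the manual call.

```julia
# Sketch: manual, periodic GC inside a time loop (names are illustrative).
function run_steps(nsteps; gc_interval = 100)
    GC.enable(false)                  # disable automatic collection
    total = 0.0
    try
        for step in 1:nsteps
            total += sum(rand(1_000)) # stand-in for per-step work that allocates
            if step % gc_interval == 0
                # In the MPI case, an MPI.Barrier(comm) would go here so that
                # every rank reaches the collection point before pausing.
                GC.enable(true)       # GC.gc is skipped while the GC is disabled
                GC.gc(false)          # incremental; GC.gc(true) forces a full sweep
                GC.enable(false)
            end
        end
    finally
        GC.enable(true)               # always restore automatic collection
    end
    return total
end

total = run_steps(500)
```

The `false`/`true` argument to `GC.gc` is where the full-vs-incremental experiment from the linked post would plug in; the `finally` block guards against leaving the GC disabled if a step throws.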