Native Memory Leak with DJL (0.32.0) + PyTorch (2.5.1) in Long-running JVM Application #3708
Replies: 5 comments
-
If you have a minimal reproducible case, I can take a look. Native memory leaks are usually caused by resources that are not closed properly. Even if you close your own NDManager, it's possible that some resources leak into the parent NDManager. Are you using your own Translator, or are you using NDArray directly? We recommend following the Predictor/Translator pattern; it makes the resources easier to track.
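For reference, a minimal sketch of that pattern could look like the following (the FloatTranslator name and the float[] input/output types are illustrative assumptions, not taken from the reporter's code):

```java
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDList;
import ai.djl.translate.Batchifier;
import ai.djl.translate.Translator;
import ai.djl.translate.TranslatorContext;

// Hypothetical translator for a model that takes a float[] and returns a float[].
public class FloatTranslator implements Translator<float[], float[]> {

    @Override
    public NDList processInput(TranslatorContext ctx, float[] input) {
        // NDArrays created from ctx.getNDManager() belong to a per-inference
        // manager that the Predictor closes automatically after predict().
        NDArray array = ctx.getNDManager().create(input);
        return new NDList(array);
    }

    @Override
    public float[] processOutput(TranslatorContext ctx, NDList list) {
        // Copy the result onto the Java heap so no NDArray escapes the manager.
        return list.singletonOrThrow().toFloatArray();
    }

    @Override
    public Batchifier getBatchifier() {
        // No batching in this sketch.
        return null;
    }
}
```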
-
Memory not released until process exit despite closing all DJL objects.
I'm working with DJL on a small example, and I've encountered an issue where memory is not released immediately after closing all resources such as Predictor, ZooModel, and NDManager. Instead, the memory is released only after the entire process (JVM) exits.
Even though the Predictor and Model are properly closed, the GPU memory usage (checked via nvidia-smi) stays allocated until the process exits. This can cause issues in long-running or reusable components. Is there a recommended way to force a full native memory release, or is this expected behavior due to CUDA context management? Thanks!
-
The code looks fine, although it's not efficient. You can load the model, create the predictor, and run the loop, then close the model when the program exits. I don't think there is a native memory leak. A few things you may need to be aware of:
Your application's OOM may be caused by another leak that is unrelated to the code you showed here.
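As a rough sketch of that structure (the model location is a placeholder and FloatTranslator is the hypothetical translator from the sketch above):

```java
import ai.djl.inference.Predictor;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;

public final class LongRunningInference {
    public static void main(String[] args) throws Exception {
        Criteria<float[], float[]> criteria = Criteria.builder()
                .setTypes(float[].class, float[].class)
                .optModelUrls("file:///path/to/model")  // placeholder location
                .optTranslator(new FloatTranslator())   // hypothetical translator from above
                .build();

        // Load the model and create the predictor once, reuse them for every
        // request, and close them only when the program shuts down.
        try (ZooModel<float[], float[]> model = criteria.loadModel();
             Predictor<float[], float[]> predictor = model.newPredictor()) {
            for (int i = 0; i < 1000; i++) {
                float[] result = predictor.predict(new float[] {1f, 2f, 3f});
                // consume result ...
            }
        }
    }
}
```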
-
Thank you for the information! We're building a long-running application using DJL (Deep Java Library), and our goal is to load and unload different models dynamically, without restarting the process. Based on your explanation, we understand that PyTorch (via DJL) and CUDA manage memory through internal pools and caching, so memory might not be released back to the system even after the model and NDArray instances are closed. However, we are seeing memory build-up over time, and we're looking for ways to release GPU memory between model runs to reduce peak usage and avoid fragmentation.
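For concreteness, the load/unload cycle we have in mind looks roughly like the sketch below (placeholder model URLs and the hypothetical FloatTranslator from the earlier sketch, not our actual code):

```java
import java.util.List;

import ai.djl.inference.Predictor;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;

public final class DynamicModelCycler {

    // Loads each model in turn, runs one prediction, and closes it again.
    // Even with every handle closed, the PyTorch caching allocator may keep
    // the freed GPU blocks inside the process rather than returning them to
    // the driver, which shows up as memory build-up in nvidia-smi.
    public static void runAll(List<String> modelUrls) throws Exception {
        for (String url : modelUrls) {              // placeholder model locations
            Criteria<float[], float[]> criteria = Criteria.builder()
                    .setTypes(float[].class, float[].class)
                    .optModelUrls(url)
                    .optTranslator(new FloatTranslator()) // hypothetical translator from above
                    .build();
            try (ZooModel<float[], float[]> model = criteria.loadModel();
                 Predictor<float[], float[]> predictor = model.newPredictor()) {
                float[] out = predictor.predict(new float[] {1f, 2f, 3f});
                System.out.println(url + " -> " + out.length + " outputs");
            }
            // The ZooModel and Predictor are closed here; any remaining GPU
            // usage belongs to the CUDA context and the caching allocator.
        }
    }
}
```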
-
If you keep loading and unloading models, you may experience OOM; this is caused by fragmentation of the PyTorch memory pool. Say you can load 20 models and run predictions without any issue: once you unload a model, you may not be able to load it back. So the only workaround is to load at most 15 models or so (you need to figure this number out for your setup). I know some customers are doing this.
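A rough sketch of that workaround might look like the following (the BoundedModelLoader class and the cap value are illustrative assumptions, not something DJL provides):

```java
import java.util.concurrent.Semaphore;

import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;

// Hypothetical guard that caps how many models may be loaded at once; the cap
// has to be determined empirically for your models and GPU.
public final class BoundedModelLoader {

    private final Semaphore slots;

    public BoundedModelLoader(int maxLoadedModels) {
        this.slots = new Semaphore(maxLoadedModels);
    }

    public <I, O> ZooModel<I, O> load(Criteria<I, O> criteria) throws Exception {
        slots.acquire();                    // block if the cap is reached
        try {
            return criteria.loadModel();
        } catch (Exception e) {
            slots.release();                // loading failed, free the slot
            throw e;
        }
    }

    public void release(ZooModel<?, ?> model) {
        model.close();                      // frees DJL handles; GPU blocks stay pooled
        slots.release();
    }
}
```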
-
Hello,
We're using DJL 0.32.0 with the PyTorch 2.5.1 engine in a component integrated into a long-running JVM application. The component performs repeated inference operations over the application's lifetime.
We're experiencing a serious issue with native memory growth: after each inference, off-heap/native memory usage increases gradually until it reaches system limits and the application crashes due to out-of-native-memory conditions (OOM), even though the JVM heap remains stable.
Environment
DJL: 0.32.0
Engine: PyTorch 2.5.1
Runtime: Long-running server-side JVM application
Hardware: Issue occurs on both CPU and GPU (CUDA) configurations
What we've tried
We're using NDManager properly with try-with-resources or manual close() (see the sketch after this list).
We avoid keeping NDArray instances beyond the lifetime of their NDManager.
Still, native memory continues to grow over time with each inference.
No Java heap leaks are observed — the problem is strictly off-heap/native memory.
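To make the scoping concrete, it is roughly the pattern in this simplified sketch (an illustration rather than our actual production code):

```java
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;

public final class ScopedInferenceExample {

    // Simplified sketch of the scoping described above: a child manager is
    // opened per inference and closed right away, so every NDArray attached
    // to it is released before the next call. Only plain Java arrays escape.
    static float[] runOnce(NDManager rootManager, float[] data) {
        try (NDManager scope = rootManager.newSubManager()) {
            NDArray input = scope.create(data);
            NDArray doubled = input.mul(2); // attached to scope, freed on close
            return doubled.toFloatArray();
        }
    }

    public static void main(String[] args) {
        try (NDManager root = NDManager.newBaseManager()) {
            System.out.println(runOnce(root, new float[] {1f, 2f, 3f})[0]);
        }
    }
}
```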
Questions
Is it expected that the DJL PyTorch engine holds persistent native allocations that are not cleared by emptyCache()?
Thanks
Kamil