How to clear the memory after generation #92
Replies: 2 comments
@philippzagar might be the best person to provide some context about this 👍
Hi @rubenvde,
Thank you for reaching out and for exploring SpeziLLM! We’re excited to hear you’re integrating it into your application 🚀
You are absolutely right in your observation: memory usage remains high after the model is loaded and a request is dispatched. Once the associated `ChatView` (as you described) is dismissed, memory usage should return to normal levels.
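For illustration, here is a minimal sketch of tying the session's lifetime to the view, assuming the standard `LLMRunner`/`LLMLocalSession` setup; the schema configuration and model identifier below are placeholders rather than the package's verbatim API:

```swift
import SpeziLLM
import SpeziLLMLocal
import SwiftUI

struct LocalChatView: View {
    @Environment(LLMRunner.self) private var runner
    @State private var session: LLMLocalSession?
    @State private var output = ""

    var body: some View {
        ScrollView { Text(output) }
            .task {
                // Dispatching a request loads the model weights into memory.
                let session: LLMLocalSession = runner(with: LLMLocalSchema(model: .llama3_8B))  // placeholder model
                self.session = session
                do {
                    // Stream the generated tokens into the view.
                    for try await token in try await session.generate() {
                        output += token
                    }
                } catch {
                    output = "Generation failed: \(error)"
                }
            }
            .onDisappear {
                // Dismissing the view cancels any in-flight generation and
                // drops the reference to the session, so the model weights
                // can be released and memory returns to normal levels.
                session?.cancel()
                session = nil
            }
    }
}
```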
If you require more granular control over how the model is managed in memory, feel free to take a look at this branch and pull request. It introduces the ability to explicitly offload the model used in an `LLMLocalSession` from memory using custom logic, and to reload it again when needed.

Please note that this branch is not yet merged into main, as we are still refining the underlying mechanisms. The goal is to make memory management more declarative in the future, allowing SpeziLLM to automatically determine when the model can be safely offloaded based on runtime metrics and heuristics.

In the meantime, we recommend using the branch linked above if you need manual control over model offloading.

Let us know if you have any further questions or need assistance. We're happy to support you.
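As a rough sketch of what that manual control could look like (the `offload()` method name below is an assumption based on the branch's description, not a confirmed API; please check the linked pull request for the actual signatures):

```swift
import SpeziLLM
import SpeziLLMLocal

// Hypothetical usage of the explicit offloading added on the linked branch.
func respondOnce(with session: LLMLocalSession) async throws -> String {
    var response = ""

    // Stream the tokens for the session's current context.
    for try await token in try await session.generate() {
        response += token
    }

    // Release the model weights right after generation completes, instead
    // of waiting for the session itself to be deallocated.
    // (Assumed method name from the unmerged branch.)
    await session.offload()

    // A later request would then reload the model, either transparently or
    // via a corresponding reload method, depending on the final API.
    return response
}
```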