How to clear the memory after generation #92
Replies: 2 comments
@philippzagar might be the best person to provide some context about this 👍
Hi @rubenvde,
Thank you for reaching out and for exploring SpeziLLM! We’re excited to hear you’re integrating it into your application 🚀
You are absolutely right in your observation: memory usage remains high after the model is loaded and a request is dispatched. Once the associated `ChatView` (as you described) is dismissed, memory usage should return to normal levels.
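For illustration, here is a minimal sketch of tying the session's lifetime to the view, assuming the standard `LLMRunner`/`LLMLocalSession` setup; the schema configuration and model identifier below are placeholders rather than the package's verbatim API:

```swift
import SpeziLLM
import SpeziLLMLocal
import SwiftUI

struct LocalChatView: View {
    @Environment(LLMRunner.self) private var runner
    @State private var session: LLMLocalSession?
    @State private var output = ""

    var body: some View {
        ScrollView { Text(output) }
            .task {
                // Dispatching a request loads the model weights into memory.
                let session: LLMLocalSession = runner(with: LLMLocalSchema(model: .llama3_8B))  // placeholder model
                self.session = session
                do {
                    // Stream the generated tokens into the view.
                    for try await token in try await session.generate() {
                        output += token
                    }
                } catch {
                    output = "Generation failed: \(error)"
                }
            }
            .onDisappear {
                // Dismissing the view cancels any in-flight generation and
                // drops the reference to the session, so the model weights
                // can be released and memory returns to normal levels.
                session?.cancel()
                session = nil
            }
    }
}
```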
If you require more granular control over how the model is managed in memory, feel free to take a look at this branch and pull request. It introduces the ability to explicitly offload the model used in an `LLMLocalSession` from memory using custom logic, and to reload it again when needed.

Please note that this branch is not yet merged into main, as we are still refining the underlying mechanisms. The goal is to make memory management more declarative in the future, allowing SpeziLLM to automatically determine when the model can be safely offloaded based on runtime metrics and heuristics.

In the meantime, we recommend using the branch linked above if you need manual control over model offloading.

Let us know if you have any further questions or need assistance. We're happy to support you.
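As a rough sketch of what that manual control could look like (the `offload()` method name below is an assumption based on the branch's description, not a confirmed API; please check the linked pull request for the actual signatures):

```swift
import SpeziLLM
import SpeziLLMLocal

// Hypothetical usage of the explicit offloading added on the linked branch.
func respondOnce(with session: LLMLocalSession) async throws -> String {
    var response = ""

    // Stream the tokens for the session's current context.
    for try await token in try await session.generate() {
        response += token
    }

    // Release the model weights right after generation completes, instead
    // of waiting for the session itself to be deallocated.
    // (Assumed method name from the unmerged branch.)
    await session.offload()

    // A later request would then reload the model, either transparently or
    // via a corresponding reload method, depending on the final API.
    return response
}
```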