Which version of LM Studio?
Example: LM Studio 0.3.15
Which operating system?
macOS
What is the bug?
When a JIT-loaded model is generating tokens, JIT loading a new model cuts the former model off and unloads it immediately instead of letting it finish inference. The context is local models served to multiple users through open-webui: if user A's request is generating tokens, user B can make a request that cuts off the generation of user A's request, which is not optimal. Having user B's request wait until user A's is finished would be perfectly fine; the waiting time is a good compromise compared to errors or hanging requests.
Unfortunately, disabling "Only keep last JIT loaded model" is not an option, as LM Studio doesn't automatically unload older models when RAM/VRAM is getting full. Ollama, for example, handles auto-unloading in a smarter way: multiple models can stay loaded until loading a new model would exceed the RAM/VRAM limit, at which point the oldest model is unloaded.
I believe there should at least be an option to not cut off a generating model when JIT loading a new one, and a smarter loading strategy similar to Ollama's should be considered.
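For illustration only, here is a minimal sketch of the kind of loading policy described above: memory-bounded, evicting the oldest idle model first, and never evicting a model that is still generating. All names here are hypothetical and not part of LM Studio's or Ollama's actual API.

```python
from collections import OrderedDict

class ModelPool:
    """Hypothetical memory-bounded JIT pool: evict oldest *idle* models, never a busy one."""

    def __init__(self, memory_limit_bytes):
        self.memory_limit = memory_limit_bytes
        self.loaded = OrderedDict()  # model_id -> {"size": bytes, "busy": bool}

    def _used(self):
        return sum(m["size"] for m in self.loaded.values())

    def acquire(self, model_id, size):
        """Called when a request targets model_id; loads it JIT if needed."""
        if model_id in self.loaded:
            self.loaded.move_to_end(model_id)  # refresh recency
        else:
            # Free space by unloading the oldest models that are not mid-inference.
            for old_id in list(self.loaded):
                if self._used() + size <= self.memory_limit:
                    break
                if not self.loaded[old_id]["busy"]:
                    del self.loaded[old_id]  # unload idle model
            if self._used() + size > self.memory_limit:
                # In practice the new request would queue here until a busy
                # model finishes, rather than erroring out.
                raise RuntimeError("all remaining models are still generating")
            self.loaded[model_id] = {"size": size, "busy": False}
        self.loaded[model_id]["busy"] = True  # mark as generating
        return model_id

    def release(self, model_id):
        """Called when the request's generation completes."""
        self.loaded[model_id]["busy"] = False
```

The key difference from the current behavior is that a busy model is never a candidate for eviction; a new JIT load either fits alongside it or waits.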
Screenshots
JIT settings:
Logs
not relevant
To Reproduce
Steps to reproduce the behavior:
1. Set the JIT settings as shown above.
2. Make a request that JIT loads model A and runs inference (for example, a curl or scripted request to the OpenAI chat completions endpoint; see the sketch below).
3. While model A is generating tokens, make a second request that JIT loads model B.
4. Observe model A being cut off and unloaded in favor of model B.
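For concreteness, a minimal repro script that fires two concurrent requests against the local OpenAI-compatible endpoint. The base URL/port and the model identifiers ("model-a", "model-b") are assumptions; substitute the values from your own LM Studio server and model list.

```python
import threading
import time
import requests

# Assumed LM Studio local server endpoint; adjust host/port to your setup.
BASE_URL = "http://localhost:1234/v1/chat/completions"

def chat(model, prompt):
    """Send a non-streaming chat completion request; the model is JIT loaded on demand."""
    resp = requests.post(
        BASE_URL,
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=600,
    )
    print(model, "->", resp.status_code)

# "model-a" and "model-b" are placeholders for two different local model identifiers.
t1 = threading.Thread(target=chat, args=("model-a", "Write a long story about the sea."))
t2 = threading.Thread(target=chat, args=("model-b", "Hello"))

t1.start()      # JIT loads model A and starts generating
time.sleep(5)   # give model A a few seconds to begin producing tokens
t2.start()      # JIT loading model B unloads model A mid-generation
t1.join()
t2.join()
```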