Replies: 2 comments 2 replies
-
--cache-none in combination with --disable-smart-memory roughly does what I want. I don't know yet whether that has any unwanted side effects. However, I would strongly suggest that the Comfy team takes another look at the default settings.

The argument is: every modern OS (I tested Windows 11, WSL on Windows 11, and CachyOS Linux) already caches files loaded from disk in RAM, keeps them there as long as possible, and evicts them in favour of later loads once the buffer space is exhausted. That is smart memory management at the operating-system level already. If Comfy does the same thing on top of it, it uses twice the resources and inevitably causes swapping on low-RAM systems, and the offloading from VRAM to RAM burns unnecessary additional CPU/GPU resources on top of that. This does not make sense, and the defaults should be changed accordingly.

On WSL the effect is even more wasteful, since the Windows host, the Linux guest, and Comfy all cache the files. Even my 256 GB RAM main rig quickly exhausts its resources under those circumstances.
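To make the page-cache point concrete, here is a quick sketch you can run yourself (the path is just a placeholder for any large checkpoint you already have on disk). The second read is served from RAM by the OS and is typically much faster; that cached copy is exactly what gets duplicated when Comfy additionally keeps its own copy of the weights in RAM:

```python
import time

MODEL_PATH = "/path/to/a/large/checkpoint.safetensors"  # placeholder, point this at any big file

def timed_read(path: str) -> float:
    """Read the whole file in chunks and return the elapsed time in seconds."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(64 * 1024 * 1024):  # 64 MiB chunks, data is discarded
            pass
    return time.perf_counter() - start

cold = timed_read(MODEL_PATH)  # first read: comes (mostly) from disk
warm = timed_read(MODEL_PATH)  # second read: served from the OS page cache in RAM
print(f"cold: {cold:.2f}s  warm: {warm:.2f}s")
```

On Linux you can watch the buff/cache column of `free -h` grow while this runs.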
-
Hi there,
Is there a way to make Comfy treat VRAM the way, e.g., Ollama does?
I would like to disable offloading to RAM but keep partial loading for large models. Flux.dev, for example, doesn't quite fit in my 24 GB VRAM, but it's still faster than or almost as fast as Q8 GGUF or fp8_*, and the output quality is a lot better, especially when rendering images with text.
Currently Comfy does the following: Load Text Encoders to VRAM -> Use Text Encoders -> Offload Text Encoders to RAM -> Partially Load Flux (about 95%, hardly any speed reduction) -> Render -> Offload Flux to RAM, and repeat.
At the same time my OS (Linux) caches the models in RAM as well (in the OS's buff/cache memory), so the models end up cached in RAM twice.
What I'd like to see is: Load Text Encoders to VRAM -> Use Text Encoders -> Discard Text Encoders from VRAM -> Partially Load Flux -> Render -> Discard Flux from VRAM / Shared VRAM -> Repeat
That would not only be faster but also more memory efficient.
When I use the --highvram or --gpu-only switch, I run out of memory on the device.
When I use the --disable-smart-memory switch, it unloads models immediately after using them, but it still offloads them to RAM first.
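To illustrate the distinction in plain PyTorch terms (this is not ComfyUI's actual model-management code, just a sketch that assumes a CUDA-capable GPU): offloading keeps a copy of the weights in system RAM, while discarding frees the VRAM and relies on the OS page cache, or a fresh load from disk, the next time the model is needed.

```python
import gc
import torch

# Stand-in for a big diffusion / text-encoder model; a real one would be several GB.
model = torch.nn.Linear(8192, 8192).cuda()

# "Offload" (what smart memory does): the weights move back into system RAM.
model.to("cpu")

# "Discard" (what I'm asking for): free the VRAM without keeping a RAM copy.
model.cuda()                          # back on the GPU just for this example
del model
gc.collect()
torch.cuda.empty_cache()              # hand the freed memory back to the driver
print(torch.cuda.memory_allocated())  # ~0 bytes left allocated on the device
```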
Thanks in advance,
Peter