Offloading feature #278
eeditor1055 started this conversation in Ideas
Replies: 0
A few days ago I saw a post on Reddit where people offload LLM GGUF models to the GPU not whole layer by layer, but only certain tensors within each layer.
They claim this boosts generation speed.
Can the same be done with converted SD GGUFs (SD1.5, SDXL, FLUX, etc.)?
Here's a link to the post:
https://www.reddit.com/r/LocalLLaMA/comments/1ki7tg7/dont_offload_gguf_layers_offload_tensors_200_gen/
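For context, the linked thread is about llama.cpp's `--override-tensor` (`-ot`) flag, which takes a regex over tensor names plus a target buffer type, so individual tensors (e.g. the large FFN weights) can be kept on CPU while everything else in the layer is offloaded to GPU. A minimal sketch of that usage, with an illustrative model path and regex (adjust both to the actual model's tensor names):

```shell
# Tensor-level offload instead of layer-level offload (llama.cpp):
# request all layers on the GPU (-ngl 99), but pin every layer's
# FFN weight tensors back to CPU via a tensor-name regex.
./llama-cli -m model.gguf -ngl 99 \
  --override-tensor "blk\..*\.ffn_.*=CPU"
```

Whether an SD/FLUX GGUF could benefit the same way would depend on the inference runtime exposing a comparable per-tensor placement option, not on the GGUF file itself.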