Offloading feature #278
eeditor1055 started this conversation in Ideas
Replies: 0
A few days ago I saw a post on Reddit where people offload LLM GGUF models to the GPU not whole layer by layer, but only certain tensors within each layer.
They claim this boosts generation speed.
Can the same be done with converted SD GGUFs (SD1.5, SDXL, FLUX, etc.)?
Here's a link to the post:
https://www.reddit.com/r/LocalLLaMA/comments/1ki7tg7/dont_offload_gguf_layers_offload_tensors_200_gen/
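For context, the linked thread is about llama.cpp's `--override-tensor` (`-ot`) flag, which takes a regex over tensor names plus a target buffer type, so individual tensors (e.g. the large FFN weights) can be kept on CPU while everything else in the layer is offloaded to GPU. A minimal sketch of that usage, with an illustrative model path and regex (adjust both to the actual model's tensor names):

```shell
# Tensor-level offload instead of layer-level offload (llama.cpp):
# request all layers on the GPU (-ngl 99), but pin every layer's
# FFN weight tensors back to CPU via a tensor-name regex.
./llama-cli -m model.gguf -ngl 99 \
  --override-tensor "blk\..*\.ffn_.*=CPU"
```

Whether an SD/FLUX GGUF could benefit the same way would depend on the inference runtime exposing a comparable per-tensor placement option, not on the GGUF file itself.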