Closed
Labels
feature request (New feature or request)
Description
Motivation
AWQ is nice, but if you want more control over the bit depth (and thus VRAM usage), GGUF may be a better option. A wide range of models is available from TheBloke at various bit depths, so everyone can use the largest one that fits into their GPU.
I cannot find a high-throughput batch inference engine that can load GGUF; maybe there is none. (vLLM cannot load it either.)
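For context on why GGUF makes per-file bit-depth choices easy: every GGUF file begins with a small self-describing header (magic bytes, format version, tensor count, metadata key/value count), so a loader can inspect quantization metadata before mapping any weights. Below is a minimal sketch of reading that fixed header in Python; the field layout follows the GGUF specification, and the synthetic `sample` bytes are purely illustrative, not taken from a real model file:

```python
import struct

GGUF_MAGIC = b"GGUF"  # little-endian magic at the start of every GGUF file

def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header that precedes the metadata KV section."""
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    # uint32 version, uint64 tensor_count, uint64 metadata_kv_count (little-endian)
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {"version": version, "tensor_count": tensor_count, "kv_count": kv_count}

# Synthetic header for illustration: version 3, 2 tensors, 5 metadata entries.
sample = GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5)
print(read_gguf_header(sample))
```

The actual quantization type of each tensor lives in the per-tensor info records that follow the metadata section, which is what an engine would need to parse to support mixed bit depths.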
Related resources