This repository was archived by the owner on Jun 24, 2024. It is now read-only.
Run (New) 5-bit Quantized Models #239
Closed
PeytonCleveland started this conversation in Ideas
Replies: 2 comments · 2 replies
-
First off, I wanted to say thanks to everyone that's worked on this project! I'm relatively new to Rust (I come from a Node background), but I've been able to create a Warp server around llm without much hassle.
One thing I wanted to ask: is there any plan to support 5-bit GGML models with llm? With the recent breaking changes to GGML (ggml-org/ggml#154), both the 4-bit and 5-bit formats have changed. I'd like to be able to run these newer formats, specifically Q5_1, as its perplexity seems to be almost equal to F16 without too much of a hit to size or inference speed: https://huggingface.co/eachadea/ggml-vicuna-13b-1.1
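For context on the format being asked about: per the ggml change linked above, a Q5_1 block packs 32 weights as an f16 scale `d`, an f16 minimum `m`, a `u32` holding the fifth bit of each quant, and 16 bytes of low nibbles, with each weight reconstructed as `q * d + m`. Below is a rough Rust sketch of the per-block dequantization; the field layout follows ggml's `block_q5_1`, and the `half` crate stands in for ggml's fp16 type.

```rust
use half::f16; // stand-in for ggml's fp16 fields

const QK5_1: usize = 32; // weights per Q5_1 block

/// Mirrors ggml's `block_q5_1` layout.
#[allow(non_camel_case_types)]
struct BlockQ5_1 {
    d: f16,              // scale
    m: f16,              // minimum
    qh: u32,             // fifth (high) bit of each of the 32 quants
    qs: [u8; QK5_1 / 2], // low four bits, two quants packed per byte
}

/// Reconstruct the 32 weights of one block as `q * d + m`,
/// where `q` is the 5-bit quant in 0..=31.
fn dequantize_block_q5_1(b: &BlockQ5_1) -> [f32; QK5_1] {
    let d = b.d.to_f32();
    let m = b.m.to_f32();
    let mut out = [0.0f32; QK5_1];
    for j in 0..QK5_1 / 2 {
        // Fifth bits: bit j covers the low-nibble half of the block,
        // bit j + 16 covers the high-nibble half.
        let h0 = (((b.qh >> j) & 1) << 4) as u8;
        let h1 = (((b.qh >> (j + 16)) & 1) << 4) as u8;
        let q0 = (b.qs[j] & 0x0F) | h0;
        let q1 = (b.qs[j] >> 4) | h1;
        out[j] = f32::from(q0) * d + m;
        out[j + QK5_1 / 2] = f32::from(q1) * d + m;
    }
    out
}
```

Because a minimum is stored alongside the scale (unlike Q5_0, which stores only a scale and centers the quants), Q5_1 can represent asymmetric weight ranges, which is presumably part of why its perplexity lands so close to F16.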
-
Hey, thanks for writing in! Glad to hear you're enjoying it :) Q5_1 QNT1 (the new format) should already work with the latest version.
-
The 5-bit models should all be working at present - I've been using Q5_1 in "production" deployments with no issues.
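On the "production" point, and for anyone else coming from Node: here is a minimal sketch of the kind of Warp wrapper mentioned in the original post. It assumes `warp`, `tokio`, and `serde` as dependencies; `run_inference` is a hypothetical placeholder for however you drive your llm inference session, since the exact llm calls aren't shown in this thread.

```rust
use serde::{Deserialize, Serialize};
use warp::Filter;

#[derive(Deserialize)]
struct GenerateRequest {
    prompt: String,
}

#[derive(Serialize)]
struct GenerateResponse {
    completion: String,
}

// Hypothetical placeholder for a call into an `llm` inference session.
// A real server would load the (e.g. Q5_1) model once at startup rather
// than per request, since model loading is expensive.
fn run_inference(prompt: &str) -> String {
    format!("(completion for: {prompt})")
}

#[tokio::main]
async fn main() {
    // POST /generate with a JSON body {"prompt": "..."} returns a JSON completion.
    let generate = warp::post()
        .and(warp::path("generate"))
        .and(warp::path::end())
        .and(warp::body::json())
        .map(|req: GenerateRequest| {
            warp::reply::json(&GenerateResponse {
                completion: run_inference(&req.prompt),
            })
        });

    warp::serve(generate).run(([127, 0, 0, 1], 3030)).await;
}
```

Since inference is CPU-bound and blocking, a real deployment would likely share the loaded model behind an `Arc` and hand requests off with `tokio::task::spawn_blocking` instead of running them inline in the handler.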