This repository was archived by the owner on Jun 24, 2024. It is now read-only.
Run (New) 5-bit Quantized Models #239
Closed
PeytonCleveland started this conversation in Ideas
Replies: 2 comments · 2 replies
-
First off, I wanted to say thanks to everyone that's worked on this project! I'm relatively new to Rust (I come from a Node background), but I've been able to create a Warp server around llm without much hassle.
One thing I wanted to ask: is there any plan to support 5-bit GGML models with llm? With the recent breaking changes to GGML (ggml-org/ggml#154), both the 4-bit and 5-bit formats have changed. I'd like to be able to run these newer formats, specifically Q5_1, as its perplexity seems to be almost equal to F16 without too much of a hit to size or inference speed: https://huggingface.co/eachadea/ggml-vicuna-13b-1.1
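For context on the format being asked about: per the ggml change linked above, a Q5_1 block packs 32 weights as an f16 scale `d`, an f16 minimum `m`, a `u32` holding the fifth bit of each quant, and 16 bytes of low nibbles, with each weight reconstructed as `q * d + m`. Below is a rough Rust sketch of the per-block dequantization; the field layout follows ggml's `block_q5_1`, and the `half` crate stands in for ggml's fp16 type.

```rust
use half::f16; // stand-in for ggml's fp16 fields

const QK5_1: usize = 32; // weights per Q5_1 block

/// Mirrors ggml's `block_q5_1` layout.
#[allow(non_camel_case_types)]
struct BlockQ5_1 {
    d: f16,              // scale
    m: f16,              // minimum
    qh: u32,             // fifth (high) bit of each of the 32 quants
    qs: [u8; QK5_1 / 2], // low four bits, two quants packed per byte
}

/// Reconstruct the 32 weights of one block as `q * d + m`,
/// where `q` is the 5-bit quant in 0..=31.
fn dequantize_block_q5_1(b: &BlockQ5_1) -> [f32; QK5_1] {
    let d = b.d.to_f32();
    let m = b.m.to_f32();
    let mut out = [0.0f32; QK5_1];
    for j in 0..QK5_1 / 2 {
        // Fifth bits: bit j covers the low-nibble half of the block,
        // bit j + 16 covers the high-nibble half.
        let h0 = (((b.qh >> j) & 1) << 4) as u8;
        let h1 = (((b.qh >> (j + 16)) & 1) << 4) as u8;
        let q0 = (b.qs[j] & 0x0F) | h0;
        let q1 = (b.qs[j] >> 4) | h1;
        out[j] = f32::from(q0) * d + m;
        out[j + QK5_1 / 2] = f32::from(q1) * d + m;
    }
    out
}
```

Because a minimum is stored alongside the scale (unlike Q5_0, which stores only a scale and centers the quants), Q5_1 can represent asymmetric weight ranges, which is presumably part of why its perplexity lands so close to F16.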
-
Hey, thanks for writing in! Glad to hear you're enjoying it :) Q5_1 QNT1 (the new format) should already work with the latest version.
-
The 5-bit models should all be working at present - I've been using Q5_1 in "production" deployments with no issues.
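On the "production" point, and for anyone else coming from Node: here is a minimal sketch of the kind of Warp wrapper mentioned in the original post. It assumes `warp`, `tokio`, and `serde` as dependencies; `run_inference` is a hypothetical placeholder for however you drive your llm inference session, since the exact llm calls aren't shown in this thread.

```rust
use serde::{Deserialize, Serialize};
use warp::Filter;

#[derive(Deserialize)]
struct GenerateRequest {
    prompt: String,
}

#[derive(Serialize)]
struct GenerateResponse {
    completion: String,
}

// Hypothetical placeholder for a call into an `llm` inference session.
// A real server would load the (e.g. Q5_1) model once at startup rather
// than per request, since model loading is expensive.
fn run_inference(prompt: &str) -> String {
    format!("(completion for: {prompt})")
}

#[tokio::main]
async fn main() {
    // POST /generate with a JSON body {"prompt": "..."} returns a JSON completion.
    let generate = warp::post()
        .and(warp::path("generate"))
        .and(warp::path::end())
        .and(warp::body::json())
        .map(|req: GenerateRequest| {
            warp::reply::json(&GenerateResponse {
                completion: run_inference(&req.prompt),
            })
        });

    warp::serve(generate).run(([127, 0, 0, 1], 3030)).await;
}
```

Since inference is CPU-bound and blocking, a real deployment would likely share the loaded model behind an `Arc` and hand requests off with `tokio::task::spawn_blocking` instead of running them inline in the handler.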