I have seen this post: https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9 @Artefact2 posted a chart there benchmarking each quantization of Mistral-7B, but I'd be interested in the same chart for a bigger model, specifically a 70B. Somebody else already asked for this in a comment on the gist, but Artefact2 couldn't run the same test on a bigger model due to hardware constraints.
Even though Artefact2 expects these charts to look similar, I'm still interested in them, because in my experience running a Q2 of a 70B/120B model is a much smoother experience than running Mistral at Q2. While Q2 largely breaks a 30B (and to some extent also a 70B) model, the bigger models still seem to retain most of their quality. I assume this is because more information is preserved: each parameter carries some information, so more parameters means more information survives the quantization.
I haven't seen anyone else post such a chart, so if one exists, or if somebody could make one, I'd greatly appreciate it, as my 2060 6GB probably already suffers from PTSD after running 70B+ models on it (though the CPU did most of the heavy lifting). Either way, I'm thinking about buying something better, and it would be nice to know whether the bigger models also need Q5 or higher to get close to f16 performance, or whether Q3 is already enough for them to function well.
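For context, the charts in that gist plot perplexity, which is just the exponential of the mean negative log-likelihood per token, so "quantization hurts the model" shows up directly as a higher number. A minimal sketch of the metric (the per-token log-probabilities below are made-up illustration values, not real benchmark data):

```python
import math

def perplexity(logprobs):
    # Perplexity = exp(mean negative log-likelihood over tokens).
    return math.exp(-sum(logprobs) / len(logprobs))

# Hypothetical per-token log-probabilities from an f16 model and a Q2 quant:
logprobs_f16 = [-1.2, -0.8, -1.5, -0.9]
logprobs_q2 = [-1.9, -1.4, -2.2, -1.6]

print(perplexity(logprobs_f16))  # lower = better
print(perplexity(logprobs_q2))   # quantization raises perplexity
```

The gist computes this over a standard text corpus with llama.cpp's perplexity tool; the sketch above only shows what the y-axis of those charts means.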
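On the hardware side, a rough rule of thumb for whether a given quant of a 70B even fits: file size ≈ parameter count × bits per weight ÷ 8. The bits-per-weight figures below are approximate averages for llama.cpp quant types, not exact values (real GGUF files run somewhat larger because of embeddings and mixed-precision tensors):

```python
# Rough size estimate for a 70B model: params * bits-per-weight / 8 bytes.
PARAMS = 70e9
BPW = {"Q2_K": 2.6, "Q3_K_M": 3.9, "Q4_K_M": 4.8, "Q5_K_M": 5.7, "f16": 16.0}  # approximate

for name, bpw in BPW.items():
    gib = PARAMS * bpw / 8 / 2**30
    print(f"{name}: ~{gib:.0f} GiB")
```

Even at Q2 a 70B lands in the ~20 GiB range, which is why a 6 GB card ends up offloading most layers to the CPU.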
Thanks in advance ^^