Is llama 3 more prone to damage from quantization? #6901
Replies: 3 comments 2 replies
-
Llama 3 would need the fixed quantizer to be calibrated; we don't know whether GQA or the new tokenizer causes these results. The fixed quantization gives these results on wiki.test.raw: Llama 3 8B: 10.7% difference (+0.4) across F16 / Q4_0 / Q8_0; Mistral 7B: 2.6% difference (+0.2) across F16 / Q4_0 / Q8_0.
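For context, the percentage figures above are relative perplexity (PPL) increases over the F16 baseline. A minimal sketch of that computation; the PPL inputs here are placeholders, not the actual wiki.test.raw numbers:

```python
# Sketch: percent PPL increase of a quantized model over its F16 baseline.
# The inputs are placeholder values, not real benchmark numbers.

def ppl_increase(ppl_quant: float, ppl_f16: float) -> float:
    """Relative PPL increase, in percent, of a quant vs. the F16 baseline."""
    return (ppl_quant - ppl_f16) / ppl_f16 * 100.0

# Hypothetical example: baseline PPL 6.50, quantized PPL 7.20.
print(f"{ppl_increase(7.20, 6.50):.1f}%")
```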
-
Very relevant here.
-
For those interested, I computed the perplexity (PPL) on all quants using the base model.
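For readers unfamiliar with the metric, PPL is the exponential of the mean negative log-likelihood the model assigns to each token of the test set. A self-contained sketch; the token probabilities below are invented for illustration, a real run uses the model's predicted probabilities over wiki.test.raw or a similar corpus:

```python
import math

# Sketch: perplexity from per-token probabilities (made-up values).
token_probs = [0.42, 0.10, 0.73, 0.05, 0.61]

# Mean negative log-likelihood over the tokens, then exponentiate.
nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
ppl = math.exp(nll)

print(f"PPL = {ppl:.2f}")  # lower is better; 1.0 would be a perfect model
```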
-
So I've made a small test. Using a full 4096-token context, I wrote elaborate instructions in my system prompt describing how my character should act in certain situations, which requires quite a bit of logic.
With Llama 3 8B Instruct at FP16, the model was consistently able to connect the dots. When I ran the same test with the same sampler settings on a quantized IQ4_XS model of Llama 3 8B Instruct, it failed every time.
The same applied to 70B: a quantized 70B was unable to perform this test correctly most of the time, while the FP16 8B model's success rate was much higher. It feels as if quantization significantly reduces attention to early parts of the context, which includes the system prompt.
With other small models like Mistral 7B and Solar, I didn't notice such severe damage at all; a Solar model at IQ4_XS does a better job in this test than Llama 3 8B Instruct at IQ4_XS.
Has anyone else noticed this or run similar tests? This Reddit thread seems to confirm my suspicion: https://www.reddit.com/r/LocalLLaMA/comments/1cci5w6/quantizing_llama_3_8b_seems_more_harmful_compared/
I'm also wondering whether the bf16->quant conversion is partly to blame and whether Llama 3 suffers from it more than other models.
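On the bf16 question: one hypothesis (an assumption on my part, not something established in this thread) is that an intermediate FP16 step clips values that bf16 can represent, since bf16 keeps float32's 8-bit exponent while FP16 has only 5 exponent bits. A pure-Python sketch that simulates bf16 by truncating a float32 to its top 16 bits (real bf16 conversion rounds rather than truncates):

```python
import struct

def to_bf16(x: float) -> float:
    # Simulate bfloat16 by keeping the top 16 bits of the float32 encoding.
    # (Truncation, not round-to-nearest, as a simplification.)
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    (y,) = struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))
    return y

def to_fp16(x: float) -> float:
    # Round-trip through IEEE 754 half precision (struct's 'e' format).
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return float("inf")  # out of fp16 range (max finite ~65504)

big = 1.0e5                # within bf16's range, outside fp16's range
print(to_bf16(big))        # close to 1e5: bf16 keeps the exponent
print(to_fp16(big))        # inf: fp16 cannot represent the magnitude
```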