perplexity output of LLAMA2 7B quantized by IQ3_S seems to be weird #6971
penghongbo started this conversation in General
I am learning llama.cpp. When I tested perplexity for different quantized 7B models, most of the results looked reasonable (the final estimated PPL was below 10), but the IQ3_S-quantized model returned a value larger than 300. Is this correct? I used the llama-2-7b-chat model and wiki.test.raw (4358 lines).
I am not sure whether I need to use an imatrix for IQ3_S, as mentioned in #5866. Please advise. Thanks.
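For context, perplexity in llama.cpp is measured with the dedicated perplexity example. A minimal invocation looks roughly like the following; the model file name is a placeholder, not necessarily the exact file used in this test:

    ./perplexity -m models/llama-2-7b-chat.IQ3_S.gguf -f wiki.test.raw

The tool prints a running PPL per chunk and a final estimate at the end. Given that the other quantizations of the same model reportedly land below 10, a final PPL above 300 for IQ3_S points to something wrong with that particular quantized file or the build rather than ordinary quantization loss.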
Replies: 1 comment

This is possibly due to a tokenization bug which was introduced about a week ago: #7049
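Regarding the imatrix question in the original post: #5866 discusses quantizing the i-quants (including IQ3_S) with an importance matrix. A rough sketch of that workflow, with placeholder file names (the f16 model path, calibration.txt, and imatrix.dat are illustrative), is:

    # compute an importance matrix from a calibration text using the full-precision model
    ./imatrix -m models/llama-2-7b-chat.f16.gguf -f calibration.txt -o imatrix.dat

    # quantize to IQ3_S using that matrix
    ./quantize --imatrix imatrix.dat models/llama-2-7b-chat.f16.gguf models/llama-2-7b-chat.IQ3_S.gguf IQ3_S

Whether a missing imatrix alone would explain a PPL above 300 is not established here; the tokenization regression mentioned in the reply above is another candidate cause.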