- I have also opened a PR about how Mistral quantization is potentially lacking an optimization for GQA.
- Layer skipping was an interesting experiment, but the results were considered inconclusive. Having a more consistent overall datapoint on how the model changes in response to layer skipping would help in resuming it. In fact, it might be interesting to see which token probabilities change the most versus the least under layer skipping, measured by KL divergence across a wide range of texts; that could be a step towards interpretability for the hidden layers. A rough sketch of that comparison is below.
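A minimal sketch of what that per-token KL comparison could look like, assuming the full-precision and layer-skipped logits over the same text have already been dumped somehow (the `.npy` file names here are placeholders, not anything llama.cpp produces):

```python
import numpy as np

def log_softmax(logits):
    """Convert raw logits to log-probabilities, row by row."""
    x = logits - logits.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def per_token_kl(baseline_logits, skipped_logits):
    """KL(baseline || skipped) at every token position."""
    log_p = log_softmax(baseline_logits)   # reference run (all layers)
    log_q = log_softmax(skipped_logits)    # run with a layer skipped
    return (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)

# Placeholder dumps: one [n_tokens, n_vocab] logit matrix per run over the same text.
baseline = np.load("logits_full.npy")
skipped = np.load("logits_skip_layer.npy")

kl = per_token_kl(baseline, skipped)
order = np.argsort(kl)
print("least affected positions:", order[:5], kl[order[:5]])
print("most affected positions:", order[-5:], kl[order[-5:]])
```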
- In your scaled divergence chart, if you used the Q2_K model as the reference, wouldn't the full-quality model show up with the highest divergence? So it only tells you that there is a difference, not whether the difference is good or bad. Perplexity, on the other hand, tells you how accurately the model predicted some standard text (sketched below). By the way, did you already see #2875?
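For reference, a minimal sketch of the perplexity calculation itself, assuming the per-token log-probabilities over a reference text are already available (the file name is a placeholder):

```python
import numpy as np

# Placeholder dump: the log-probability the model assigned to each actual
# next token of a reference text (one value per token position).
token_logprobs = np.load("logprobs_wikitext.npy")

# Perplexity is the exponential of the average negative log-likelihood;
# lower means the model predicted the text more accurately.
perplexity = np.exp(-token_logprobs.mean())
print(f"perplexity: {perplexity:.3f}")
```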
- Perplexity is a very rough measurement of how much quantization actually changes the final output of the model.
  I propose a metric that compares the changes in the output token probabilities, since the similarity there seems to correlate directly with perceived quantization loss.
  This could also be a useful metric for tuning k-quant configurations. In any case, it seems much more reliable to take something like the top 5 tokens and compare their probabilities; a rough sketch of that comparison is below.
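A minimal sketch of the proposed top-5 comparison, assuming logits from a reference run (e.g. fp16) and a quantized run over identical input have already been dumped; the file names and the simple mean-absolute-difference summary are placeholders for illustration:

```python
import numpy as np

def softmax(logits):
    """Convert raw logits to probabilities, row by row."""
    x = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return x / x.sum(axis=-1, keepdims=True)

# Placeholder dumps: [n_tokens, n_vocab] logits over the same input.
ref_probs = softmax(np.load("logits_fp16.npy"))
quant_probs = softmax(np.load("logits_q4_k.npy"))

k = 5
# Indices of the reference model's top-k tokens at every position.
topk = np.argsort(ref_probs, axis=-1)[:, -k:]

rows = np.arange(ref_probs.shape[0])[:, None]
p_ref = ref_probs[rows, topk]       # reference probabilities of those tokens
p_quant = quant_probs[rows, topk]   # what the quantized model assigns them

# Mean absolute probability shift on the top-k tokens, averaged over all positions;
# 0 would mean the quantized model matches the reference exactly on the top-k.
top_k_shift = np.abs(p_ref - p_quant).mean()
print(f"mean top-{k} probability shift: {top_k_shift:.4f}")
```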