EXL2 quantization error / fine-tuning? #70
Unanswered
SinanAkkoyun asked this question in Q&A
Replies: 0
Hey!
When quantizing, a quantization error is introduced, which is minimized with the help of calibration data. When starting from a full-precision model, this is the only option.
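To make it concrete, here is a rough sketch of the layer-wise objective that calibration data is used to minimize, as I understand it (toy NumPy code, not the actual EXL2/GPTQ algorithm): pick quantized weights so that the layer's outputs on calibration activations stay close to the full-precision outputs.

```python
# Toy sketch of calibration-based quantization error (NOT the real EXL2/GPTQ algorithm):
# the goal is to choose quantized weights W_q so that || X @ W.T - X @ W_q.T ||^2
# stays small over activations X collected from the calibration set.
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Naive per-row symmetric round-to-nearest quantization (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    return np.round(w / scale) * scale

def calibration_error(w: np.ndarray, w_q: np.ndarray, x_calib: np.ndarray) -> float:
    """Output-space error on calibration activations, the quantity calibration tries to keep small."""
    return float(np.linalg.norm(x_calib @ w.T - x_calib @ w_q.T) ** 2)

# toy example: one linear layer, random stand-in "calibration" activations
rng = np.random.default_rng(0)
w = rng.normal(size=(128, 256))   # full-precision weights
x = rng.normal(size=(64, 256))    # calibration activations
w_q = quantize_symmetric(w, bits=4)
print("output-space quantization error:", calibration_error(w, w_q, x))
```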
However, when fine-tuning with QLoRA, one can (I think) export the quantized model directly.
Doing this "quantization-aware training" while fine-tuning seems to promise more accurate fine-tuned models than calibrating afterwards.
Let's say one fine-tunes a model with GPTQ QLoRA and directly exports the quantized model. Would that model perform better than one that only got the whole fine-tuning dataset as calibration data?
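For illustration, here is a toy sketch of what I mean by the "quantization-aware" effect in QLoRA-style fine-tuning (made-up PyTorch code, not exllamav2's or bitsandbytes' actual implementation): the frozen base weights are already quantized in the forward pass, so the trainable low-rank adapter is optimized against the quantized model's behaviour rather than the full-precision one.

```python
# Hedged sketch of QLoRA-style training on top of quantized base weights.
# Everything here is illustrative; names and the quantizer are made up.
import torch
import torch.nn as nn

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Naive symmetric round-to-nearest quantization (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().amax(dim=1, keepdim=True) / qmax
    return torch.round(w / scale) * scale

class QLoRALinear(nn.Module):
    def __init__(self, in_f: int, out_f: int, rank: int = 8):
        super().__init__()
        base = torch.randn(out_f, in_f) * 0.02
        # frozen, already-quantized base weights: gradients never touch them
        self.register_buffer("w_q", fake_quantize(base))
        # trainable low-rank adapter (LoRA): out = x @ (W_q + B A).T
        self.lora_a = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_f, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ (self.w_q + self.lora_b @ self.lora_a).T

# toy training step: the loss is computed through the quantized weights,
# so the adapter learns to compensate for the quantization error
layer = QLoRALinear(256, 128)
opt = torch.optim.AdamW([layer.lora_a, layer.lora_b], lr=1e-3)
x = torch.randn(16, 256)
target = torch.randn(16, 128)
loss = nn.functional.mse_loss(layer(x), target)
loss.backward()
opt.step()
```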
I am just very interested in getting the most reliable outputs from quantized models, as I hope that with the right optimization the error could potentially be eliminated, but I'd like to get external expert opinions on that! :)