Train imatrix with model weights in 64Bit Precision #11072
Closed
joseph777111 started this conversation in Ideas
Replies: 1 comment · 5 replies
-
Unless the model was trained and saved at 64 bits, it provides no value; even going bf16 -> f32 has no value (other than allowing GPU offloading during calculation, until we get CUDA bf16 support).
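To see why the upcast itself can't add anything, here is a minimal, self-contained C++ sketch (illustrative only, not llama.cpp's converters): a bf16 value is just the high 16 bits of an f32, so converting bf16 -> f32 only zero-fills mantissa bits that were already discarded when the weights were saved.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Illustrative helpers (not llama.cpp code): bf16 stored as the high 16 bits of an f32.
static uint16_t f32_to_bf16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return (uint16_t)(bits >> 16);          // truncate (real converters typically round-to-nearest)
}

static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;      // zero-fill the low 16 mantissa bits
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

int main() {
    float original = 0.123456789f;           // pretend this was the f32 training-time weight
    uint16_t saved = f32_to_bf16(original);  // what actually sits in a bf16 checkpoint
    float upcast   = bf16_to_f32(saved);     // the "converted to f32" weight used afterwards

    printf("original f32       : %.9g\n", original);
    printf("bf16 -> f32 upcast : %.9g\n", upcast);
    printf("upcast -> bf16 identical to saved bf16: %s\n",
           f32_to_bf16(upcast) == saved ? "yes" : "no");
    return 0;
}
```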
-
This thought has been bugging me in the back of my mind: can we train imatrices with the model weights in 64-bit precision? I know it's overkill, but is it possible? Training the imatrix with an F32 version of the model yields superior results. In fact, on my M1 Mac, I can train the F32 imatrix with the --process-output flag set (for llama-imatrix), and the model actually benefits from it. So, extrapolating from that, I imagine that training imatrices in 64-bit would yield even better results, considering that I run my GGUF IQuants as OF32.EF32.IQ8_0 (Output Tensor.Embeddings.Quant Size). So I'm curious: what would an imatrix computed from 64-bit model weights, and a model quantized with it, yield? Is this possible, or am I ranting like a madman? 🤔
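For what it's worth, here is a toy C++ sketch of what the comparison would boil down to, assuming (as a simplification — this is not llama.cpp's actual imatrix code, and all names and sizes are placeholders) that the imatrix is essentially an accumulation of per-column squared activations: with f32 activations as input, only the accumulator precision changes between an f32 run and a hypothetical f64 run.

```cpp
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// Toy comparison of f32 vs f64 accumulation of per-column squared activations
// (a simplified stand-in for imatrix collection; not the llama.cpp implementation).
int main() {
    const int n_cols  = 4096;   // placeholder hidden size
    const int n_calls = 2000;   // placeholder number of calibration rows

    std::mt19937 rng(42);
    std::normal_distribution<float> dist(0.0f, 1.0f);

    std::vector<float>  acc_f32(n_cols, 0.0f);
    std::vector<double> acc_f64(n_cols, 0.0);

    for (int c = 0; c < n_calls; ++c) {
        for (int j = 0; j < n_cols; ++j) {
            const float x = dist(rng);       // f32 activation: the inputs already cap the precision
            acc_f32[j] += x * x;             // statistics accumulated in f32
            acc_f64[j] += (double) x * x;    // the same statistics accumulated in f64
        }
    }

    // The gap between the two runs comes only from summation error, not from the data.
    double max_rel = 0.0;
    for (int j = 0; j < n_cols; ++j) {
        const double rel = std::fabs((double) acc_f32[j] - acc_f64[j]) / acc_f64[j];
        if (rel > max_rel) max_rel = rel;
    }
    printf("max relative difference, f32 vs f64 accumulation: %.3e\n", max_rel);
    return 0;
}
```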
PS... Llama.cpp + Apple Metal + FA (Flash Attention) is AWESOME! ❤️
@ggerganov @bartowski1182 @ikawrakow