Replies: 2 comments 1 reply
- I would just use Q8, since it acts just like fp16 anyway.
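A minimal sketch of what that looks like in practice, assuming a llama.cpp build whose binaries are named `./imatrix` and `./quantize` (newer builds ship them as `llama-imatrix` / `llama-quantize`); the model and calibration file names here are placeholders:

```sh
# Compute the importance matrix from a Q8_0 quant instead of fp16.
# -ngl offloads some layers to the 6 GB GPU; --chunks caps how much
# of the calibration text is processed, to keep the runtime bounded.
./imatrix -m model-Q8_0.gguf \
    -f calibration.txt \
    -o imatrix.dat \
    --chunks 100 -ngl 10
```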
- You can use any model (…)
- Hello,
I want to try some of the newest quants, like IQ2_XS(S) or IQ3_XS, but I'm not sure whether my setup (32 GB RAM + 6 GB VRAM) can run all of the imatrix evaluation steps on something as big as a 70B model in a reasonable amount of time.
If it can't, I saw that the code in quantize.cpp seems to allow requantizing "bigger" quants, like Q4_K_M, into smaller ones, like Q3_K_M (I might be wrong, though).
Is it possible to calculate the importance matrix using already quantized models, or do I have to use fp16 files?
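For reference, a sketch of the two operations described above, again with placeholder file names and the older binary name (`llama-quantize` in newer builds):

```sh
# (a) Requantize an existing Q4_K_M down to Q3_K_M; quantize refuses
#     quant-to-quant conversion unless --allow-requantize is given,
#     and the result is lower quality than quantizing from fp16.
./quantize --allow-requantize model-Q4_K_M.gguf model-Q3_K_M.gguf Q3_K_M

# (b) Apply a previously computed importance matrix when producing
#     an IQ quant from the fp16 (or Q8_0) source model.
./quantize --imatrix imatrix.dat model-f16.gguf model-IQ2_XS.gguf IQ2_XS
```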