-
any performance numbers?
-
I made a little fork of Llama.cpp mainline, integrating some commits from IK_Llama, and I can now quantize (for now) in q6_0, IQ3_K, IQ4_K, IQ5_K and IQ6_K.
It's based on b5474 for now, and I can now use the wonderful q6_0 and IQ6_K with any model supported by mainline.
Here's the first alpha: https://github.com/Nexesenex/croco.cpp/releases/tag/v0.01
Edit: https://github.com/Nexesenex/croco.cpp/releases/tag/NXS_v0.04_b5525
Edit 2: https://github.com/Nexesenex/croco.cpp/releases/tag/v1.93040_b5600_RMv1.11.8 (with NXS_Llama_v0.13_b5600), an attempt to make the R4 quants work on CUDA.