-
any performance numbers?
-
I made a little fork of Llama.cpp mainline, integrating some commits from IK_Llama, and I can now quantize (for now) in q6_0, IQ3_K, IQ4_K, IQ5_K and IQ6_K.
It's based on b5474 for now, and I can now use the wonderful q6_0 and IQ6_K with any model supported by mainline.
Here's the first alpha: https://github.com/Nexesenex/croco.cpp/releases/tag/v0.01
Edit: https://github.com/Nexesenex/croco.cpp/releases/tag/NXS_v0.04_b5525
Edit 2: https://github.com/Nexesenex/croco.cpp/releases/tag/v1.93040_b5600_RMv1.11.8 (with NXS_Llama_v0.13_b5600), an attempt to make the R4 quants work on CUDA.