How to quantize the KV cache? #9826

Abhranta · 2024-10-10T21:23:45Z

Abhranta
Oct 10, 2024

I want to quantize the KV cache and use it with Q4_0_4_4-type models. These quant types don't need dequantization, so inference might be faster if the KV cache is also quantized to 4 bits. I cannot figure out how to quantize the KV cache, though.

Any help?
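
To make the "KV cache quantized to 4 bit" idea concrete, here is a minimal sketch of symmetric 4-bit block quantization applied to one row of cached values. It is an illustration only, not llama.cpp's actual code or storage layout: the block size of 32, the function names, and the scale formula are all assumptions chosen for clarity.

```python
BLOCK = 32  # values per block (assumed for illustration)


def quantize_block(values):
    """Quantize one block of floats to 4-bit codes plus a single scale."""
    amax = max(abs(v) for v in values)
    scale = amax / 7.0 if amax > 0 else 1.0  # avoid dividing by zero
    # Signed codes in [-7, 7], stored with a +8 offset so they fit in 4 bits.
    codes = [max(-7, min(7, round(v / scale))) + 8 for v in values]
    # Pack two 4-bit codes into each byte (low nibble first).
    packed = bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))
    return scale, packed


def dequantize_block(scale, packed):
    """Recover approximate floats from one packed block."""
    out = []
    for b in packed:
        out.append(((b & 0x0F) - 8) * scale)
        out.append(((b >> 4) - 8) * scale)
    return out


def quantize_row(row):
    """Split a row (e.g. one cached K or V vector) into quantized blocks."""
    assert len(row) % BLOCK == 0, "row length must be a multiple of BLOCK"
    return [quantize_block(row[i:i + BLOCK]) for i in range(0, len(row), BLOCK)]


def dequantize_row(blocks):
    """Concatenate the dequantized blocks back into one row."""
    out = []
    for scale, packed in blocks:
        out.extend(dequantize_block(scale, packed))
    return out
```

In this sketch each block of 32 values takes 16 bytes of codes plus one scale. If the scale were stored as a 16-bit float, that would be 18 bytes per block, or 4.5 bits per value, compared with 64 bytes for the same values in fp16. In llama.cpp itself the cache types are chosen at run time with the `--cache-type-k` and `--cache-type-v` options (for example `q4_0`); the exact requirements may depend on the build and version.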

Replies: 0 comments
