Greetings Razvan,
Thank you and your colleagues for the paper Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels.
I have two questions:
1. I'm curious whether you still plan to release any example implementations of computing the Layer Input Modification (LIM) score and Z-score Distribution (ZD) on LLMs such as the presented dense model `Llama-2-13B`? Also wondering whether there has been any more recent exploration of newer MoE models such as `DeepSeek-R1`/`-V3-0324`? Bonus if the code works on existing `q8_0` GGUF quants haha... No pressure if you've already moved on; I have some interest in evaluating these methods given that it is becoming easier to specify per-layer quantization schemes, e.g. ik_llama.cpp's new `llama-quantize --custom-q` feature.
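For concreteness, here's roughly what I had in mind for the LIM side on the dense model — a minimal sketch assuming LIM is the negated cosine similarity between each decoder layer's input and output hidden states, averaged over tokens (that's just my reading of the paper, please correct me if the actual formulation differs). The HF model id and the single prompt are placeholders; in practice I'd average over a small calibration set:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-13b-hf"  # placeholder; any dense HF causal LM works
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[i] is the input to decoder layer i, hidden_states[i+1] its output.
hs = out.hidden_states
for i in range(len(hs) - 1):
    x_in = hs[i][0].float()       # [seq_len, hidden_dim]
    x_out = hs[i + 1][0].float()
    # Assumed LIM: negative per-token cosine similarity between a layer's
    # input and output representations, averaged over the sequence
    # (the more a layer changes its input, the higher the score).
    lim = -F.cosine_similarity(x_in, x_out, dim=-1).mean()
    print(f"layer {i:2d}  LIM ≈ {lim.item():+.4f}")
```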
2.
> Our first score, named layer input modification (LIM), is based on how much a layer changes its input representations into the output ones.

Just confirming that I understand correctly that both the LIM and ZD scores are calculated taking all tensors of a given layer into account (e.g. `attn_(v|q|k|output)` and `ffn_(up|down|gate)`) and not just individual tensors?
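For question 2, this is how I'd currently interpret the per-layer ZD score — pooling all of a layer's weight tensors before computing z-scores, with a |z| > 1 threshold. Both the pooling (versus averaging per-tensor ZDs) and the threshold are my assumptions, which is exactly what I'm hoping you can confirm or correct:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf", torch_dtype=torch.float16  # placeholder model id
)

def zd_score(tensors, threshold=1.0):
    """Fraction of weights whose |z-score| exceeds `threshold`, pooled over
    all tensors of one decoder layer (my per-layer interpretation)."""
    w = torch.cat([t.detach().float().flatten() for t in tensors])
    z = (w - w.mean()) / w.std()
    return (z.abs() > threshold).float().mean().item()

for i, layer in enumerate(model.model.layers):
    # All attention and FFN projection weights of this layer, pooled together.
    tensors = [
        layer.self_attn.q_proj.weight, layer.self_attn.k_proj.weight,
        layer.self_attn.v_proj.weight, layer.self_attn.o_proj.weight,
        layer.mlp.gate_proj.weight, layer.mlp.up_proj.weight,
        layer.mlp.down_proj.weight,
    ]
    print(f"layer {i:2d}  ZD ≈ {zd_score(tensors):.4f}")
```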
Thanks!