[Bugfix] infer_quantization_format when model only has activation quantization (#1635)
## Purpose ##
* Fix KV cache tests, whose models only have activation quantization
## Background ##
Previously, `is_model_quantized` only checked for quantization on leaf modules. It now checks attention modules as well, but because some models quantize attention modules with only activation quantization (no weight quantization), this change exposed a bug in `infer_quantization_format`.
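For context, a minimal sketch of the failure mode and the guard, using simplified stand-ins for the compressed-tensors scheme objects (the dataclasses and format strings below are illustrative, not the actual library API):

```python
# Sketch only: a module can be "quantized" with just activation quantization
# (e.g. KV cache / attention modules), so format inference must not assume
# that `scheme.weights` is populated.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QuantizationArgs:
    num_bits: int = 8


@dataclass
class QuantizationScheme:
    weights: Optional[QuantizationArgs] = None
    input_activations: Optional[QuantizationArgs] = None


def infer_quantization_format(schemes: List[QuantizationScheme]) -> Optional[str]:
    """Infer a weight serialization format, skipping activation-only schemes."""
    weight_args = [s.weights for s in schemes if s.weights is not None]
    if not weight_args:
        # Only activation quantization present: no compressed weight format
        return None
    # Format names here are placeholders for the real compressed-tensors formats
    return "int-quantized" if all(w.num_bits == 8 for w in weight_args) else "pack-quantized"


# An attention module with only KV cache (activation) quantization no longer errors
print(infer_quantization_format([QuantizationScheme(input_activations=QuantizationArgs())]))
```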
## Testing ##
* Requires neuralmagic/compressed-tensors#387 to
pass KV cache tests
---------
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>