
Commit abf1faa

whx-sjtu and hw_whx authored
[ModelRunnerV1] Adapt kv_cache quant in v1. (#685)
Set the kv_cache_spec dtype from self.kv_cache_dtype in model_runner_v1 in order to support kv_cache quant in v1.

Signed-off-by: hw_whx <wanghexiang7@huawei.com>
Co-authored-by: hw_whx <wanghexiang7@huawei.com>
1 parent 2204e4d commit abf1faa

File tree

1 file changed: +1 −1 lines changed


vllm_ascend/worker/model_runner_v1.py

Lines changed: 1 addition & 1 deletion

@@ -880,7 +880,7 @@ def get_kv_cache_spec(self) -> KVCacheSpec:
                     block_size=block_size,
                     num_kv_heads=attn_module.num_kv_heads,
                     head_size=attn_module.head_size,
-                    dtype=attn_module.dtype)
+                    dtype=self.kv_cache_dtype)
            elif attn_module.attn_type in (AttentionType.ENCODER,
                                           AttentionType.ENCODER_ONLY):
                # encoder-only attention does not need KV cache.
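The one-line change above swaps the KV cache spec's dtype from the attention module's compute dtype to the runner's configured KV cache dtype, so a quantized cache setting actually takes effect. A minimal self-contained sketch of that pattern follows; the class and field names (`AttnModule`, `KVCacheSpec`, `ModelRunner`) are simplified stand-ins, not vLLM's actual API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttnModule:
    """Stand-in for an attention layer; dtype is its compute dtype."""
    num_kv_heads: int
    head_size: int
    dtype: str  # e.g. "bfloat16"


@dataclass
class KVCacheSpec:
    """Stand-in for the KV cache spec built per attention layer."""
    block_size: int
    num_kv_heads: int
    head_size: int
    dtype: str


class ModelRunner:
    def __init__(self, model_dtype: str, kv_cache_dtype: Optional[str]):
        # If no quantized KV cache dtype is configured, fall back to the
        # model's compute dtype (hypothetical fallback for illustration).
        self.kv_cache_dtype = kv_cache_dtype or model_dtype

    def get_kv_cache_spec(self, attn_module: AttnModule,
                          block_size: int) -> KVCacheSpec:
        return KVCacheSpec(
            block_size=block_size,
            num_kv_heads=attn_module.num_kv_heads,
            head_size=attn_module.head_size,
            # Before the patch: dtype=attn_module.dtype, which silently
            # ignored any quantized KV cache setting. After: take the
            # runner's configured KV cache dtype.
            dtype=self.kv_cache_dtype,
        )


runner = ModelRunner(model_dtype="bfloat16", kv_cache_dtype="int8")
spec = runner.get_kv_cache_spec(AttnModule(8, 128, "bfloat16"), block_size=16)
print(spec.dtype)  # int8, not the attention module's bfloat16
```

With the pre-patch behavior, `spec.dtype` would have been `"bfloat16"` regardless of the configured quantization, which is exactly the bug this commit fixes.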
