Commit 9acc082

[BugFix] Fix accuracy bug of prefix-caching. (#1492)
When using the AscendScheduler with prefix caching enabled and chunked prefill disabled, outputs are inaccurate because mla_v1 has no branch that handles this scenario. This PR fixes it.

Signed-off-by: whx-sjtu <2952154980@qq.com>
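For context, a minimal repro sketch of the affected configuration. The flag names below are assumptions based on vLLM and vllm-ascend documentation, and the model name is illustrative, not taken from this commit:

from vllm import LLM

# Hypothetical configuration that exercises the path fixed here:
# AscendScheduler on, prefix caching on, chunked prefill off, MLA model.
# Argument names are assumptions; check the vllm-ascend docs for the
# exact spelling in your version.
llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite",  # an MLA model, so mla_v1.py is used
    enable_prefix_caching=True,
    enable_chunked_prefill=False,
    additional_config={"ascend_scheduler_config": {"enabled": True}},
)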
1 parent: 45e33e4

File tree

1 file changed (+6 -3 lines)

vllm_ascend/attention/mla_v1.py

Lines changed: 6 additions & 3 deletions
@@ -751,7 +751,8 @@ def _forward_prefill(
 
         if attn_metadata.attn_state in [
                 AscendAttentionState.ChunkedPrefill,
-                AscendAttentionState.SpecDecoding
+                AscendAttentionState.SpecDecoding,
+                AscendAttentionState.PrefillCacheHit
         ] and not ascend_config.chunked_prefill_for_mla:
             attn_output_torch = torch.empty(num_tokens,
                                             self.num_heads * self.v_head_dim,
@@ -776,7 +777,8 @@ def _forward_prefill(
                                               causal=True)
         elif attn_metadata.attn_state in [
                 AscendAttentionState.ChunkedPrefill,
-                AscendAttentionState.SpecDecoding
+                AscendAttentionState.SpecDecoding,
+                AscendAttentionState.PrefillCacheHit
         ]:
             attn_lse = torch.empty(self.num_heads,
                                    num_tokens,
@@ -830,7 +832,8 @@ def _forward_prefill(
             [num_tokens, self.num_heads * self.v_head_dim])
         if attn_metadata.attn_state in [
                 AscendAttentionState.ChunkedPrefill,
-                AscendAttentionState.SpecDecoding
+                AscendAttentionState.SpecDecoding,
+                AscendAttentionState.PrefillCacheHit
         ] and not ascend_config.chunked_prefill_for_mla:
             attn_output = attn_output_torch
 
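The same one-line change appears in all three hunks: PrefillCacheHit is added to the list of attention states routed to the torch fallback path. A self-contained sketch of that selection logic, simplified for illustration (only AscendAttentionState, the state names, and chunked_prefill_for_mla come from the diff; the helper and its name are invented here):

from enum import Enum, auto

class AscendAttentionState(Enum):
    # Simplified stand-in for the states named in the diff.
    ChunkedPrefill = auto()
    SpecDecoding = auto()
    PrefillCacheHit = auto()

# States that must take the torch fallback in _forward_prefill when
# chunked_prefill_for_mla is disabled. Before this commit, PrefillCacheHit
# was missing, so a prefix-cache hit fell through to a branch that cannot
# handle it, producing the accuracy bug described above.
_FALLBACK_STATES = (
    AscendAttentionState.ChunkedPrefill,
    AscendAttentionState.SpecDecoding,
    AscendAttentionState.PrefillCacheHit,
)

def needs_torch_fallback(attn_state, chunked_prefill_for_mla):
    # Mirrors the membership test added in the first and third hunks.
    return attn_state in _FALLBACK_STATES and not chunked_prefill_for_mla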