
Commit 105d2df

[v0.9.1][Fix] Fix block table shape (#1297)
### What this PR does / why we need it?

This fixes the shape of block_table, which was introduced by the hybrid KV groups change several weeks ago. An error is raised when prefix caching (eager mode or not) and the Ascend Scheduler are enabled at the same time; sending two identical requests is enough to reproduce it.

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
1 parent 6856f9d commit 105d2df

File tree

1 file changed: +3 -1 lines changed

vllm_ascend/attention/attention_v1.py

Lines changed: 3 additions & 1 deletion
```diff
@@ -307,11 +307,13 @@ def forward(
         assert attn_metadata is not None
         assert attn_metadata.attn_mask is not None
         compress_mask = attn_metadata.attn_mask
+        batch_size = attn_metadata.query_lens.shape[0]
+        block_table = attn_metadata.block_tables[:batch_size, :]
         torch_npu._npu_flash_attention_qlens(
             query=query,
             key_cache=self.key_cache,
             value_cache=self.value_cache,
-            block_table=attn_metadata.block_tables,
+            block_table=block_table,
             mask=compress_mask,
             seq_len=attn_metadata.query_lens,
             context_lens=attn_metadata.seq_lens,
```
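In effect, the change slices attn_metadata.block_tables down to the number of requests actually in the batch before passing it to torch_npu._npu_flash_attention_qlens, so row i of the block table lines up with query_lens[i]. A minimal sketch of the idea in plain PyTorch (the shapes and values here are illustrative assumptions, not the real vLLM-Ascend metadata layout):

```python
import torch

# Illustrative only: with prefix caching and the Ascend Scheduler enabled,
# the block table may carry more rows than there are requests in the batch.
query_lens = torch.tensor([5, 7])            # one entry per request -> batch_size == 2
block_tables = torch.randint(0, 64, (4, 8))  # 4 rows, but only 2 live requests

batch_size = query_lens.shape[0]
block_table = block_tables[:batch_size, :]   # keep exactly one row per request

# Row i of block_table now pairs with query_lens[i]; without the slice, the
# kernel would be handed rows that belong to no request in this batch.
assert block_table.shape[0] == batch_size
```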
