
Commit 18495f4

Angazenn and angazenn authored
[BugFix] Fix max_num_tokens_across_dp calculation bugs in attention_v1_torchair (#1636)
### What this PR does / why we need it?
This PR fixes a bug in the calculation of max_num_tokens_across_dp. Previously it was computed as the actual max_num_tokens plus graph_pad_size, which produces a different max_num_tokens_across_dp on each DP rank. When padding based on this value is required, that mismatch can lead to incorrect padding.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed normally.

Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
1 parent 9c886d0 commit 18495f4

File tree

1 file changed (+1, -1)


vllm_ascend/attention/attention_v1_torchair.py

Lines changed: 1 addition & 1 deletion
@@ -273,10 +273,10 @@ def build(self,
        if use_torchair_graph and self.runner.attn_state in [
                AscendAttentionState.DecodeOnly,
        ]:
-            max_num_tokens_across_dp += graph_pad_size
            pad_value = 1
            padded_seq_lens = seq_lens.tolist() + [pad_value
                                                   ] * graph_pad_size
+            max_num_tokens_across_dp = len(padded_seq_lens)

            seq_lens = torch.from_numpy(
                np.array(padded_seq_lens).astype(np.int32))
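To make the divergence concrete, here is a minimal, hypothetical sketch, not code from vllm-ascend: `num_tokens_per_rank`, `graph_batch_size`, and the concrete numbers are illustrative assumptions, and it assumes `graph_pad_size` pads every rank up to the same captured graph batch size. It compares the old and new formulas across three DP ranks.

```python
# Illustrative sketch of the bug (hypothetical values, not the real runner code).
# Assumptions: three DP ranks, one decode token per sequence, and a common
# torchair graph batch size of 8 that graph_pad_size pads each rank up to.

num_tokens_per_rank = [3, 7, 5]                       # actual decode tokens on each DP rank
max_num_tokens_across_dp = max(num_tokens_per_rank)   # 7, identical on every rank
graph_batch_size = 8                                  # padded size captured by the graph
pad_value = 1

for rank, num_tokens in enumerate(num_tokens_per_rank):
    graph_pad_size = graph_batch_size - num_tokens    # rank-local: 5, 1, 3

    # Old formula: global max plus a rank-local pad -> 12, 8, 10 (diverges per rank).
    old_value = max_num_tokens_across_dp + graph_pad_size

    # Fixed formula: length of the padded seq_lens list -> 8 on every rank.
    padded_seq_lens = [pad_value] * num_tokens + [pad_value] * graph_pad_size
    new_value = len(padded_seq_lens)

    print(f"rank {rank}: old={old_value}, new={new_value}")
```

Because the fixed formula depends only on the padded length, every rank derives the same max_num_tokens_across_dp, so any padding decision based on it stays consistent across DP ranks.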
