
Commit 2690697

[Bugfix] Reset all unused positions to prevent out-of-bounds in GatherV3 (#1416)
### What this PR does / why we need it?

Reset all unused positions in `NPUModelRunner` to prevent out-of-bounds asserts in the `GatherV3` operator.

Currently, in [`get_splitfuse_attn_mask`](https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/attention/attention.py#L124), the `position` tensor may contain values that exceed the dimensions of the attention mask, triggering a `GatherV3` boundary-check failure. These invalid indices are stale "dirty" entries left in `position` by the padding logic of the ACL graph. Specifically, in [`_process_reqs`](https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/worker/model_runner_v1.py#L989), `num_input_tokens` is always greater than or equal to `total_num_scheduled_tokens`, so any positions not explicitly cleared after a previous batch persist and cause this sporadic error.

Note that in the original vLLM implementation, masks are constructed internally from other arguments, so these lingering values never surface. On the Ascend platform, however, split-fuse attention requires externally supplied masks, so the residual indices become critical and lead to this elusive, hard-to-reproduce failure.

The fix is to explicitly reset (zero out) all unused entries in the `position` tensor before it is used to index the attention mask, ensuring that every index lies within the mask's valid range.

Closes: #1038

### Does this PR introduce _any_ user-facing change?

No

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
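A minimal CPU-side sketch of the failure mode (the tensor names and sizes below are illustrative only, not taken from the repository): if the padded tail of the positions buffer keeps stale values larger than the mask's first dimension, indexing the mask with those positions fails with the same bounds violation that `GatherV3` asserts on when gathering mask rows on the NPU.

```python
import torch

# Hypothetical sizes, chosen only to illustrate the bug.
max_seq_len = 128
attn_mask = torch.zeros(max_seq_len, max_seq_len)

# Persistent positions buffer, reused across batches by the model runner.
positions = torch.full((16,), 500, dtype=torch.long)  # stale entries from a longer batch

total_num_scheduled_tokens = 4   # tokens actually scheduled this step
num_input_tokens = 16            # padded size required by the ACL graph
positions[:total_num_scheduled_tokens] = torch.arange(total_num_scheduled_tokens)

# Gathering mask rows with the dirty tail would read indices >= max_seq_len;
# on NPU this trips GatherV3's boundary check:
#   attn_mask[positions]  ->  IndexError / GatherV3 assert

# The fix: reset every entry between the scheduled tokens and the padded size.
positions[total_num_scheduled_tokens:num_input_tokens].zero_()
mask_rows = attn_mask[positions]  # safe: every index is now within range
print(mask_rows.shape)            # torch.Size([16, 128])
```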
1 parent 06ccce1 commit 2690697

File tree

1 file changed: +1 -0


vllm_ascend/worker/model_runner_v1.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -953,6 +953,7 @@ def _process_reqs(
                 self.mrope_positions_cpu[:, :total_num_scheduled_tokens],
                 non_blocking=True)
 
+        self.positions[total_num_scheduled_tokens:num_input_tokens].zero_()
         self.positions[:total_num_scheduled_tokens].copy_(
             self.positions_cpu[:total_num_scheduled_tokens], non_blocking=True)
         positions = self.positions[:num_input_tokens]
```
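One note on the shape of the fix, inferred from the description above: the tail is zeroed in place rather than `positions` being truncated to `total_num_scheduled_tokens`, presumably because the ACL graph's padding requires the tensor to keep its fixed `num_input_tokens` size; overwriting the unused slots with zero keeps that shape while guaranteeing every entry is a valid mask index.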
