
Commit 557ee9e

fix flex attention warning
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
1 parent 7f0d422 commit 557ee9e


vllm/v1/attention/backends/flex_attention.py

Lines changed: 2 additions & 3 deletions
@@ -301,9 +301,8 @@ def build(self,
                 block_table_tensor, self.cache_config.num_gpu_blocks)
 
             # Get the original offset tensor
-            offset_tensor = torch.tensor(
-                common_attn_metadata.num_computed_tokens_cpu[:num_reqs]).to(
-                    self.device, non_blocking=True)
+            offset_tensor = common_attn_metadata.num_computed_tokens_cpu.to(
+                self.device, non_blocking=True)
 
             out = FlexAttentionMetadata(
                 num_actual_tokens=num_actual_tokens,
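
The old code wrapped an existing CPU tensor in torch.tensor() before moving it to the device, which PyTorch flags with a copy-construct UserWarning; the new code moves the existing tensor with .to() directly. A minimal sketch of that pattern and the fix follows; the tensor, num_reqs, and device names here are stand-ins for illustration, not the repository's objects.

    # Sketch only: reproduces the warning the old pattern triggers and the
    # warning-free replacement. Names below are stand-ins, not vLLM code.
    import warnings

    import torch

    num_computed_tokens_cpu = torch.arange(8, dtype=torch.int32)  # stand-in CPU tensor
    num_reqs = 4
    device = "cuda" if torch.cuda.is_available() else "cpu"

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Old pattern: torch.tensor() on an existing tensor copy-constructs it and
        # emits "UserWarning: To copy construct from a tensor, it is recommended
        # to use sourceTensor.clone().detach() ...".
        offset_tensor = torch.tensor(num_computed_tokens_cpu[:num_reqs]).to(
            device, non_blocking=True)
        print([str(w.message) for w in caught])

    # New pattern: move the existing tensor directly; no copy-construct warning.
    offset_tensor = num_computed_tokens_cpu.to(device, non_blocking=True)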
