Commit 05a4710

Authored by zouyida2052
bugfix for qwen2_5_vl (#805)
### What this PR does / why we need it?
The interface of Qwen2.5-VL changed from a column-parallel linear to a fused QKV linear, which broke our weight padding function. This PR reworks the `split_qkv` func to handle the new layout and fix the bug.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested with CI.

Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
1 parent a93bed4 commit 05a4710

File tree

1 file changed: +13 −0 lines changed
vllm_ascend/models/qwen2_5_vl.py

Lines changed: 13 additions & 0 deletions
```diff
@@ -70,6 +70,19 @@ def __init__(
         if self.hidden_size_per_attention_head > MIN_PAD_SIZE and self.hidden_size_per_attention_head < MAX_PAD_SIZE:
             self.hidden_size_per_attention_head = MAX_PAD_SIZE
 
+    def split_qkv(self, qkv: torch.Tensor) -> tuple[torch.Tensor, ...]:
+        # [s, b, 3 * head * head_dim]
+        seq_len, bs, _ = qkv.shape
+
+        # [s, b, 3 * head * head_dim] -> 3 * [s, b, head * head_dim]
+        q, k, v = qkv.chunk(3, dim=2)
+
+        # 3 * [s, b, head * head_dim] -> 3 * [s, b, head, head_dim]
+        new_shape = (seq_len, bs, self.num_attention_heads_per_partition,
+                     self.hidden_size_per_attention_head)
+        q, k, v = (x.view(*new_shape) for x in (q, k, v))
+        return q, k, v
+
     def forward(
         self,
         x: torch.Tensor,
```
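For context, here is a minimal, self-contained sketch of what the new `split_qkv` does to a fused QKV tensor. It runs outside vLLM and simply mirrors the chunk-then-view steps from the patch; the sizes below are illustrative only and not taken from the model config.

```python
import torch

# Illustrative sizes only: a short sequence, a small batch, and a
# per-partition head count / head dim chosen for the example.
seq_len, bs = 16, 2
num_heads, head_dim = 4, 128

# Fused output of the qkv linear layer, laid out as [s, b, 3 * head * head_dim].
qkv = torch.randn(seq_len, bs, 3 * num_heads * head_dim)

# Same steps as the patched split_qkv:
# [s, b, 3 * head * head_dim] -> 3 * [s, b, head * head_dim]
q, k, v = qkv.chunk(3, dim=2)

# 3 * [s, b, head * head_dim] -> 3 * [s, b, head, head_dim]
new_shape = (seq_len, bs, num_heads, head_dim)
q, k, v = (x.view(*new_shape) for x in (q, k, v))

assert q.shape == k.shape == v.shape == (seq_len, bs, num_heads, head_dim)
```

In the actual module, `num_heads` and `head_dim` correspond to `self.num_attention_heads_per_partition` and `self.hidden_size_per_attention_head`, where the latter may already have been bumped to `MAX_PAD_SIZE` by the padding logic shown in the context lines of the diff above.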
