
Commit d1f650a

fix: fix deepseek accuracy when ep_size=1

Signed-off-by: zzzzwwjj <1183291235@qq.com>
1 parent a90b784

File tree: 2 files changed, +2 additions, -0 deletions

vllm_ascend/ops/fused_moe.py (1 addition, 0 deletions)

@@ -189,6 +189,7 @@ def fused_experts(
     num_experts = w1.shape[0]
     dtype = hidden_states.dtype
     device = hidden_states.device
+    topk_weights = topk_weights.to(dtype)
     # assert dtype in [torch.float32, torch.float16, torch.bfloat16
     #                  ], "Only float32, float16, and bfloat16 are supported"

vllm_ascend/quantization/w8a8_dynamic.py (1 addition, 0 deletions)

@@ -218,6 +218,7 @@ def fused_experts(hidden_states: torch.Tensor,
     num_experts = w1.shape[0]
     dtype = hidden_states.dtype
     device = hidden_states.device
+    topk_weights = topk_weights.to(dtype)

     if expert_map is not None:
         # Generate token indices and flatten
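For context, here is a minimal sketch of why the one-line cast matters. The shapes and the `expert_out` tensor below are hypothetical, not the repo's real code: MoE routing weights typically come out of a float32 softmax, while `hidden_states` may be bfloat16, so the weighted expert combine would otherwise mix dtypes, which can error out or silently degrade accuracy on some backends.

```python
import torch

# Illustrative shapes: 4 tokens, hidden size 8, top-2 experts per token.
hidden_states = torch.randn(4, 8, dtype=torch.bfloat16)
topk_weights = torch.softmax(torch.randn(4, 2), dim=-1)  # float32 by default

dtype = hidden_states.dtype
topk_weights = topk_weights.to(dtype)  # the one-line fix this commit adds

# Hypothetical per-expert outputs for each token's top-2 experts.
expert_out = torch.randn(4, 2, 8, dtype=dtype)

# Weighted combine now stays in a single dtype end to end.
combined = (expert_out * topk_weights.unsqueeze(-1)).sum(dim=1)
assert combined.dtype == dtype
```

Without the cast, `expert_out * topk_weights` would promote the product to float32 under PyTorch's type-promotion rules, producing a combine in a different dtype than the rest of the layer expects.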
