add qwen3-moe optimization #1441
base: main
Conversation
Please make the commit message more meaningful, e.g. mention what kind of change is applied compared to the upstream implementation, and include performance test results.
```diff
@@ -35,6 +35,7 @@
 MODELS = [
     "Qwen/Qwen2.5-0.5B-Instruct",
     "Qwen/Qwen3-0.6B-Base",
+    "Qwen/Qwen3-30B-A3B",
```
This model is too large and will take a long time to run in CI; please use a reduced-layer model instead: https://vllm-ascend.readthedocs.io/en/latest/developer_guide/contribution/testing.html#e2e-test-example
```diff
@@ -33,3 +57,89 @@ class CustomQwen3MoeForCausalLM(Qwen3MoeForCausalLM):
         "experts":
         ["experts.0.gate_proj", "experts.0.up_proj", "experts.0.down_proj"],
     }
+
+
+class AscendQwen3MoeSparseMoeBlock(nn.Module):
```
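For context, a sparse MoE block of this kind typically performs top-k gated routing over a set of expert MLPs. Below is a minimal PyTorch sketch of that pattern; the class name, layer sizes, and per-token loop are illustrative assumptions, not the PR's actual `AscendQwen3MoeSparseMoeBlock` implementation (which would use vLLM's fused MoE kernels).

```python
# Hedged sketch of top-k gated sparse MoE routing (NOT the PR's code):
# a gate scores each token against every expert, the top-k experts are
# selected, and their outputs are combined with renormalized gate weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoeBlockSketch(nn.Module):
    def __init__(self, hidden: int, n_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden, n_experts, bias=False)
        # Each expert is a small gated-MLP-style feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.SiLU(),
                          nn.Linear(4 * hidden, hidden))
            for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [tokens, hidden]
        probs = F.softmax(self.gate(x), dim=-1)
        weights, idx = probs.topk(self.top_k, dim=-1)
        # Renormalize so each token's selected expert weights sum to 1.
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        # Naive loop for clarity; real kernels batch tokens per expert.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out
```

Production implementations replace the double loop with a fused grouped-GEMM kernel, but the routing math is the same.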
Please add unit tests for this: https://vllm-ascend.readthedocs.io/en/latest/developer_guide/contribution/testing.html
Force-pushed from cc28837 to 0461ef2
Force-pushed from e9113e2 to 5d21f95
Force-pushed from 83ee4c1 to cfc68cc
Signed-off-by: yangcheng (AJ) <y00806874@china.huawei.com>
Codecov Report
Attention: Patch coverage is

```
@@            Coverage Diff             @@
##             main    #1441      +/-   ##
==========================================
+ Coverage   27.39%   34.14%   +6.75%
==========================================
  Files          56       63       +7
  Lines        6191     7315    +1124
==========================================
+ Hits         1696     2498     +802
- Misses       4495     4817     +322
```
In DP-split or DP+TP-split scenarios, running other customer MoE models also shows accuracy problems: the model keeps repeating the same sentence in its answer.
Is that a qwen3-moe model? This fix only targets the qwen3-moe model.
No, it is not qwen3. Isn't this splitting issue a general problem?
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
What this PR does / why we need it?
The original qwen3_moe implementation is missing the all-to-all operation, which produces faulty results; this PR reuses some optimizations from the DeepSeek implementation.
Does this PR introduce any user-facing change?
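The dispatch/combine bookkeeping behind all-to-all expert parallelism can be illustrated in a single process: tokens are grouped by destination expert, each expert processes its contiguous batch, and the results are scattered back to the original token order. The sketch below is a hypothetical helper for illustration only; in the real expert-parallel path the grouping step is realized by a distributed all-to-all exchange across EP ranks.

```python
# Single-process sketch of MoE dispatch/combine (NOT the PR's code).
# In production, "grouping by expert" happens via a distributed
# all-to-all collective; here it is plain index bookkeeping.
def dispatch_combine(tokens, expert_ids, n_experts, expert_fn):
    # Dispatch: group token indices by their destination expert.
    buckets = [[] for _ in range(n_experts)]
    for i, e in enumerate(expert_ids):
        buckets[e].append(i)
    out = [None] * len(tokens)
    for e, idxs in enumerate(buckets):
        # Each expert processes its tokens as one contiguous batch.
        processed = expert_fn(e, [tokens[i] for i in idxs])
        # Combine: scatter results back to the original token order.
        for i, y in zip(idxs, processed):
            out[i] = y
    return out
```

For example, with `expert_fn = lambda e, batch: [v + e for v in batch]`, every token comes back shifted by its expert id, in its original position.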
How was this patch tested?
Tested on the 235B model:

| parallelism | tps | optimization |
|---|---|---|
| dp16tp2ep32 | 160 | off |
| dp16tp2ep32 | 192 | on |
| dp8tp4ep32 | 76 | off |
| dp8tp4ep32 | 128 | on |
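From the reported TPS numbers, the per-configuration speedup of the optimization works out as follows (a trivial calculation over the figures above, nothing beyond the table):

```python
# TPS figures from the table above; compute the on/off speedup
# for each parallelism configuration.
results = {
    ("dp16tp2ep32", "off"): 160, ("dp16tp2ep32", "on"): 192,
    ("dp8tp4ep32", "off"): 76, ("dp8tp4ep32", "on"): 128,
}
for p in ("dp16tp2ep32", "dp8tp4ep32"):
    speedup = results[(p, "on")] / results[(p, "off")]
    print(f"{p}: {speedup:.2f}x")
```

That is a 1.20x gain for dp16tp2ep32 and roughly 1.68x for dp8tp4ep32.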