add qwen3-moe optimization #1441

Status: Open. shiyuan680 wants to merge 3 commits into base: main

@shiyuan680 shiyuan680 commented Jun 26, 2025

What this PR does / why we need it?

The original qwen3_moe implementation is missing the all-to-all operation, which leads to incorrect results. This PR reuses some of the optimizations from the DeepSeek implementation.
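For readers unfamiliar with the term, here is a minimal sketch of what an all-to-all dispatch and combine does under expert parallelism. It is a plain-Python simulation, not the code in this PR: the function names (`all_to_all`, `moe_forward`), the top-1 routing, and the toy experts are all assumptions made for illustration.

```python
def all_to_all(send):
    """Simulate an all-to-all exchange between ranks.

    send[src][dst] is the payload rank `src` sends to rank `dst`.
    The result recv[dst][src] is what rank `dst` received from `src`.
    """
    world = len(send)
    return [[send[src][dst] for src in range(world)] for dst in range(world)]


def moe_forward(tokens_per_rank, expert_of_token, experts, experts_per_rank):
    """Top-1 MoE forward pass with experts sharded across ranks (hypothetical)."""
    world = len(tokens_per_rank)

    # Dispatch: each rank buckets its tokens by the rank that owns the expert.
    send = [[[] for _ in range(world)] for _ in range(world)]
    for src in range(world):
        for idx, tok in enumerate(tokens_per_rank[src]):
            expert_id = expert_of_token[src][idx]
            dst = expert_id // experts_per_rank
            send[src][dst].append((idx, expert_id, tok))
    recv = all_to_all(send)

    # Local expert computation on the rank that owns each expert.
    results = [
        [[(idx, experts[e](tok)) for idx, e, tok in recv[dst][src]]
         for src in range(world)]
        for dst in range(world)
    ]

    # Combine: a second all-to-all returns outputs to the originating rank.
    back = all_to_all(results)
    out = [[None] * len(tokens_per_rank[r]) for r in range(world)]
    for src in range(world):
        for dst in range(world):
            for idx, value in back[src][dst]:
                out[src][idx] = value
    return out


# Toy setup: 2 ranks, 4 experts (2 per rank); expert e multiplies by (e + 1).
experts = [lambda x, scale=e + 1: x * scale for e in range(4)]
tokens_per_rank = [[1.0, 2.0], [3.0]]
expert_of_token = [[3, 0], [1]]
out = moe_forward(tokens_per_rank, expert_of_token, experts, experts_per_rank=2)
print(out)  # [[4.0, 2.0], [6.0]]
```

If the dispatch step is skipped, a token routed to an expert on another rank never reaches that expert, which is one way a missing all-to-all can produce wrong output.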

Does this PR introduce any user-facing change?

How was this patch tested?

Tested on the 235B model:

| parallelism | TPS | optimization |
|---|---|---|
| dp16tp2ep32 | 160 | off |
| dp16tp2ep32 | 192 | on |
| dp8tp4ep32 | 76 | off |
| dp8tp4ep32 | 128 | on |
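The relative gain can be worked out from the numbers above. This is a small sketch that assumes the last column indicates whether the optimization is enabled ("close" = off, "on" = on); the dictionary layout and the `speedup` helper are illustrative only.

```python
# TPS figures copied from the PR description, keyed by parallelism config.
results = {
    "dp16tp2ep32": {"off": 160, "on": 192},
    "dp8tp4ep32": {"off": 76, "on": 128},
}


def speedup(off, on):
    """Throughput ratio of the optimized run over the baseline run."""
    return on / off


for config, tps in results.items():
    print(f"{config}: {speedup(tps['off'], tps['on']):.2f}x")
# dp16tp2ep32: 1.20x
# dp8tp4ep32: 1.68x
```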

Collaborator

@Yikun Yikun left a comment

Please make the commit message more meaningful, for example by describing what changes are applied compared to the upstream implementation and by including the performance test results.

@@ -35,6 +35,7 @@
MODELS = [
"Qwen/Qwen2.5-0.5B-Instruct",
"Qwen/Qwen3-0.6B-Base",
"Qwen/Qwen3-30B-A3B",
Collaborator

This model is too large and will take a long time to run in CI. Please use a reduced-layer model instead: https://vllm-ascend.readthedocs.io/en/latest/developer_guide/contribution/testing.html#e2e-test-example

@@ -33,3 +57,89 @@ class CustomQwen3MoeForCausalLM(Qwen3MoeForCausalLM):
"experts":
["experts.0.gate_proj", "experts.0.up_proj", "experts.0.down_proj"],
}


class AscendQwen3MoeSparseMoeBlock(nn.Module):
Collaborator

@Yikun Yikun Jun 26, 2025

@shiyuan680 shiyuan680 force-pushed the qwen3 branch 9 times, most recently from cc28837 to 0461ef2 Compare June 27, 2025 02:30
Signed-off-by: yangcheng (AJ) <y00806874@china.huawei.com>
@shiyuan680 shiyuan680 force-pushed the qwen3 branch 4 times, most recently from e9113e2 to 5d21f95 Compare June 28, 2025 01:43
Signed-off-by: yangcheng (AJ) <y00806874@china.huawei.com>
Signed-off-by: yangcheng (AJ) <y00806874@china.huawei.com>
codecov bot commented Jul 1, 2025

Codecov Report

Attention: Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.

Project coverage is 34.14%. Comparing base (c30ddb8) to head (a187b72).
Report is 88 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| vllm_ascend/ops/fused_moe.py | 0.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1441      +/-   ##
==========================================
+ Coverage   27.39%   34.14%   +6.75%     
==========================================
  Files          56       63       +7     
  Lines        6191     7315    +1124     
==========================================
+ Hits         1696     2498     +802     
- Misses       4495     4817     +322     
| Flag | Coverage Δ |
|---|---|
| unittests | 34.14% <0.00%> (+6.75%) ⬆️ |


@leograssroot
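With DP sharding or DP+TP sharding, running the customer's other MoE models also shows accuracy problems: the model keeps repeating the same sentence in its answer.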

@shiyuan680
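Author

> With DP sharding or DP+TP sharding, running the customer's other MoE models also shows accuracy problems: the model keeps repeating the same sentence in its answer.

Is that a qwen3-moe model? This PR is a fix for the qwen3-moe model only.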

@leograssroot
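> > With DP sharding or DP+TP sharding, running the customer's other MoE models also shows accuracy problems: the model keeps repeating the same sentence in its answer.
>
> Is that a qwen3-moe model? This PR is a fix for the qwen3-moe model only.

It is not qwen3. Isn't this sharding problem a general one?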

@leograssroot
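> > With DP sharding or DP+TP sharding, running the customer's other MoE models also shows accuracy problems: the model keeps repeating the same sentence in its answer.
>
> Is that a qwen3-moe model? This PR is a fix for the qwen3-moe model only.

#1597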

github-actions bot commented Jul 7, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

3 participants