add qwen3-moe optimization #1441
base: main
Conversation
Please make the commit message more meaningful, e.g. mention what kind of change is applied compared to the upstream implementation, and include performance test results.
```diff
@@ -35,6 +35,7 @@
 MODELS = [
     "Qwen/Qwen2.5-0.5B-Instruct",
     "Qwen/Qwen3-0.6B-Base",
+    "Qwen/Qwen3-30B-A3B",
```
This model is too large and will take a long time to run in CI; please use a reduced-layer model instead: https://vllm-ascend.readthedocs.io/en/latest/developer_guide/contribution/testing.html#e2e-test-example
```diff
@@ -33,3 +57,89 @@ class CustomQwen3MoeForCausalLM(Qwen3MoeForCausalLM):
         "experts":
         ["experts.0.gate_proj", "experts.0.up_proj", "experts.0.down_proj"],
     }
+
+
+class AscendQwen3MoeSparseMoeBlock(nn.Module):
```
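For context, a sparse MoE block of this kind typically performs top-k gated routing over a set of expert MLPs. Below is a minimal PyTorch sketch of that pattern; the class name, layer sizes, and per-token loop are illustrative assumptions, not the PR's actual `AscendQwen3MoeSparseMoeBlock` implementation (which would use vLLM's fused MoE kernels).

```python
# Hedged sketch of top-k gated sparse MoE routing (NOT the PR's code):
# a gate scores each token against every expert, the top-k experts are
# selected, and their outputs are combined with renormalized gate weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoeBlockSketch(nn.Module):
    def __init__(self, hidden: int, n_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden, n_experts, bias=False)
        # Each expert is a small gated-MLP-style feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.SiLU(),
                          nn.Linear(4 * hidden, hidden))
            for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [tokens, hidden]
        probs = F.softmax(self.gate(x), dim=-1)
        weights, idx = probs.topk(self.top_k, dim=-1)
        # Renormalize so each token's selected expert weights sum to 1.
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        # Naive loop for clarity; real kernels batch tokens per expert.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out
```

Production implementations replace the double loop with a fused grouped-GEMM kernel, but the routing math is the same.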
Please add unit tests for this: https://vllm-ascend.readthedocs.io/en/latest/developer_guide/contribution/testing.html
Force-pushed from cc28837 to 0461ef2
Force-pushed from e9113e2 to 5d21f95
Force-pushed from 83ee4c1 to cfc68cc
Signed-off-by: yangcheng (AJ) <y00806874@china.huawei.com>
Codecov Report
Attention: Patch coverage is

```
@@            Coverage Diff             @@
##             main    #1441      +/-   ##
==========================================
+ Coverage   27.39%   34.14%   +6.75%
==========================================
  Files          56       63       +7
  Lines        6191     7315    +1124
==========================================
+ Hits         1696     2498     +802
- Misses       4495     4817     +322
```
In DP-split or DP+TP-split scenarios, running other customer MoE models also shows accuracy problems: the model keeps repeating the same sentence in its answer.
Is that a qwen3-moe model? This fix only targets the qwen3-moe model.
No, it is not qwen3. Isn't this splitting issue a general problem?
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
What this PR does / why we need it?
The original qwen3_moe implementation is missing the all-to-all operation, which produces faulty results; this PR reuses some optimizations from the DeepSeek implementation.
Does this PR introduce any user-facing change?
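The dispatch/combine bookkeeping behind all-to-all expert parallelism can be illustrated in a single process: tokens are grouped by destination expert, each expert processes its contiguous batch, and the results are scattered back to the original token order. The sketch below is a hypothetical helper for illustration only; in the real expert-parallel path the grouping step is realized by a distributed all-to-all exchange across EP ranks.

```python
# Single-process sketch of MoE dispatch/combine (NOT the PR's code).
# In production, "grouping by expert" happens via a distributed
# all-to-all collective; here it is plain index bookkeeping.
def dispatch_combine(tokens, expert_ids, n_experts, expert_fn):
    # Dispatch: group token indices by their destination expert.
    buckets = [[] for _ in range(n_experts)]
    for i, e in enumerate(expert_ids):
        buckets[e].append(i)
    out = [None] * len(tokens)
    for e, idxs in enumerate(buckets):
        # Each expert processes its tokens as one contiguous batch.
        processed = expert_fn(e, [tokens[i] for i in idxs])
        # Combine: scatter results back to the original token order.
        for i, y in zip(idxs, processed):
            out[i] = y
    return out
```

For example, with `expert_fn = lambda e, batch: [v + e for v in batch]`, every token comes back shifted by its expert id, in its original position.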
How was this patch tested?
Tested on the 235B model:

| parallelism | tps | optimization |
|---|---|---|
| dp16tp2ep32 | 160 | off |
| dp16tp2ep32 | 192 | on |
| dp8tp4ep32 | 76 | off |
| dp8tp4ep32 | 128 | on |
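From the reported TPS numbers, the per-configuration speedup of the optimization works out as follows (a trivial calculation over the figures above, nothing beyond the table):

```python
# TPS figures from the table above; compute the on/off speedup
# for each parallelism configuration.
results = {
    ("dp16tp2ep32", "off"): 160, ("dp16tp2ep32", "on"): 192,
    ("dp8tp4ep32", "off"): 76, ("dp8tp4ep32", "on"): 128,
}
for p in ("dp16tp2ep32", "dp8tp4ep32"):
    speedup = results[(p, "on")] / results[(p, "off")]
    print(f"{p}: {speedup:.2f}x")
```

That is a 1.20x gain for dp16tp2ep32 and roughly 1.68x for dp8tp4ep32.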