rm router logits Improve TTOP 3ms #1407

ttanzhiqiang · 2025-06-24T13:54:23Z

What this PR does / why we need it?

The previous code was:
router_logits, _ = self.gate(hidden_states)
hidden_states = get_dp_group().all_gather(hidden_states, 0)
router_logits = get_dp_group().all_gather(router_logits, 0)
This PR replaces the two all_gather calls with one, saving one all_gather communication, by gathering hidden_states first and computing the gate afterwards:
hidden_states = get_dp_group().all_gather(hidden_states, 0)
router_logits, _ = self.gate(hidden_states)
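
The reordering above can be illustrated with a minimal, single-process sketch. It assumes the gate is a row-wise linear projection (one row of router logits per token), which is why applying it before or after the gather yields the same logits. `FakeDPGroup`, `gate`, `gate_then_gather` and `gather_then_gate` are hypothetical stand-ins for illustration, not the vllm-ascend API; the fake `all_gather` takes the shards of all ranks at once and only counts calls.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def gate(hidden: Matrix, weight: Matrix) -> Matrix:
    """Row-wise linear projection: weight is [hidden_dim][num_experts]."""
    return [[sum(h * w for h, w in zip(row, col)) for col in zip(*weight)]
            for row in hidden]


class FakeDPGroup:
    """Hypothetical stand-in for a data-parallel group; counts collectives."""

    def __init__(self) -> None:
        self.all_gather_calls = 0

    def all_gather(self, per_rank: List[Matrix]) -> Matrix:
        # Concatenate the per-rank shards along dim 0 (the token dimension).
        self.all_gather_calls += 1
        return [row for shard in per_rank for row in shard]


def gate_then_gather(group: FakeDPGroup, shards: List[Matrix],
                     weight: Matrix) -> Tuple[Matrix, Matrix]:
    # Old order: gate on local tokens, then gather both tensors.
    logits_shards = [gate(shard, weight) for shard in shards]
    hidden = group.all_gather(shards)
    logits = group.all_gather(logits_shards)
    return hidden, logits


def gather_then_gate(group: FakeDPGroup, shards: List[Matrix],
                     weight: Matrix) -> Tuple[Matrix, Matrix]:
    # New order: gather hidden states once, then gate on the gathered tokens.
    hidden = group.all_gather(shards)
    return hidden, gate(hidden, weight)


# Two simulated DP ranks holding 1 and 2 tokens, hidden_dim=2, 3 experts.
shards = [[[1.0, 2.0]], [[3.0, 4.0], [5.0, 6.0]]]
weight = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
old_group, new_group = FakeDPGroup(), FakeDPGroup()
assert gate_then_gather(old_group, shards, weight) == \
    gather_then_gate(new_group, shards, weight)
assert (old_group.all_gather_calls, new_group.all_gather_calls) == (2, 1)
```

In this sketch the new order trades one collective for running the gate on the gathered tokens instead of only the local ones.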

Does this PR introduce any user-facing change?

How was this patch tested?

bash examples/run_dp_attention_etp16.sh
bash examples/run_dp_attention_etp16_benmark.sh

gsm8k accuracy verification

vLLM version: v0.9.2
vLLM main: vllm-project/vllm@77f77a9

Signed-off-by: ttanzhiqiang <389825161@qq.com>

vllm_ascend/ops/fused_moe.py

codecov · 2025-06-25T04:09:37Z

Codecov Report

Attention: Patch coverage is 5.00000% with 19 lines in your changes missing coverage. Please review.

Project coverage is 54.49%. Comparing base (c30ddb8) to head (af900cc).
Report is 109 commits behind head on main.

Files with missing lines	Patch %	Lines
vllm_ascend/ops/fused_moe.py	0.00%	9 Missing ⚠️
vllm_ascend/utils.py	14.28%	6 Missing ⚠️
vllm_ascend/models/deepseek_v2.py	0.00%	4 Missing ⚠️
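
The per-file figures in this table are consistent with the 5% headline, as the short check below shows. The total changed-line counts (9, 7 and 4) are inferred from the reported percentages and missing counts, not stated in the report, and truncating to two decimals is an assumption that happens to reproduce the displayed 14.28%.

```python
# (file, missing lines, total changed lines); totals are inferred, see above.
patch = [
    ("vllm_ascend/ops/fused_moe.py", 9, 9),
    ("vllm_ascend/utils.py", 6, 7),
    ("vllm_ascend/models/deepseek_v2.py", 4, 4),
]


def pct_truncated(hits: int, total: int) -> str:
    """Percentage truncated (not rounded) to two decimals, integer math only."""
    basis_points = hits * 10000 // total
    return f"{basis_points // 100}.{basis_points % 100:02d}%"


per_file = {name: pct_truncated(total - missing, total)
            for name, missing, total in patch}
total_missing = sum(missing for _, missing, _ in patch)
total_lines = sum(total for _, _, total in patch)
overall = pct_truncated(total_lines - total_missing, total_lines)
```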

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #1407       +/-   ##
===========================================
+ Coverage   27.39%   54.49%   +27.10%     
===========================================
  Files          56       80       +24     
  Lines        6191     9984     +3793     
===========================================
+ Hits         1696     5441     +3745     
- Misses       4495     4543       +48
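
The rows of this diff can be cross-checked with a few lines of integer arithmetic. The sketch assumes coverage is hits divided by lines, truncated to two decimals; that assumption reproduces the 27.39% and 54.49% shown above.

```python
base = {"files": 56, "lines": 6191, "hits": 1696}
head = {"files": 80, "lines": 9984, "hits": 5441}


def coverage_bp(stats: dict) -> int:
    """Coverage in basis points (hundredths of a percent), truncated."""
    return stats["hits"] * 10000 // stats["lines"]


def misses(stats: dict) -> int:
    return stats["lines"] - stats["hits"]


deltas = {
    "coverage_bp": coverage_bp(head) - coverage_bp(base),
    "files": head["files"] - base["files"],
    "lines": head["lines"] - base["lines"],
    "hits": head["hits"] - base["hits"],
    "misses": misses(head) - misses(base),
}
```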

Flag	Coverage Δ
unittests	`54.49% <5.00%> (+27.10%)`	⬆️


github-actions · 2025-06-25T12:13:11Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

wangxiyuan · 2025-07-02T06:36:16Z

vllm_ascend/envs.py

@@ -121,6 +121,9 @@
    # value to False to disable the optimized model.
    "USE_OPTIMIZED_MODEL":
    lambda: bool(int(os.getenv('USE_OPTIMIZED_MODEL', '1'))),
+    # Remove the two communications of get_dp_group().all_gather and change it to one, and do gate after the communication
+    "VLLM_ASCEND_RM_ROUTER_LOGITS":
+    lambda: int(os.getenv("VLLM_ASCEND_RM_ROUTER_LOGITS", 0)),
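
As a rough illustration of how a flag declared this way could be consumed, the sketch below keeps a table of lazy lambdas and evaluates one on each lookup. The table name `env_variables` and the helper `read_flag` are assumptions made for this example; only the variable name and the `int(os.getenv(...))` pattern come from the hunk above.

```python
import os
from typing import Any, Callable, Dict

# Each entry is evaluated lazily, so changes to the environment made after
# import are still picked up on the next lookup.
env_variables: Dict[str, Callable[[], Any]] = {
    "VLLM_ASCEND_RM_ROUTER_LOGITS":
    lambda: int(os.getenv("VLLM_ASCEND_RM_ROUTER_LOGITS", "0")),
}


def read_flag(name: str) -> Any:
    """Hypothetical accessor: evaluate the lambda registered under `name`."""
    return env_variables[name]()
```

With this shape, an unset variable reads as 0 (disabled) and `VLLM_ASCEND_RM_ROUTER_LOGITS=1` reads as 1.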


Starting from Q3, we are being careful about adding more configuration options. Please remove this one and enable rm_router_logits by default.

ttanzhiqiang · 2025-07-04

This is only valid for the FusedMoEState.AllGather solution. If a model (such as deepseek_dbo, qwen3 or qwen2) computes the gate externally while rm_router_logits is enabled internally, an error will be raised.

Yikun · 2025-07-06

Agreed. Could we enable this automatically in the cases where it applies? It is difficult for users to know which models should set this environment variable.

Otherwise, LGTM

ttanzhiqiang · 2025-07-07

In theory, this solution only applies to AllGather and AllGatherEP. In the DP scenario the previous flow was the gate followed by two communications; it is now one communication followed by the gate, which saves some communication time. In theory, every MoE AllGather and AllGatherEP solution can follow this logic, but the DP solutions of other MoE models (for example qwen3-235b) have not been adjusted yet, so the switch is used to control it and prevent errors.

wangxiyuan · 2025-07-07

If it's not common, I prefer not to merge; we can wait longer.

Or, if we can add a logic check instead of an env var, I'm fine with that as well.

ttanzhiqiang · 2025-07-08

OK. rm_router_logits is now enabled by default in the AllGather, AllGatherEP and NaiveMulticast scenarios of the DeepSeek model. It is not enabled in other scenarios and models; it can be added later if necessary.
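
The outcome of this thread, a logic check in place of an environment variable, could look roughly like the sketch below. It is not the merged implementation: the function name, its parameters and the `Other` enum member are hypothetical, and only the three state names and the DeepSeek-only restriction come from the discussion.

```python
from enum import Enum


class FusedMoEState(Enum):
    AllGather = 0
    AllGatherEP = 1
    NaiveMulticast = 2
    Other = 3  # hypothetical placeholder for every other state


# States named in the discussion as supporting gather-then-gate.
_RM_ROUTER_LOGITS_STATES = frozenset({
    FusedMoEState.AllGather,
    FusedMoEState.AllGatherEP,
    FusedMoEState.NaiveMulticast,
})


def should_rm_router_logits(state: FusedMoEState,
                            is_deepseek_v3_r1: bool) -> bool:
    """Enable the single-all_gather path only where the thread says it is safe."""
    return is_deepseek_v3_r1 and state in _RM_ROUTER_LOGITS_STATES
```

A caller would compute this once per layer and skip the external gate only when it returns True, so models that still compute the gate externally keep the old behavior.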

github-actions · 2025-07-07T14:40:22Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

ttanzhiqiang · 2025-07-08T15:14:55Z

update @wangxiyuan @Yikun

github-actions · 2025-07-09T00:54:37Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions · 2025-07-10T04:15:59Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

rm router logits Improve TTOP 3ms

72aeb69

Signed-off-by: ttanzhiqiang <389825161@qq.com>

github-actions bot added module:ops module:core labels Jun 24, 2025

update

04ad4c2

Signed-off-by: ttanzhiqiang <389825161@qq.com>

ApsarasX reviewed Jun 24, 2025

vllm_ascend/ops/fused_moe.py (outdated)

update

f13442e

Signed-off-by: ttanzhiqiang <389825161@qq.com>

github-actions bot added the merge-conflicts label Jun 25, 2025

Merge branch 'main' into rm_router_logits

db520cd

github-actions bot removed the merge-conflicts label Jun 25, 2025

ApsarasX added the ready read for review label Jul 1, 2025

wangxiyuan reviewed Jul 2, 2025

github-actions bot added merge-conflicts and removed ready read for review labels Jul 7, 2025

Merge branch 'main' into rm_router_logits

4c8954a

github-actions bot removed the merge-conflicts label Jul 8, 2025

deepseekv3/r1 support rm_router_logits in [AllGatherEP, AllGather, Na…

86df0a2

…iveMulticast] Signed-off-by: ttanzhiqiang <389825161@qq.com>

github-actions bot removed the module:core label Jul 8, 2025

ttanzhiqiang added 7 commits July 8, 2025 12:24

deepseekv3/r1 support rm_router_logits in [AllGatherEP, AllGather, Na…

2f77bc9

…iveMulticast] Signed-off-by: ttanzhiqiang <389825161@qq.com>

deepseekv3/r1 support rm_router_logits in [AllGatherEP, AllGather, Na…

6f18307

…iveMulticast] Signed-off-by: ttanzhiqiang <389825161@qq.com>

deepseekv3/r1 support rm_router_logits in [AllGatherEP, AllGather, Na…

cb15e05

…iveMulticast] Signed-off-by: ttanzhiqiang <389825161@qq.com>

deepseekv3/r1 support rm_router_logits in [AllGatherEP, AllGather, Na…

d8755c9

…iveMulticast] Signed-off-by: ttanzhiqiang <389825161@qq.com>

deepseekv3/r1 support rm_router_logits in [AllGatherEP, AllGather, Na…

e0c36a8

…iveMulticast] Signed-off-by: ttanzhiqiang <389825161@qq.com>

deepseekv3/r1 support rm_router_logits in [AllGatherEP, AllGather, Na…

9e15f42

…iveMulticast] Signed-off-by: ttanzhiqiang <389825161@qq.com>

deepseekv3/r1 support rm_router_logits in [AllGatherEP, AllGather, Na…

a595a67

…iveMulticast] Signed-off-by: ttanzhiqiang <389825161@qq.com>

github-actions bot added the module:core label Jul 8, 2025

github-actions bot added the merge-conflicts label Jul 9, 2025

Merge branch 'main' into rm_router_logits

eedcd05

github-actions bot removed the merge-conflicts label Jul 9, 2025

ttanzhiqiang added 3 commits July 9, 2025 10:57

deepseekv3/r1 support rm_router_logits in [AllGatherEP, AllGather, Na…

fa50f6a

…iveMulticast] Signed-off-by: ttanzhiqiang <389825161@qq.com>

Empty submission

e4fc29f

Signed-off-by: ttanzhiqiang <389825161@qq.com>

Empty submission

a0be155

Signed-off-by: ttanzhiqiang <389825161@qq.com>

wangxiyuan approved these changes Jul 10, 2025

github-actions bot added the merge-conflicts label Jul 10, 2025

Merge branch 'main' into rm_router_logits

89458f0

github-actions bot removed the merge-conflicts label Jul 10, 2025

update

af900cc

Signed-off-by: ttanzhiqiang <389825161@qq.com>

wangxiyuan merged commit 9d16c99 into vllm-project:main Jul 11, 2025
22 checks passed
