
[0.9.1][Feature] MoE alltoallv communication optimization for unquantized RL training scenario & alltoallv support dpo #1547


Open
wants to merge 41 commits into base: v0.9.1-dev

Conversation

weijinqian0 (Contributor) commented Jul 1, 2025

[Feature] MoE alltoallv communication optimization for unquantized RL training scenario & alltoallv support dpo

Introduction

This PR introduces two key optimizations for MoE model performance:

  1. Efficient Token Dispatcher:

    • Implements an optimized alltoallv_seq token dispatcher (adapted from NVIDIA Megatron and Ascend MindSpeed)
    • Significantly more efficient than the current alltoall implementation when used with token_permute/unpermute fusion
    • Enable with: VLLM_ASCEND_ENABLE_MOE_ALL2ALL_SEQ=1
  2. DBO Support for alltoallv_seq:

    • Builds upon the alltoallv_seq dispatcher to support DBO (Dual Batch Overlap)
    • Enables overlapping of alltoallv communication during the prefill stage
    • Enable with both of the following (see the usage sketch below):
      • VLLM_ASCEND_ENABLE_MOE_ALL2ALL_SEQ=1
      • VLLM_ASCEND_ENABLE_DBO=1
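
As a usage illustration, the sketch below shows one way to set these flags from Python before the engine is built. The environment variable names come from this PR; the model name, parallelism settings, and `LLM(...)` arguments are illustrative assumptions only.

```python
import os

# Enable the alltoallv_seq token dispatcher introduced by this PR.
# Set these before the engine is created so the Ascend platform code picks them up.
os.environ["VLLM_ASCEND_ENABLE_MOE_ALL2ALL_SEQ"] = "1"
# Optionally also enable Dual Batch Overlap so alltoallv communication
# can be overlapped during the prefill stage.
os.environ["VLLM_ASCEND_ENABLE_DBO"] = "1"

from vllm import LLM, SamplingParams

# Model and parallelism settings below are illustrative, not prescribed by the PR.
llm = LLM(model="Qwen/Qwen3-30B-A3B",
          tensor_parallel_size=4,
          enable_expert_parallel=True)
outputs = llm.generate(["An MoE model is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```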

Performance Improvements

Testing on Qwen3-30B-A3B shows a nearly 2x throughput improvement compared to the original alltoall implementation.

weijinqian_v1 added 12 commits July 1, 2025 09:51
weijinqian_v1 added 3 commits July 1, 2025 14:03

github-actions bot commented Jul 3, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.


weijinqian_v1 and others added 5 commits July 9, 2025 16:25
harygo22 added 6 commits July 9, 2025 16:28
weijinqian_v1 added 2 commits July 9, 2025 16:41
weijinqian_v1 added 2 commits July 9, 2025 23:48
@wangxiyuan changed the title from "[Feature] MoE alltoallv communication optimization for unquantized RL training scenario & alltoallv support dpo" to "[0.9.1][Feature] MoE alltoallv communication optimization for unquantized RL training scenario & alltoallv support dpo" on Jul 10, 2025
elif envs_ascend.VLLM_ASCEND_ENABLE_MOE_ALL2ALL_SEQ:
    # MC2 Dispatch/Combine performs better than alltoall_seq in the decoding
    # stage, so only use alltoall_seq for small EP groups or during prefill.
    return FusedMoEState.All2AllSeq if (
        ep_size < 16 or with_prefill) else FusedMoEState.MC2
Collaborator

Why is there this restriction ep_size < 16?

@harygo22 commented Jul 10, 2025

MC2 Dispatch/Combine is still faster than alltoall_seq in the decoding stage, so when ep_size >= 16 we use MC2 for better performance.
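
To make the rule concrete, here is a minimal standalone sketch of the selection logic discussed above. The helper function and the enum values are hypothetical reconstructions for illustration; only the `ep_size < 16 or with_prefill` condition mirrors the quoted hunk.

```python
from enum import Enum


class FusedMoEState(Enum):
    # Values are placeholders; the real enum in vllm-ascend may differ.
    All2AllSeq = "all2all_seq"
    MC2 = "mc2"


def pick_dispatcher(ep_size: int, with_prefill: bool) -> FusedMoEState:
    """Hypothetical helper: use alltoall_seq for small EP groups or during
    prefill; otherwise fall back to MC2, which is faster in decoding when
    ep_size >= 16."""
    return FusedMoEState.All2AllSeq if (ep_size < 16
                                        or with_prefill) else FusedMoEState.MC2


# Decoding with a 16-way expert-parallel group falls back to MC2,
# while an 8-way group keeps the alltoall_seq dispatcher.
assert pick_dispatcher(ep_size=16, with_prefill=False) is FusedMoEState.MC2
assert pick_dispatcher(ep_size=8, with_prefill=False) is FusedMoEState.All2AllSeq
```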
