[Feature] MoE alltoallv communication optimization for unquantized RL training scenarios & alltoallv support for dpo

weijinqian_v1 · weijinqian_v1 · commit 6f6efc1f9e63 · 2025-07-02T11:25:47.000+08:00
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
diff --git a/vllm_ascend/ops/fused_moe.py b/vllm_ascend/ops/fused_moe.py
@@ -566,7 +566,7 @@ def fused_experts_with_all2allv(token_dispatcher, probs, routing_map, hidden_sta
         hidden_states, probs, routing_map
     )
 
-    expert_output = apply_mlp(hidden_states,
+    expert_output = apply_mlp(dispatched_input,
                               w1,
                               w2,
                               tokens_per_expert)
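
The one-line change above matters because a grouped MLP driven by `tokens_per_expert` presumably expects its input already sorted by expert, which is what the dispatcher returns as `dispatched_input`. The following is a minimal pure-Python sketch of that idea, not the vllm_ascend implementation: `dispatch`, `grouped_mlp`, and `combine` are hypothetical stand-ins, and each "expert" is reduced to a scalar multiplier.

```python
def dispatch(hidden_states, expert_ids, num_experts):
    """Hypothetical dispatcher: reorder tokens so each expert's tokens are contiguous."""
    order = sorted(range(len(hidden_states)), key=lambda i: expert_ids[i])
    dispatched_input = [hidden_states[i] for i in order]
    tokens_per_expert = [expert_ids.count(e) for e in range(num_experts)]
    return dispatched_input, tokens_per_expert, order


def grouped_mlp(tokens, scales, tokens_per_expert):
    """Toy grouped 'MLP': expert e scales its contiguous slice of tokens by scales[e]."""
    out, start = [], 0
    for expert, count in enumerate(tokens_per_expert):
        out.extend(t * scales[expert] for t in tokens[start:start + count])
        start += count
    return out


def combine(expert_output, order):
    """Undo the dispatch permutation so outputs line up with the original tokens."""
    restored = [0.0] * len(order)
    for pos, src in enumerate(order):
        restored[src] = expert_output[pos]
    return restored


hidden_states = [1.0, 2.0, 3.0, 4.0]
expert_ids = [1, 0, 1, 0]   # token i is routed to expert expert_ids[i]
scales = [10.0, 100.0]      # expert 0 multiplies by 10, expert 1 by 100

dispatched_input, tokens_per_expert, order = dispatch(hidden_states, expert_ids, 2)

# After the fix: the expert computation consumes the dispatched (permuted) tokens.
correct = combine(grouped_mlp(dispatched_input, scales, tokens_per_expert), order)

# Before the fix: the un-dispatched tokens are sliced as if they were sorted by expert.
buggy = combine(grouped_mlp(hidden_states, scales, tokens_per_expert), order)
```

In this toy setup, feeding the un-dispatched `hidden_states` applies each expert to the wrong slice of tokens, so the combined output no longer matches the routing decision.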
