Commit 2cfba2d

sdmyzlp authored and yangcheng (AJ) committed
Support multistream of shared experts in FusedMoE (vllm-project#997)
Contains vllm-project#1111 for completeness.

Implement multi-stream parallelism for MoE layers with shared experts, where the computation of the shared experts is overlapped with the expert token dispatch and combine. Additionally, when multi-stream is enabled, the weights of the shared experts are forced to be replicated across all cards, regardless of any tensor parallelism configuration, to avoid AllReduce operations. The expected overlapping is:

```
| shared gate_up | shared act |                            | shared down |
| dispatch                    | routed gate_up, act, down  | combine     |
```

User-facing change: no.

Testing: tested on a 1x16 910 node, with a tailored 2-layer DSKv2.

---------

Signed-off-by: sdmyzlp <lrwei2@petalmail.com>
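As a rough illustration of the intended overlap (not the code from this PR), the sketch below runs the shared-expert MLP on a secondary stream while the default stream performs dispatch, routed-expert compute, and combine. The helper callables (`shared_experts`, `routed_experts`, `dispatch`, `combine`) are hypothetical placeholders, and the CUDA stream API is used as a stand-in for the analogous Ascend/NPU stream API.

```python
import torch

# Hypothetical sketch of shared-expert / dispatch-combine overlap; the real
# implementation lives in vllm_ascend's FusedMoE layer and differs in detail.
shared_stream = torch.cuda.Stream()

def moe_forward(hidden_states, shared_experts, routed_experts, dispatch, combine):
    # Let the side stream observe all work queued so far on the default stream.
    shared_stream.wait_stream(torch.cuda.current_stream())

    with torch.cuda.stream(shared_stream):
        # shared gate_up -> act -> down, overlapped with the ops queued below
        shared_out = shared_experts(hidden_states)

    dispatched = dispatch(hidden_states)      # all-to-all token dispatch
    routed_out = routed_experts(dispatched)   # routed gate_up, act, down
    combined = combine(routed_out)            # all-to-all combine

    # Join the streams before mixing the two partial results.
    torch.cuda.current_stream().wait_stream(shared_stream)
    return combined + shared_out
```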
1 parent 1188d41 commit 2cfba2d

2 files changed: +5 -1 lines changed


vllm_ascend/ascend_config.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -37,6 +37,7 @@ def __init__(self, vllm_config):
                                 ascend_scheduler_config)

         self.expert_map_path = additional_config.get("expert_map_path", None)
+        self.dynamic_eplb = additional_config.get("dynamic_eplb", False)
         self.chunked_prefill_for_mla = additional_config.get(
             "chunked_prefill_for_mla", False)
         self.enable_weight_nz_layout = additional_config.get(
```
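For context, `additional_config` entries like the one added above are passed through from the vLLM entrypoint. A hypothetical usage sketch follows; the model name and values are placeholders, and it assumes the standard `additional_config` plumbing exposed by vllm-ascend.

```python
from vllm import LLM

# Hypothetical example of toggling the new option via additional_config;
# the model name and the effect of dynamic_eplb here are illustrative only.
llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite",
    additional_config={
        "dynamic_eplb": True,        # new flag parsed by AscendConfig above
        "expert_map_path": None,     # existing neighbouring option
    },
)
```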

vllm_ascend/models/deepseek_v2.py

Lines changed: 4 additions & 1 deletion
```diff
@@ -733,9 +733,12 @@ def __init__(self, *, vllm_config: VllmConfig, prefix: str = ""):
         quant_config = vllm_config.quant_config
         self.config = config
         self.quant_config = quant_config
+        self.num_dense_layers = self.config.first_k_dense_replace
+        self.num_moe_layers = self.config.num_hidden_layers - self.num_dense_layers
         self.model = CustomDeepseekV2Model(vllm_config=vllm_config,
-                                           prefix=maybe_prefix(
+                                           prefix=maybe_prefix(
                                                prefix, "model"))
+
         if get_pp_group().is_last_rank:
             self.lm_head = ParallelLMHead(config.vocab_size,
                                           config.hidden_size,
```
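The two new attributes are simple bookkeeping derived from the model config; for example (numbers made up, not taken from this PR or any particular checkpoint):

```python
# Illustrative arithmetic only: with first_k_dense_replace = 3 and
# num_hidden_layers = 27, the first 3 layers stay dense MLPs and the
# remaining 24 are MoE layers.
first_k_dense_replace = 3
num_hidden_layers = 27

num_dense_layers = first_k_dense_replace
num_moe_layers = num_hidden_layers - num_dense_layers
assert (num_dense_layers, num_moe_layers) == (3, 24)
```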
