
Commit 3640c60

Avoid unfused Transpose in DeepSeekV3 EP256 MoE layer (#1091)
### What this PR does / why we need it?
View optimization in torchair (enabled by default for any Transpose with an axis of size 1) prevents the weight Transpose from being fused with the later GroupedMatmul, which degrades MoE layer performance when expert parallelism equals the total number of experts (e.g. EP256 for DSKv3). Add an option to solve this problem by disabling the optimization.

### Does this PR introduce _any_ user-facing change?
Controlled by `additional_config.torchair_graph_config.enable_view_optimize`, which defaults to `True`.

### How was this patch tested?
Tested on a 1x16 910 node with a tailored 2-layer DSKv2.

Signed-off-by: sdmyzlp <lrwei2@petalmail.com>
1 parent 8d00775 commit 3640c60

File tree

4 files changed: +7, -0 lines changed


docs/source/user_guide/additional_config.md

Lines changed: 1 addition & 0 deletions

@@ -38,6 +38,7 @@ The details of each config option are as follows:
 | Name | Type | Default | Description |
 | ---- | ---- | ------- | ----------- |
 | `enabled` | bool | `False` | Whether to enable torchair graph mode |
+| `enable_view_optimize` | bool | `True` | Whether to enable torchair view optimization |
 | `use_cached_graph` | bool | `False` | Whether to use cached graph |
 | `graph_batch_sizes` | list[int] | `[]` | The batch size for torchair graph cache |
 | `graph_batch_sizes_init` | bool | `False` | Init graph batch size dynamically if `graph_batch_sizes` is empty |
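For reference, a minimal usage sketch of the option documented above, assuming the `additional_config` plumbing that vllm-ascend exposes through vLLM's engine arguments; the model path and parallel settings are placeholders, not part of this commit. Leaving the key out keeps the previous behaviour, since the option defaults to `True`.

```python
from vllm import LLM

# Hypothetical offline-inference setup; only the additional_config dict
# illustrates the new knob. Torchair graph mode stays on while the view
# optimization is turned off, e.g. for an EP256 DeepSeekV3 deployment.
llm = LLM(
    model="path/to/deepseek-v3",   # placeholder model path
    tensor_parallel_size=16,       # placeholder parallel setup
    additional_config={
        "torchair_graph_config": {
            "enabled": True,
            "enable_view_optimize": False,
        },
    },
)
```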

vllm_ascend/ascend_config.py

Lines changed: 2 additions & 0 deletions

@@ -55,6 +55,8 @@ def __init__(self, torchair_graph_config):
             "graph_batch_sizes_init", False)
         self.enable_multistream_shared_expert = torchair_graph_config.get(
             "enable_multistream_shared_expert", False)
+        self.enable_view_optimize = torchair_graph_config.get(
+            "enable_view_optimize", True)
 
         if not isinstance(self.graph_batch_sizes, list):
             raise TypeError("graph_batch_sizes must be list[int]")
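A minimal sketch (not the actual vllm_ascend config class) of the `dict.get`-with-default pattern used above, showing that an absent key leaves view optimization enabled; the class name below is hypothetical.

```python
# Standalone illustration of the defaulting behaviour added in
# vllm_ascend/ascend_config.py; class name is hypothetical.
class TorchairGraphConfigSketch:

    def __init__(self, torchair_graph_config: dict):
        # Missing key -> True: view optimization stays on unless the user
        # explicitly disables it.
        self.enable_view_optimize = torchair_graph_config.get(
            "enable_view_optimize", True)


print(TorchairGraphConfigSketch({}).enable_view_optimize)  # True
print(TorchairGraphConfigSketch(
    {"enable_view_optimize": False}).enable_view_optimize)  # False
```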

vllm_ascend/worker/model_runner.py

Lines changed: 2 additions & 0 deletions

@@ -1037,6 +1037,8 @@ def load_model(self) -> None:
             config = torchair.CompilerConfig()
             config.experimental_config.frozen_parameter = True
             config.experimental_config.tiling_schedule_optimize = True
+            config.experimental_config.enable_view_optimize = \
+                get_ascend_config().torchair_graph_config.enable_view_optimize
             torch.npu.set_compile_mode(jit_compile=False)
             if not self.use_cached_npu_graph:
                 npu_backend = torchair.get_npu_backend(compiler_config=config)
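To show where this setting lands, a hedged end-to-end sketch of the torchair compile path touched above: it assumes a torch_npu/torchair installation on an Ascend host, the module is a placeholder, and `enable_view_optimize` is hard-coded here instead of being read from `get_ascend_config()`.

```python
import torch
import torch.nn as nn
import torchair


class TinyMoEWeightUser(nn.Module):
    """Placeholder module; stands in for the real model in load_model()."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x.transpose(0, 1) @ x


config = torchair.CompilerConfig()
config.experimental_config.frozen_parameter = True
config.experimental_config.tiling_schedule_optimize = True
# The new knob: disable view optimization so the weight Transpose can be
# fused with the subsequent GroupedMatmul (relevant for EP256 MoE layers).
config.experimental_config.enable_view_optimize = False

torch.npu.set_compile_mode(jit_compile=False)
npu_backend = torchair.get_npu_backend(compiler_config=config)
compiled_model = torch.compile(TinyMoEWeightUser(), backend=npu_backend)
```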

vllm_ascend/worker/model_runner_v1.py

Lines changed: 2 additions & 0 deletions

@@ -1286,6 +1286,8 @@ def _get_torchair_lazy_compiled_model(self, batch_size: int):
             config = torchair.CompilerConfig()
             config.experimental_config.frozen_parameter = True
             config.experimental_config.tiling_schedule_optimize = True
+            config.experimental_config.enable_view_optimize = \
+                get_ascend_config().torchair_graph_config.enable_view_optimize
             torch.npu.set_compile_mode(jit_compile=False)
             if not self.use_cached_npu_graph:
                 npu_backend = torchair.get_npu_backend(compiler_config=config)
