Commit 066ea10

Fix enable_multistream_moe for unquantized scenario
Signed-off-by: sdmyzlp <lrwei2@petalmail.com>
1 parent 9e099a5 commit 066ea10

1 file changed: +5 -3 lines changed

vllm_ascend/ops/fused_moe.py

Lines changed: 5 additions & 3 deletions
@@ -1144,9 +1144,11 @@ def forward(self,
         if self.enable_multistream_moe:
             assert gate is not None
             router_logits, _ = gate(hidden_states)
-            if isinstance(self.quant_method.quant_method,
-                          AscendW8A8DynamicFusedMoEMethod
-                          ) and fused_moe_state == FusedMoEState.MC2:
+            if not isinstance(self.quant_method,
+                              AscendUnquantizedFusedMoEMethod) and isinstance(
+                                  self.quant_method.quant_method,
+                                  AscendW8A8DynamicFusedMoEMethod
+                              ) and fused_moe_state == FusedMoEState.MC2:
                 with npu_stream_switch("moe_secondary", 0):
                     quantized_x_for_share, dynamic_scale_for_share = torch_npu.npu_dynamic_quant(
                         hidden_states)
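
The following is a minimal sketch of why the added guard plausibly matters, not the actual vllm_ascend code. It assumes that in the unquantized scenario `self.quant_method` is an `AscendUnquantizedFusedMoEMethod` instance with no inner `quant_method` attribute, so the old condition would raise `AttributeError`. The class bodies, the `QuantizedWrapper` class, and the `old_check` / `new_check` helpers are hypothetical stand-ins; only the class names and the shape of the condition come from the diff.

```python
# Stand-in classes: names mirror the diff, bodies are hypothetical.
class AscendW8A8DynamicFusedMoEMethod:
    pass


class AscendUnquantizedFusedMoEMethod:
    # Assumption: no inner `quant_method` attribute exists here.
    pass


class QuantizedWrapper:
    """Hypothetical wrapper that exposes the real method as `.quant_method`."""

    def __init__(self, inner):
        self.quant_method = inner


def old_check(quant_method, is_mc2):
    # Pre-fix condition: always dereferences `.quant_method`.
    return isinstance(quant_method.quant_method,
                      AscendW8A8DynamicFusedMoEMethod) and is_mc2


def new_check(quant_method, is_mc2):
    # Post-fix condition: `and` short-circuits, so the attribute access
    # is skipped when the method is the unquantized one.
    return (not isinstance(quant_method, AscendUnquantizedFusedMoEMethod)
            and isinstance(quant_method.quant_method,
                           AscendW8A8DynamicFusedMoEMethod)
            and is_mc2)


if __name__ == "__main__":
    unquantized = AscendUnquantizedFusedMoEMethod()
    w8a8 = QuantizedWrapper(AscendW8A8DynamicFusedMoEMethod())

    print(new_check(unquantized, True))  # guard rejects before attribute access
    print(new_check(w8a8, True))         # quantized MC2 path is still taken
    try:
        old_check(unquantized, True)
    except AttributeError as exc:
        print("old condition fails:", exc)
```

Under these assumptions, the new condition evaluates to `False` for the unquantized method without touching the missing attribute, while the quantized W8A8 path in MC2 state behaves as before.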
