
Commit 49dd290

bigPYJ1151 authored and Chen-zexi committed
[Bugfix] Lazy import fused_experts in BitsAndBytesMoEMethod to avoid breaking non-CUDA-alike devices (vllm-project#20822)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
1 parent 29f8881 commit 49dd290

File tree: 1 file changed (+1, -1 lines)


vllm/model_executor/layers/quantization/bitsandbytes.py

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,6 @@
 
 import torch
 
-from vllm.model_executor.layers.fused_moe import fused_experts
 from vllm.model_executor.layers.fused_moe.layer import (FusedMoE,
                                                         FusedMoEMethodBase)
 from vllm.model_executor.layers.linear import (LinearBase, LinearMethodBase,
@@ -467,6 +466,7 @@ def apply(
         logical_to_physical_map: Optional[torch.Tensor] = None,
         logical_replica_count: Optional[torch.Tensor] = None,
     ) -> torch.Tensor:
+        from vllm.model_executor.layers.fused_moe import fused_experts
 
         if enable_eplb:
             raise NotImplementedError(
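
The change is the standard deferred-import pattern: moving the fused_experts import from module scope into BitsAndBytesMoEMethod.apply means that merely importing bitsandbytes.py no longer pulls in the fused MoE kernels, which can fail to load on non-CUDA-alike devices; the dependency is resolved only when the MoE path actually runs. Below is a minimal, self-contained sketch of the pattern; LazyImportMoEMethod and the math stand-in are illustrative only and not part of vLLM.

```python
# A minimal sketch of the deferred-import pattern applied in this commit.
# Names here (LazyImportMoEMethod, the `math` stand-in) are illustrative only;
# in vLLM the deferred dependency is
#     from vllm.model_executor.layers.fused_moe import fused_experts
# which may fail at module load time on non-CUDA-alike devices.

class LazyImportMoEMethod:
    """The enclosing module stays importable even if the heavy backend is absent."""

    def apply(self, x: float) -> float:
        # Deferred import: the dependency is resolved only when apply() runs,
        # not when this module is imported. After the first call the imported
        # module is cached in sys.modules, so later calls pay only a dict lookup.
        from math import sqrt  # stand-in for the device-specific kernel module
        return sqrt(x)


if __name__ == "__main__":
    print(LazyImportMoEMethod().apply(9.0))  # 3.0
```

The trade-off is a small per-call import lookup (cached in sys.modules after the first call) in exchange for keeping the quantization module importable on every platform.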

0 commit comments