Commit e59b90e

is_blackwell_deep_gemm_used must _lazy_init first
DeepGEMM on Blackwell was being ignored because _lazy_init had not been invoked yet. Update the function to call _lazy_init itself, regardless of the caller. As a consequence of this change, CUDA init was failing, likely due to the fork/spawn early during initialization. Since this is generic code, switch from invoking cuda_get_device_properties to a platform check.

Signed-off-by: Clayton Coleman <smarterclayton@gmail.com>
1 parent 316b1bf commit e59b90e

1 file changed, +8 -5 lines changed

vllm/utils/deep_gemm.py

Lines changed: 8 additions & 5 deletions
@@ -13,20 +13,23 @@
 import torch
 
 import vllm.envs as envs
-from vllm.utils import cuda_get_device_properties, has_deep_gemm
+from vllm.platforms import current_platform
+from vllm.utils import has_deep_gemm
 
 
 @functools.cache
 def is_blackwell_deep_gemm_used() -> bool:
     """Return ``True`` if vLLM is configured to use DeepGEMM on a
     Blackwell-class GPU.
     """
+    if not (envs.VLLM_USE_DEEP_GEMM and has_deep_gemm()):
+        return False
 
-    if not (envs.VLLM_USE_DEEP_GEMM and has_deep_gemm()
-            and _per_block_cast_impl is not None):
+    _lazy_init()
+    if _per_block_cast_impl is None:
         return False
 
-    return cuda_get_device_properties(0, ("major", ))[0] == 10
+    return current_platform.is_cuda() and current_platform.is_device_capability(100)
 
 
 def _missing(*_: Any, **__: Any) -> NoReturn:
@@ -64,7 +67,7 @@ def _lazy_init() -> None:
 
     if not has_deep_gemm():
         return
-
+
     _dg = importlib.import_module("deep_gemm")
 
     _fp8_gemm_nt_impl = _resolve_symbol(_dg, "fp8_gemm_nt",
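
The last changed line of the first hunk swaps a `major == 10` comparison for `is_device_capability(100)`. The sketch below shows how those two predicates could differ. It rests on two assumptions that the diff does not confirm: that the integer form of a capability is `major * 10 + minor`, and that the new check is an exact match. `Capability`, `old_check`, and `new_check` are hypothetical names for illustration, not vLLM APIs.

```python
from typing import NamedTuple


class Capability(NamedTuple):
    """Hypothetical (major, minor) compute-capability pair."""
    major: int
    minor: int

    def to_int(self) -> int:
        # Assumed encoding: 10.0 -> 100, 9.0 -> 90.
        return self.major * 10 + self.minor


def old_check(cap: Capability) -> bool:
    # Shape of the removed line: only the major version is compared.
    return cap.major == 10


def new_check(cap: Capability) -> bool:
    # Shape of the added line, assuming an exact match against 100.
    return cap.to_int() == 100
```

Under these assumptions both checks accept a 10.0 device, but only `old_check` would accept a hypothetical 10.x device with a nonzero minor version. Reviewers may want to confirm whether that narrowing is intended.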