
DeepGEMM is not enabled on B200 when loading DeepSeek R1 #21472


Open
wants to merge 1 commit into base: main

Conversation

@smarterclayton (Author) commented Jul 23, 2025

Purpose

While testing HEAD against DeepSeek-R1 on 2 B200 nodes (DP=16), I was receiving the following despite setting VLLM_USE_DEEP_GEMM=1:

[fp8.py:520] DeepGemm not supported on the current platform.

Debugging showed that is_blackwell_deep_gemm_used can return False if no other deep_gemm.py utility method has been invoked first, because it does not explicitly call _lazy_init() to populate _per_block_cast_impl.

Update the function to always invoke _lazy_init(), consistent with other utility methods.
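For reference, the intended ordering looks roughly like the sketch below. This is illustrative only, not the exact diff in this PR: envs.VLLM_USE_DEEP_GEMM, has_deep_gemm(), _lazy_init() and _per_block_cast_impl are the names discussed above, and the small stubs (including the per_block_cast_to_fp8 attribute name) exist only so the snippet runs standalone.

```python
import importlib.util


class envs:  # stand-in for vllm.envs
    VLLM_USE_DEEP_GEMM = True


_per_block_cast_impl = None  # populated lazily, as in vllm/utils/deep_gemm.py


def has_deep_gemm() -> bool:
    # True when the deep_gemm package is importable.
    return importlib.util.find_spec("deep_gemm") is not None


def _lazy_init() -> None:
    # In vLLM this binds the lazily imported deep_gemm entry points; the
    # attribute name below is illustrative, hence the getattr default.
    global _per_block_cast_impl
    if _per_block_cast_impl is None and has_deep_gemm():
        import deep_gemm
        _per_block_cast_impl = getattr(deep_gemm, "per_block_cast_to_fp8", None)


def is_blackwell_deep_gemm_used() -> bool:
    # Check prerequisites first, then always run _lazy_init() so the result
    # does not depend on another utility method having been called earlier.
    if not (envs.VLLM_USE_DEEP_GEMM and has_deep_gemm()):
        return False
    _lazy_init()
    if _per_block_cast_impl is None:
        return False
    # The real function additionally verifies the GPU is SM100 (Blackwell).
    return True
```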

Test Plan

Verify startup on B200 in the environment.

Test Result

After this change was deployed, I see:

Using DeepGemm SM100 kernels for Fp8MoEMethod.

(Optional) Documentation Update


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of CI tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify bot added the deepseek (Related to DeepSeek models) label on Jul 23, 2025
@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request aims to fix an issue where DeepGEMM is not correctly detected on B200 GPUs due to a missing call to _lazy_init. The proposed change attempts to fix this but introduces new logical flaws and doesn't fully resolve the original issue. I've provided a suggestion to correct the logic to ensure _lazy_init is called under the right conditions, making the feature detection robust.

Comment on lines 24 to 27
if not (envs.VLLM_USE_DEEP_GEMM and has_deep_gemm()):
    _lazy_init()
    if _per_block_cast_impl is None:
        return False

critical

Thanks for fixing this initialization issue! However, the current logic seems to have a couple of flaws:

  1. If envs.VLLM_USE_DEEP_GEMM is False, the if condition on line 24 is met. The code then calls _lazy_init(), which may populate _per_block_cast_impl. If it does, the function will not return False here and will proceed to check the GPU properties, which is incorrect since the feature is disabled.
  2. If both envs.VLLM_USE_DEEP_GEMM and has_deep_gemm() are True, the if condition is False, so _lazy_init() is never called. This means _per_block_cast_impl will remain None, which is the original bug this PR aims to fix.

To correctly fix the initialization order dependency, we should first check the prerequisites, then call _lazy_init(), and finally check the GPU properties. This ensures _lazy_init() is always called when the feature is enabled and available, and not called otherwise.

Suggested change

Before:
    if not (envs.VLLM_USE_DEEP_GEMM and has_deep_gemm()):
        _lazy_init()
        if _per_block_cast_impl is None:
            return False

After:
    if not (envs.VLLM_USE_DEEP_GEMM and has_deep_gemm()):
        return False
    _lazy_init()
    if _per_block_cast_impl is None:
        return False

Member

I think Gemini is right here.

Contributor

+1

@smarterclayton (Author)

@yewentao256 as I mentioned in Slack

@smarterclayton (Author) commented Jul 23, 2025

However, after applying this fix, I get:

(EngineCore_3 pid=1043)   File "/app/vllm/vllm/model_executor/layers/fused_moe/batched_deep_gemm_moe.py", line 289, in apply
(EngineCore_3 pid=1043)     fp8_m_grouped_gemm_nt_masked((a1q, a1q_scale), (w1, w1_scale),
(EngineCore_3 pid=1043)   File "/app/vllm/vllm/utils/deep_gemm.py", line 106, in fp8_m_grouped_gemm_nt_masked
(EngineCore_3 pid=1043)     return _grouped_masked_impl(*args, **kwargs)
(EngineCore_3 pid=1043)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_3 pid=1043) RuntimeError: Failed: Assertion error csrc/jit_kernels/impls/smxx_layout.hpp:136 'mn % 4 == 0 and num_groups == 1'

My understanding was that DeepGEMM + DeepSeek-R1 on B200 should be working (and this PR assumes that is the case), but this failure suggests something is missing (I'm on HEAD of DeepGEMM and HEAD of vLLM).

@yewentao256 (Contributor) left a comment

Thanks for the fix! Just update it per Gemini's suggestion and we are ready to merge.

Comment on lines 24 to 27
if not (envs.VLLM_USE_DEEP_GEMM and has_deep_gemm()):
    _lazy_init()
    if _per_block_cast_impl is None:
        return False
Contributor

+1

@yewentao256 (Contributor)

However, after applying this fix, I get:

(EngineCore_3 pid=1043)   File "/app/vllm/vllm/model_executor/layers/fused_moe/batched_deep_gemm_moe.py", line 289, in apply
(EngineCore_3 pid=1043)     fp8_m_grouped_gemm_nt_masked((a1q, a1q_scale), (w1, w1_scale),
(EngineCore_3 pid=1043)   File "/app/vllm/vllm/utils/deep_gemm.py", line 106, in fp8_m_grouped_gemm_nt_masked
(EngineCore_3 pid=1043)     return _grouped_masked_impl(*args, **kwargs)
(EngineCore_3 pid=1043)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_3 pid=1043) RuntimeError: Failed: Assertion error csrc/jit_kernels/impls/smxx_layout.hpp:136 'mn % 4 == 0 and num_groups == 1'

My understanding was that DeepGEMM+DeepSeek-R1 on B200 should be working (and this PR assumes that was the case), but this failure suggests there is something missing (i'm on HEAD of DeepGEMM and HEAD of vLLM).

Thanks for letting me know, let me take a look

@smarterclayton force-pushed the lazy_init_blackwell_check branch 3 times, most recently from 4753086 to e59b90e on July 24, 2025 at 16:22
@smarterclayton (Author) commented Jul 24, 2025

Further debugging showed that the original proposed fix resulted in a CUDA initialization error when calling cuda_get_device_properties. We are still debugging why, possibly due to the fork/spawn. In the meantime, I've updated this to a version that uses current_platform.is_cuda() and current_platform.cuda.is_device_capability(100) to avoid having a separate code path, and it initializes without errors (although I'm still hitting the smxx_layout assertion failure).
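For context, the platform-based variant described above is roughly the sketch below. It assumes vLLM's current_platform.is_cuda() and current_platform.is_device_capability() helpers; the _is_blackwell wrapper name is hypothetical, not necessarily the name used in the diff.

```python
from vllm.platforms import current_platform


def _is_blackwell() -> bool:
    # Ask the platform layer instead of querying CUDA device properties
    # directly, which sidesteps the CUDA-init-in-forked-process problem above.
    # Compute capability 10.0 corresponds to SM100 (Blackwell).
    return current_platform.is_cuda() and current_platform.is_device_capability(100)
```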

Current CLI that triggers the failure on these GCP A4 (B200) nodes with Ubuntu 24 and the latest NVIDIA driver (IBGDA + nvidia_peermem + the GCP-recommended NCCL):

vllm serve Qwen/Qwen3-30B-A3B-FP8 --port 8000 --enforce-eager --disable-log-requests --enable-expert-parallel --tensor-parallel-size 1 --data-parallel-size 2  --trust-remote-code

@mgoin added the ready (ONLY add when PR is ready to merge / full CI is needed) and bug (Something isn't working) labels on Jul 24, 2025
@yewentao256 (Contributor) left a comment

Thanks for the work! Looks good to me

DeepGEMM on Blackwell was being ignored because _lazy_init had
not been invoked yet. Update the function to always invoke
_lazy_init, regardless of the caller.

As a consequence of this change, CUDA init was failing, likely
due to the fork/spawn early during initialization. Since
this is generic code, switch from invoking
cuda_get_device_properties to a platform check.

Signed-off-by: Clayton Coleman <smarterclayton@gmail.com>
@smarterclayton force-pushed the lazy_init_blackwell_check branch from e59b90e to 1c5feb9 on July 24, 2025 at 19:52
Labels: bug (Something isn't working), deepseek (Related to DeepSeek models), ready (ONLY add when PR is ready to merge / full CI is needed)