DeepGEMM is not enabled on B200 when loading DeepSeek R1 #21472
Conversation
Code Review

This pull request aims to fix an issue where DeepGEMM is not correctly detected on B200 GPUs due to a missing call to `_lazy_init`. The proposed change attempts to fix this but introduces new logical flaws and doesn't fully resolve the original issue. I've provided a suggestion to correct the logic so that `_lazy_init` is called under the right conditions, making the feature detection robust.
vllm/utils/deep_gemm.py (outdated)

```python
if not (envs.VLLM_USE_DEEP_GEMM and has_deep_gemm()):
    _lazy_init()
    if _per_block_cast_impl is None:
        return False
```
Thanks for fixing this initialization issue! However, the current logic seems to have a couple of flaws:

- If `envs.VLLM_USE_DEEP_GEMM` is `False`, the `if` condition is met. The code then calls `_lazy_init()`, which may populate `_per_block_cast_impl`. If it does, the function will not return `False` here and will proceed to check the GPU properties, which is incorrect since the feature is disabled.
- If both `envs.VLLM_USE_DEEP_GEMM` and `has_deep_gemm()` are `True`, the `if` condition is `False`, so `_lazy_init()` is never called. This means `_per_block_cast_impl` will remain `None`, which is the original bug this PR aims to fix.
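A minimal Python sketch of the two failure modes described above; `_lazy_init` and `_per_block_cast_impl` here are hypothetical stand-ins for the vLLM internals, not the real implementations:

```python
# Stand-in for vLLM's module-level cache of the DeepGEMM cast implementation.
_per_block_cast_impl = None

def _lazy_init():
    # Stand-in: pretend the DeepGEMM kernels were found and cached.
    global _per_block_cast_impl
    _per_block_cast_impl = object()

def flawed_check(use_deep_gemm: bool, has_deep_gemm: bool) -> bool:
    """Mirrors the proposed (flawed) guard ordering."""
    global _per_block_cast_impl
    _per_block_cast_impl = None  # reset for each scenario
    if not (use_deep_gemm and has_deep_gemm):
        _lazy_init()
        if _per_block_cast_impl is None:
            return False
    # ...GPU property checks would follow here...
    return _per_block_cast_impl is not None

# Flaw 2: feature enabled and library present, but _lazy_init() never runs,
# so the cached impl stays None -- the original bug.
print(flawed_check(True, True))   # False
# Flaw 1: feature disabled, yet _lazy_init() runs, populates the impl, and
# the check falls through even though the feature is off.
print(flawed_check(False, True))  # True
```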
To correctly fix the initialization order dependency, we should first check the prerequisites, then call `_lazy_init()`, and finally check the GPU properties. This ensures `_lazy_init()` is always called when the feature is enabled and available, and not called otherwise.
Suggested change:

```diff
 if not (envs.VLLM_USE_DEEP_GEMM and has_deep_gemm()):
-    _lazy_init()
-    if _per_block_cast_impl is None:
-        return False
+    return False
+_lazy_init()
+if _per_block_cast_impl is None:
+    return False
```
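A standalone sketch of the suggested ordering, again using hypothetical stand-ins for the vLLM internals `_lazy_init` and `_per_block_cast_impl`:

```python
# Stand-in for vLLM's module-level cache of the DeepGEMM cast implementation.
_per_block_cast_impl = None

def _lazy_init():
    # Stand-in: pretend the DeepGEMM kernels were found and cached.
    global _per_block_cast_impl
    _per_block_cast_impl = object()

def fixed_check(use_deep_gemm: bool, has_deep_gemm: bool) -> bool:
    # 1. Check prerequisites first: bail out if the feature is disabled
    #    or the library is absent.
    if not (use_deep_gemm and has_deep_gemm):
        return False
    # 2. Only then initialize, so the impl is populated exactly when needed.
    _lazy_init()
    if _per_block_cast_impl is None:
        return False
    # 3. ...GPU property checks would follow here...
    return True

print(fixed_check(True, True))   # True
print(fixed_check(False, True))  # False: _lazy_init() is never reached
```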
I think Gemini is right here.
+1

@yewentao256 as I mentioned in Slack
However, after applying this fix, I get:

My understanding was that DeepGEMM + DeepSeek-R1 on B200 should be working (and this PR assumes that was the case), but this failure suggests something is missing (I'm on HEAD of DeepGEMM and HEAD of vLLM).
Thanks for the fix! Just update per Gemini's suggestion and we are ready to merge.
vllm/utils/deep_gemm.py (outdated)

```python
if not (envs.VLLM_USE_DEEP_GEMM and has_deep_gemm()):
    _lazy_init()
    if _per_block_cast_impl is None:
        return False
```
+1

Thanks for letting me know, let me take a look.
Force-pushed from 4753086 to e59b90e.
Further debugging showed that the original proposed fix resulted in a CUDA initialization error. Current CLI that triggers the failure on these GCP B200 A4 nodes with Ubuntu 24 and the latest NVIDIA driver (IBGDA + nvidia_peermem + GCP-recommended NCCL):
Thanks for the work! Looks good to me
DeepGEMM on Blackwell was being ignored because _lazy_init had not been invoked yet. Update the function to require _lazy_init regardless of caller. As a consequence of this change, CUDA init was failing, likely due to fork/spawn early during initialization. Since this is generic code, switch from invoking cuda_get_device_properties to a platform check.

Signed-off-by: Clayton Coleman <smarterclayton@gmail.com>
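The "platform check" idea in the commit message can be sketched as follows. This is a hedged illustration, not vLLM's actual code: `get_capability` is a hypothetical stand-in for a platform layer that reports the compute capability without creating a CUDA context in the parent process (which can fail when processes are forked/spawned early during initialization):

```python
from functools import cache

def get_capability() -> tuple[int, int]:
    # Hypothetical stand-in: a real platform layer would obtain this
    # without forcing CUDA context creation in the parent process
    # (e.g. via a value cached at platform-detection time).
    return (10, 0)  # SM 10.0 corresponds to Blackwell (B200)

@cache
def is_blackwell() -> bool:
    # Decide from the cached capability tuple rather than calling a
    # CUDA device-properties API at check time.
    major, _minor = get_capability()
    return major == 10

print(is_blackwell())  # True with the stand-in capability above
```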
Force-pushed from e59b90e to 1c5feb9.
Purpose

While testing HEAD against DeepSeek-R1 on 2 B200 nodes (DP=16), despite `VLLM_USE_DEEP_GEMM=1` I was receiving:

After debugging, `is_blackwell_deep_gemm_used` can return false if no other deep_gemm.py utility method is invoked, because it does not explicitly invoke `_lazy_init()` to populate `_per_block_cast_impl`.

Update the function to always invoke `_lazy_init()`, consistent with other utility methods.

Test Plan

Verify startup on B200 in the environment.

Test Result

After this change was deployed, I see:

(Optional) Documentation Update