[BugFix] Fix import error on non-blackwell machines #21020
Conversation
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Code Review
This pull request fixes an import error on non-Blackwell machines by moving the operator implementation registration into a conditionally compiled file. The dispatch key for the workspace calculation function has been corrected to improve the robustness of the solution.
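As an illustration of the approach (a sketch only: `ENABLE_CUTLASS_MLA_SM100` is a hypothetical guard, while the PR achieves the same effect by compiling the whole registration file only for Blackwell targets), guarding the registration means it no longer exists in builds where the SM100 kernels are absent, so the extension loads without undefined-symbol errors on other GPUs:

#if defined(ENABLE_CUTLASS_MLA_SM100)  // hypothetical flag, for illustration only
TORCH_LIBRARY_IMPL_EXPAND(TORCH_EXTENSION_NAME, CUDA, m) {
  m.impl("sm100_cutlass_mla_decode", &sm100_cutlass_mla_decode);
  m.impl("sm100_cutlass_mla_get_workspace_size",
         &sm100_cutlass_mla_get_workspace_size);
}
#endif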
TORCH_LIBRARY_IMPL_EXPAND(TORCH_EXTENSION_NAME, CUDA, m) {
  m.impl("sm100_cutlass_mla_decode", &sm100_cutlass_mla_decode);
  m.impl("sm100_cutlass_mla_get_workspace_size",
         &sm100_cutlass_mla_get_workspace_size);
}
The function `sm100_cutlass_mla_get_workspace_size` is a host-side function that calculates a workspace size and does not involve any GPU operations. Registering it only for the `CUDA` dispatch key is incorrect and can lead to runtime errors if it is called in a context where the PyTorch dispatcher selects a different backend. Host-only functions like this should be registered for the `CPU` dispatch key to ensure they can be called correctly regardless of context.
TORCH_LIBRARY_IMPL_EXPAND(TORCH_EXTENSION_NAME, CUDA, m) {
  m.impl("sm100_cutlass_mla_decode", &sm100_cutlass_mla_decode);
}

TORCH_LIBRARY_IMPL_EXPAND(TORCH_EXTENSION_NAME, CPU, m) {
  m.impl("sm100_cutlass_mla_get_workspace_size",
         &sm100_cutlass_mla_get_workspace_size);
}
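To make the dispatch-key point concrete, here is a self-contained toy (all names hypothetical, not vLLM's) that registers a pure host-side size query under the `CPU` key. Because the op takes no Tensor arguments, a CUDA-only registration would leave the dispatcher with no kernel to run in a CPU-only context:

#include <torch/library.h>

// Pure host-side arithmetic: no GPU work, no Tensor arguments.
int64_t toy_get_workspace_size(int64_t num_batches) {
  return num_batches * 1024;
}

TORCH_LIBRARY(toy_ops, m) {
  m.def("toy_get_workspace_size(int num_batches) -> int");
}

// Registering under the CPU key makes the op callable in any context,
// including on machines without a GPU.
TORCH_LIBRARY_IMPL(toy_ops, CPU, m) {
  m.impl("toy_get_workspace_size", &toy_get_workspace_size);
}

From Python this would then be callable as torch.ops.toy_ops.toy_get_workspace_size(8) on any machine.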
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
GTG: Checked
I have tested this PR locally, and it fixes #20769 (comment). Thank you!
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
FIX #20769 (comment)
Checked:
- vllm serve runs when built on Hopper
- Can run VLLM_ATTENTION_BACKEND=CUTLASS_MLA_VLLM_V1 lm_eval --model vllm --model_args pretrained=deepseek-ai/DeepSeek-V2-Lite-Chat,trust_remote_code=true --tasks gsm8k --batch_size auto