[BugFix] Fix import error on non-blackwell machines #21020
Conversation
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Code Review
This pull request fixes an import error on non-Blackwell machines by moving the operator implementation registration into a conditionally compiled file. The dispatch key for the workspace calculation function has been corrected to improve the robustness of the solution.
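As an illustration of the approach (a sketch only: `ENABLE_CUTLASS_MLA_SM100` is a hypothetical guard, while the PR achieves the same effect by compiling the whole registration file only for Blackwell targets), guarding the registration means it no longer exists in builds where the SM100 kernels are absent, so the extension loads without undefined-symbol errors on other GPUs:

#if defined(ENABLE_CUTLASS_MLA_SM100)  // hypothetical flag, for illustration only
TORCH_LIBRARY_IMPL_EXPAND(TORCH_EXTENSION_NAME, CUDA, m) {
  m.impl("sm100_cutlass_mla_decode", &sm100_cutlass_mla_decode);
  m.impl("sm100_cutlass_mla_get_workspace_size",
         &sm100_cutlass_mla_get_workspace_size);
}
#endif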
TORCH_LIBRARY_IMPL_EXPAND(TORCH_EXTENSION_NAME, CUDA, m) {
  m.impl("sm100_cutlass_mla_decode", &sm100_cutlass_mla_decode);
  m.impl("sm100_cutlass_mla_get_workspace_size",
         &sm100_cutlass_mla_get_workspace_size);
}
The function `sm100_cutlass_mla_get_workspace_size` is a host-side function that calculates a workspace size and does not involve any GPU operations. Registering it only for the `CUDA` dispatch key is incorrect and can lead to runtime errors if it is called in a context where the PyTorch dispatcher selects a different backend. Host-only functions like this should be registered for the `CPU` dispatch key to ensure they can be called correctly regardless of context.
TORCH_LIBRARY_IMPL_EXPAND(TORCH_EXTENSION_NAME, CUDA, m) {
  m.impl("sm100_cutlass_mla_decode", &sm100_cutlass_mla_decode);
}

TORCH_LIBRARY_IMPL_EXPAND(TORCH_EXTENSION_NAME, CPU, m) {
  m.impl("sm100_cutlass_mla_get_workspace_size",
         &sm100_cutlass_mla_get_workspace_size);
}
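To make the dispatch-key point concrete, here is a self-contained toy (all names hypothetical, not vLLM's) that registers a pure host-side size query under the `CPU` key. Because the op takes no Tensor arguments, a CUDA-only registration would leave the dispatcher with no kernel to run in a CPU-only context:

#include <torch/library.h>

// Pure host-side arithmetic: no GPU work, no Tensor arguments.
int64_t toy_get_workspace_size(int64_t num_batches) {
  return num_batches * 1024;
}

TORCH_LIBRARY(toy_ops, m) {
  m.def("toy_get_workspace_size(int num_batches) -> int");
}

// Registering under the CPU key makes the op callable in any context,
// including on machines without a GPU.
TORCH_LIBRARY_IMPL(toy_ops, CPU, m) {
  m.impl("toy_get_workspace_size", &toy_get_workspace_size);
}

From Python this would then be callable as torch.ops.toy_ops.toy_get_workspace_size(8) on any machine.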
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
GTG: Checked
I have tested this PR locally, and it fixes #20769 (comment). Thank you!
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
FIX #20769 (comment)
Checked:
- vllm serve runs when built on Hopper
- Can run VLLM_ATTENTION_BACKEND=CUTLASS_MLA_VLLM_V1 lm_eval --model vllm --model_args pretrained=deepseek-ai/DeepSeek-V2-Lite-Chat,trust_remote_code=true --tasks gsm8k --batch_size auto