Revert "SM100 Cutlass MLA decode with unrestricted num_heads (< 128) for DeepSeek TP" #21019

jeejeelee · 2025-07-16T01:24:04Z

Reverts #20769
FIX #20769 (comment)

…for Deep…" This reverts commit 8cdc371.

github-actions · 2025-07-16T01:24:12Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Code Review

This pull request reverts the changes from PR #20769, which introduced a new SM100 Cutlass MLA decode implementation. The revert appears to be clean and complete, removing the new feature and its related code across the C++, Python, and build system files. The changes correctly restore the previous state of the affected components.

LucasWilkinson · 2025-07-16T01:27:34Z

Give me a couple of minutes, I can push a fix

LucasWilkinson · 2025-07-16T01:36:02Z

#21020 testing it now

tlrmchlsmth

Seems we have a fix @LucasWilkinson? https://github.com/vllm-project/vllm/pull/21020/files

jeejeelee · 2025-07-16T01:42:40Z

Thank you for the quick fix, I am testing it locally now @LucasWilkinson

LucasWilkinson · 2025-07-16T02:05:06Z

Yes, sorry, confirmed it works on Hopper, but I'm waiting for it to build on Blackwell; building this kernel is slowww... haha

edit:

Checked that vllm serve runs when built on Hopper
Checked that this runs: VLLM_ATTENTION_BACKEND=CUTLASS_MLA_VLLM_V1 lm_eval --model vllm --model_args pretrained=deepseek-ai/DeepSeek-V2-Lite-Chat,trust_remote_code=true --tasks gsm8k --batch_size auto

Revert "SM100 Cutlass MLA decode with unrestricted num_heads (< 128) …

96dec48

…for Deep…" This reverts commit 8cdc371.

jeejeelee requested review from WoosukKwon, robertgshaw2-redhat, njhill, ywang96, comaniac, alexm-redhat, tlrmchlsmth and LucasWilkinson as code owners July 16, 2025 01:24

mergify bot added the ci/build, deepseek, and v1 labels Jul 16, 2025

jeejeelee requested review from mgoin and LucasWilkinson and removed request for tlrmchlsmth, comaniac, LucasWilkinson, njhill, WoosukKwon, alexm-redhat, robertgshaw2-redhat and ywang96 July 16, 2025 01:25

gemini-code-assist bot reviewed Jul 16, 2025

jeejeelee requested a review from alexm-redhat July 16, 2025 01:25

tlrmchlsmth approved these changes Jul 16, 2025

tlrmchlsmth added the ready label Jul 16, 2025

tlrmchlsmth requested changes Jul 16, 2025

jeejeelee closed this Jul 16, 2025

jeejeelee deleted the revert-20769-mla_fi_prefill_and_decode branch July 16, 2025 02:31
