[Bugfix] Fix CUDA arch flags for MoE permute #21426

minosfuture · 2025-07-23T03:49:23Z

Purpose

Fix the CUDA arch flags used to compile MOE_PERMUTE_SRC.

This resolves the following runtime error, raised by ops.shuffle_rows in the CUTLASS FP8 MoE path:

  File "/data/users/yming/gitrepos/vllm/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 545, in _maybe_chunk_fused_experts
    return self._do_fused_experts(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/yming/gitrepos/vllm/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 492, in _do_fused_experts
    self.fused_experts.apply(
  File "/data/users/yming/gitrepos/vllm/vllm/model_executor/layers/fused_moe/cutlass_moe.py", line 314, in apply
    run_cutlass_moe_fp8(
  File "/data/users/yming/gitrepos/vllm/vllm/model_executor/layers/fused_moe/cutlass_moe.py", line 200, in run_cutlass_moe_fp8
    output.copy_(ops.shuffle_rows(c3, c_map).view(M * topk, K),
RuntimeError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
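For context, the CUDA runtime documents this error (cudaErrorUnsupportedPtxVersion) as most commonly meaning that the embedded PTX was generated by a compiler newer than the installed driver's PTX JIT supports. The driver normally JIT-compiles PTX only when the binary has no SASS compatible with the GPU, so compiling a source set for the wrong architecture list can surface this way. The Python sketch below is a simplified model of that selection, written for illustration only. `select_kernel_image` is a hypothetical helper, and it ignores arch-specific targets such as sm_90a, which run only on their exact architecture.

```python
from typing import Sequence, Tuple

CC = Tuple[int, int]  # (major, minor) compute capability, e.g. (9, 0)


def select_kernel_image(
    device: CC,
    sass_archs: Sequence[CC],
    ptx_archs: Sequence[CC],
) -> str:
    """Hypothetical, simplified model of how the driver picks code for a GPU.

    - SASS built for sm_XY runs on devices with the same major version and
      minor >= Y (binary compatibility within a major architecture).
    - Otherwise the driver JIT-compiles embedded PTX whose virtual arch is
      <= the device. This JIT step is where "the provided PTX was compiled
      with an unsupported toolchain" is reported if the PTX is too new for
      the installed driver.
    - Arch-specific targets (e.g. sm_90a) are not modeled here.
    """
    major, minor = device
    # Native code: same major architecture, equal or lower minor version.
    if any(m == major and n <= minor for m, n in sass_archs):
        return "sass"
    # Fallback: any PTX for a virtual arch the device can JIT from.
    if any((m, n) <= device for m, n in ptx_archs):
        return "ptx-jit"
    # Nothing usable for this device.
    return "no-kernel-image"
```

For example, `select_kernel_image((9, 0), [(8, 0)], [(8, 0)])` returns "ptx-jit": under this model, a Hopper GPU running a file built only for sm_80 must JIT the PTX. This thread does not say which architectures the previous flag resolved to, so these values are illustrative.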

Test Plan

pytest tests/kernels/test_shuffle_rows.py

Test Result

Unit tests passed.


Signed-off-by: Ming Yang <minos.future@gmail.com>

gemini-code-assist

Code Review

This pull request fixes a critical bug in CMakeLists.txt by correcting the CUDA architecture flags for the MoE permute kernel, resolving a runtime error. The fix is direct and effective, ensuring the kernel runs on the intended hardware.

gemini-code-assist · 2025-07-23T03:50:39Z

CMakeLists.txt

+    SRCS "${MOE_PERMUTE_SRC}"
+    CUDA_ARCHS "${CUDA_ARCHS}")


The CUDA arch flags for MOE_PERMUTE_SRC are now set to ${CUDA_ARCHS}, so the MoE permute kernel is compiled for the configured target architectures. This resolves the original runtime error.
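Per-source CUDA arch lists are typically turned into nvcc `-gencode` options, one native (SASS) target per listed architecture. The sketch below assumes a semicolon-separated list in the form "8.0;9.0a"; `gencode_flags` is a hypothetical helper, not vLLM's actual CMake logic. It only illustrates why passing a different arch variable changes which GPUs get native code for a given source set.

```python
def gencode_flags(cuda_archs: str, ptx_for_last: bool = False) -> list[str]:
    """Hypothetical helper: map an arch list like "8.0;9.0a" to nvcc flags.

    Each entry becomes one SASS target (code=sm_XY). Optionally, PTX for the
    last entry is embedded as well (code=compute_XY) so newer GPUs can JIT it.
    """
    # Split on ';' and drop empty entries (e.g. from "" or a trailing ';').
    archs = [a.strip() for a in cuda_archs.split(";") if a.strip()]
    flags = []
    for arch in archs:
        code = arch.replace(".", "")  # "9.0a" -> "90a"
        flags.append(f"-gencode=arch=compute_{code},code=sm_{code}")
    if ptx_for_last and archs:
        code = archs[-1].replace(".", "")
        flags.append(f"-gencode=arch=compute_{code},code=compute_{code}")
    return flags
```

Under this sketch, `gencode_flags("8.0;9.0a")` yields one `sm_80` and one `sm_90a` target; a file compiled with some other list simply lacks native code for any architecture missing from it.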

houseroad · Jul 24, 2025

CUDA_ARCHS "${CUDA_ARCHS}")

I guess someone just copied it over, lol

github-actions · 2025-07-23T03:51:17Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by opening your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

minosfuture · 2025-07-23T16:29:48Z

cc: @tlrmchlsmth @houseroad

houseroad

Looks good. Thanks for the fix.

[Bugfix] Fix CUDA arch flags for MoE permute

33e7b8c

Signed-off-by: Ming Yang <minos.future@gmail.com>

minosfuture requested review from tlrmchlsmth, WoosukKwon and LucasWilkinson as code owners July 23, 2025 03:49

mergify bot added the ci/build label Jul 23, 2025

gemini-code-assist bot reviewed Jul 23, 2025


minosfuture mentioned this pull request Jul 23, 2025

[Bug]: Llama4 Maverick runtime error (shuffle_rows) #21322 (closed)

bnellnm approved these changes Jul 23, 2025


houseroad approved these changes Jul 24, 2025


houseroad added the ready label Jul 24, 2025

houseroad enabled auto-merge (squash) July 24, 2025 00:57

vllm-bot merged commit 2ded067 into vllm-project:main Jul 24, 2025
107 of 108 checks passed
