Skip to content

[Bugfix] Fix CUDA arch flags for MoE permute #21426

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 24, 2025

Conversation

minosfuture
Copy link
Contributor

@minosfuture minosfuture commented Jul 23, 2025

Purpose

Fix CUDA arch flag for MOE_PERMUTE_SRC.

This resolves the following error

  File "/data/users/yming/gitrepos/vllm/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 545, in _maybe_chunk_fused_experts
    return self._do_fused_experts(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/yming/gitrepos/vllm/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 492, in _do_fused_experts
    self.fused_experts.apply(
  File "/data/users/yming/gitrepos/vllm/vllm/model_executor/layers/fused_moe/cutlass_moe.py", line 314, in apply
    run_cutlass_moe_fp8(
  File "/data/users/yming/gitrepos/vllm/vllm/model_executor/layers/fused_moe/cutlass_moe.py", line 200, in run_cutlass_moe_fp8
    output.copy_(ops.shuffle_rows(c3, c_map).view(M * topk, K),
RuntimeError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1

Test Plan

pytest tests/kernels/test_shuffle_rows.py

Test Result

UT passed

(Optional) Documentation Update

Signed-off-by: Ming Yang <minos.future@gmail.com>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes a critical bug in CMakeLists.txt by correcting the CUDA architecture flags for the MoE permute kernel, resolving a runtime error. The fix is direct and effective, ensuring the kernel runs on the intended hardware.

Comment on lines +845 to +846
SRCS "${MOE_PERMUTE_SRC}"
CUDA_ARCHS "${CUDA_ARCHS}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The CUDA arch flags are now correctly set to ${CUDA_ARCHS}. This ensures that the MoE permute kernel is compiled with the appropriate architecture flags, resolving the original error. This change is critical to ensure the kernel runs on the intended hardware.

    CUDA_ARCHS "${CUDA_ARCHS}")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess someone just copied it over, lol

Copy link

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@minosfuture
Copy link
Contributor Author

cc: @tlrmchlsmth @houseroad

Copy link
Collaborator

@houseroad houseroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for the fix.

@houseroad houseroad added the ready ONLY add when PR is ready to merge/full CI is needed label Jul 24, 2025
@houseroad houseroad enabled auto-merge (squash) July 24, 2025 00:57
@vllm-bot vllm-bot merged commit 2ded067 into vllm-project:main Jul 24, 2025
107 of 108 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci/build ready ONLY add when PR is ready to merge/full CI is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants