[BUG] No TMEM allocation in Blackwell CuTe tutorial examples #2230

Open
allispaul opened this issue Apr 8, 2025 · 5 comments
Labels
? - Needs Triage bug Something isn't working

Comments

@allispaul

Describe the bug
The CuTe tutorial examples (e.g. https://github.com/NVIDIA/cutlass/blob/main/examples/cute/tutorial/blackwell/01_mma_sm100.cu#L226) define a TMEM-backed tensor that stores the output of the GEMM, but they never actually allocate TMEM space using tcgen05.alloc and related instructions. I believe this allows a race condition: multiple CTAs resident on the same SM could end up writing to the same TMEM space.

(I don't have access to Blackwell so I can't confirm this myself; for what it's worth, my coworker who has Blackwell access tested that kernel on large problem sizes and didn't see validation errors.)
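
For reference, here is a minimal sketch of what an explicit allocation around the MMA could look like, written as inline PTX. It is untested (see above: no Blackwell access) and is not the tutorial's code. The kernel name, the `tmem_base` variable, and the column count of 128 are hypothetical choices for illustration, and the sketch assumes a 1D thread block compiled for an sm_100a-class target. Any tcgen05 ordering fences a real kernel might need are left out.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical kernel: shows only the TMEM allocate / use / free bracket.
__global__ void tmem_alloc_sketch() {
  // tcgen05.alloc writes the base address of the allocation into shared memory.
  __shared__ uint32_t tmem_base;

  // Assumed column count; PTX requires a power of two in [32, 512].
  constexpr uint32_t kNumCols = 128;

  // The alloc/dealloc instructions are .sync.aligned, so one full warp
  // executes them together. Warp 0 is used here (assumes a 1D block).
  const bool is_alloc_warp = (threadIdx.x / 32) == 0;

  if (is_alloc_warp) {
    uint32_t dst = static_cast<uint32_t>(__cvta_generic_to_shared(&tmem_base));
    asm volatile(
        "tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%0], %1;"
        :
        : "r"(dst), "r"(kNumCols)
        : "memory");
    // This CTA will not allocate again, so give up the allocation permit.
    asm volatile(
        "tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned;"
        ::: "memory");
  }
  __syncthreads();  // make tmem_base visible to every warp in the CTA

  const uint32_t taddr = tmem_base;

  // ... build the TMEM-backed accumulator tensor at taddr, issue the MMAs,
  //     and copy the accumulators out of TMEM ...

  __syncthreads();  // all warps are done with TMEM before it is freed
  if (is_alloc_warp) {
    asm volatile(
        "tcgen05.dealloc.cta_group::1.sync.aligned.b32 %0, %1;"
        :
        : "r"(taddr), "r"(kNumCols)
        : "memory");
  }
}
```

With a bracket like this, each CTA would work in its own TMEM columns instead of assuming a fixed address, which is the property the tutorial examples appear to be missing.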

@allispaul allispaul added ? - Needs Triage bug Something isn't working labels Apr 8, 2025
@manishucsd
Contributor

Are there any CUDA tools that help discover potential issues like this, such as TMEM races? Or is this entirely the programmer's responsibility, with no tool to certify that a Blackwell kernel is race-free?

@thakkarV
Collaborator

thakkarV commented Apr 9, 2025

You can set this compile-time flag in ptxas to enable TMEM access violation checks: https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#g-tensor-memory-access-check-g-tmem-access-check

@manishucsd
Contributor

manishucsd commented Apr 9, 2025

I'll have to dig into what this flag checks during compilation. The documentation does not say what the --g-tensor-memory-access-check flag actually checks.

Also, are there no runtime checks?

@thakkarV
Collaborator

thakkarV commented Apr 9, 2025

TMEM pointers are dynamic, so these checks cannot be done at compile time. They are runtime checks.

@hwu36
Collaborator

hwu36 commented Apr 30, 2025

This should be fixed in 3.9 now.


4 participants