Describe the bug
The CuTe tutorial examples (e.g. https://github.com/NVIDIA/cutlass/blob/main/examples/cute/tutorial/blackwell/01_mma_sm100.cu#L226) define a TMEM-backed tensor that stores the GEMM output, but never actually allocate TMEM space using `tcgen05.alloc` and related instructions. I believe this creates a possible race condition: multiple CTAs occupying the same SM could end up writing to the same TMEM region.
(I don't have access to Blackwell, so I can't confirm this myself; for what it's worth, a coworker with Blackwell access ran that kernel on large problem sizes and saw no validation errors.)
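For reference, here is a hedged sketch of the allocation pattern the issue says is missing, based on the `tcgen05` instructions in the PTX ISA. The column count, warp choice, and synchronization are illustrative assumptions, not the tutorial's actual fix; consult the PTX ISA and the CUTLASS SM100 code for the exact contract.

```cuda
// Sketch only (assumes sm_100a): one warp allocates TMEM columns with
// tcgen05.alloc, the CTA uses them, and the same warp frees them so that
// other CTAs resident on the SM can reuse the space.
#include <cstdint>

__global__ void tmem_alloc_sketch() {
  // tcgen05.alloc writes the base address of the allocated TMEM columns
  // into shared memory.
  __shared__ uint32_t tmem_base;
  constexpr uint32_t num_cols = 128;  // illustrative column count

  if (threadIdx.x < 32) {  // a single warp performs the allocation
    asm volatile(
        "tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%0], %1;\n"
        :: "r"(static_cast<uint32_t>(__cvta_generic_to_shared(&tmem_base))),
           "n"(num_cols));
  }
  __syncthreads();

  uint32_t tmem_addr = tmem_base;  // base for the TMEM-backed accumulator
  (void)tmem_addr;
  // ... MMA accumulates into the allocated TMEM columns here ...
  __syncthreads();

  if (threadIdx.x < 32) {
    // Release the columns; without this (and the alloc above), CTAs could
    // implicitly share the same TMEM region.
    asm volatile("tcgen05.dealloc.cta_group::1.sync.aligned.b32 %0, %1;\n"
                 :: "r"(tmem_addr), "n"(num_cols));
  }
}
```

The key point for the race concern: `tcgen05.alloc` is what gives each CTA its own disjoint set of TMEM columns; addressing TMEM without it leaves co-resident CTAs with no guarantee of separation.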
Are there any CUDA tools that can help discover potential issues like this with TMEM races? Or is this entirely the programmer's responsibility, with no tooling to certify that a Blackwell kernel is race-free?