[QST] (C7504) & (C7509) Potential Performance Loss: about 'setmaxnreg' and wgmma.mma_async #2452

@chenhongyu2048

What is your question?

I wrote a kernel based on CuTe and CUTLASS that uses cutlass::arch::warpgroup_reg_alloc and wgmma.mma_async.
The kernel is defined in kernel.cuh, which is included in src1.cu. Because my project contains multiple .cu files (src1.cu, src2.cu, src3.cu, and so on), I have to compile with rdc=True (relocatable device code).
However, ptxas prints the following warnings during compilation:

ptxas info    : (C7504) Potential Performance Loss: 'setmaxnreg' ignored to maintain compatibility across compilation units.
ptxas info    : (C7509) Potential Performance Loss: wgmma.mma_async instructions are serialized due to the presence of Extern calls in the function 

I have confirmed that the -DNDEBUG flag is set. I'm using CUDA 12.8 on an H20 GPU.
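
To make the setup concrete, here is a minimal sketch of the kind of file layout described above. The actual kernel is not shown in this issue, so everything in it is an assumption made for illustration: the kernel name `example_kernel`, the three-warpgroup producer/consumer split, the register counts 40 and 232, and the nvcc command in the comment (the issue only says rdc=True). It assumes the CUTLASS headers are on the include path and that the target is sm_90a.

```cuda
// kernel.cuh (hypothetical reconstruction, not the kernel from this issue)
#pragma once
#include <cutlass/arch/reg_reconfig.h>  // warpgroup_reg_alloc / warpgroup_reg_dealloc

// Assumed layout: 3 warpgroups of 128 threads each.
// Warpgroup 0 acts as the producer, warpgroups 1 and 2 as consumers.
__global__ void __launch_bounds__(384, 1) example_kernel(float* out) {
  int warp_group = threadIdx.x / 128;
  if (warp_group == 0) {
    // Lowers to setmaxnreg.dec on sm_90a. This is the instruction that
    // message C7504 reports as ignored.
    cutlass::arch::warpgroup_reg_dealloc<40>();
    // ... global-to-shared loads would go here ...
  } else {
    // Lowers to setmaxnreg.inc on sm_90a.
    cutlass::arch::warpgroup_reg_alloc<232>();
    // ... the wgmma.mma_async main loop would go here. Message C7509
    // refers to calls inside this function that ptxas treats as extern ...
    out[threadIdx.x] = 0.0f;  // placeholder so the kernel has a visible effect
  }
}

// src1.cu (hypothetical)
// #include "kernel.cuh"
//
// Assumed build line, with relocatable device code enabled:
//   nvcc -arch=sm_90a -rdc=true -DNDEBUG -c src1.cu -o src1.o
```

This only mirrors the structure described in the question; it has not been checked to reproduce the two ptxas messages.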
