[QST] Why cutlass profiler not profile all kernels? #2350

@YSF-A

Hello, with some modifications (such as to ElementC, LayoutA, and LayoutB), I can run the example https://github.com/NVIDIA/cutlass/blob/main/examples/70_blackwell_gemm/70_blackwell_fp8_gemm.cu successfully. However, for the same problem size, the cutlass profiler does not profile the kernel that the modified 70_blackwell_fp8_gemm.cu executes. How can I make the profiler cover all possible kernels?

Similarly, I ran the following command, which I believe selects the same kernel as the one in the modified 70_blackwell_fp8_gemm.cu, and it returns nothing:
cutlass_profiler --operation=Gemm --m=${m} --n=${n} --k=${k} --alpha=1.0 --beta=0.0 --A=f8:row --B=f8:row --C=f16:row --D=f16:row --batch_count=1 --raster_order=heuristic --accum=f32 --profiling-iterations=100 --cluster_m=2 --cluster_n=2 --cluster_k=1 --inst_m=256 --inst_n=128 --inst_k=64

By the way, I compiled the cutlass profiler with -DCUTLASS_LIBRARY_KERNELS=all -DCUTLASS_UNITY_BUILD_ENABLED=ON.
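
One way to narrow down why the command above returns nothing is to drop one constraint at a time and see which variant starts matching kernels. The following is a minimal sketch of that idea, not a confirmed diagnosis: it only builds and prints the command lines and does not run the profiler. The helper names (`build_cmd`, `relaxed_cmds`) are hypothetical, the 8192 problem sizes are placeholders for ${m}, ${n}, ${k}, and every flag is copied from the command above. Treating each of these flags as a constraint on kernel selection is an assumption.

```python
import shlex

# Every flag below is copied from the command in this post; none is new profiler API.
BASE = ["cutlass_profiler", "--operation=Gemm"]

# Flags kept in every variant.
FIXED = {
    "alpha": "1.0",
    "beta": "0.0",
    "batch_count": "1",
    "profiling-iterations": "100",
}

# Flags assumed to constrain which kernels match (assumption, not verified).
CONSTRAINTS = {
    "A": "f8:row", "B": "f8:row", "C": "f16:row", "D": "f16:row",
    "accum": "f32", "raster_order": "heuristic",
    "cluster_m": "2", "cluster_n": "2", "cluster_k": "1",
    "inst_m": "256", "inst_n": "128", "inst_k": "64",
}


def build_cmd(m, n, k, constraints):
    """Return the argv list for one profiler invocation."""
    flags = {"m": m, "n": n, "k": k, **FIXED, **constraints}
    return BASE + [f"--{name}={value}" for name, value in flags.items()]


def relaxed_cmds(m, n, k):
    """Yield (dropped_flag, argv) pairs, each with exactly one constraint removed."""
    for dropped in CONSTRAINTS:
        kept = {name: v for name, v in CONSTRAINTS.items() if name != dropped}
        yield dropped, build_cmd(m, n, k, kept)


if __name__ == "__main__":
    # Placeholder problem size; substitute the real m, n, k.
    for dropped, argv in relaxed_cmds(8192, 8192, 8192):
        print(f"# without --{dropped}")
        print(shlex.join(argv))
```

Each printed line can be pasted into a shell. If one variant produces results while the full command does not, the dropped flag is the first candidate to check against the kernels that were actually built.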

Thanks
