Description
Hello, after some modifications (to ElementC, LayoutA, and LayoutB), I can run the example https://github.com/NVIDIA/cutlass/blob/main/examples/70_blackwell_gemm/70_blackwell_fp8_gemm.cu successfully. However, with the same problem size, the CUTLASS profiler does not profile the kernel that is executed in the modified 70_blackwell_fp8_gemm.cu. How can I get the profiler to profile all possible kernels?
Similarly, I ran the following command, which I believe targets the same kernel as the modified 70_blackwell_fp8_gemm.cu, and it returns nothing:
cutlass_profiler --operation=Gemm --m=${m} --n=${n} --k=${k} --alpha=1.0 --beta=0.0 --A=f8:row --B=f8:row --C=f16:row --D=f16:row --batch_count=1 --raster_order=heuristic --accum=f32 --profiling-iterations=100 --cluster_m=2 --cluster_n=2 --cluster_k=1 --inst_m=256 --inst_n=128 --inst_k=64
By the way, I compiled the CUTLASS profiler with -DCUTLASS_LIBRARY_KERNELS=all -DCUTLASS_UNITY_BUILD_ENABLED=ON.
Thanks