What is your question?
I want to fuse GEMM and reduction ops, but when I run this example with the --parallel-split-k and --split-k-slices options, it does not work.
I tested this example on an A10. Here are my command and the result:

    ./23_ampere_gemm_operand_reduction_fusion --save-workspace --split-k-slices=2 --parallel-split-k
    ERROR - results miscompared.
    Results written to '23_ampere_gemm_operand_reduction_fusion1024x1024x1024.dat'.
    ID,M,N,K,SplitK-Slices,Parallel-SplitK,Runtime
    gemm_1,1024,1024,1024,2,1,0

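For reference, this is the result I expect from the fused op with parallel split-K, written as a minimal CPU sketch. It is not CUTLASS code. It assumes row-major A (M x K) and B (K x N), that the reduction is the sum of operand A along K, and that each K slice writes its partial results to a workspace before a final reduction. The names `FusedResult` and `split_k_gemm_with_a_reduction` are my own, not part of the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical container for the two outputs of the fused operation.
struct FusedResult {
    std::vector<float> d;         // M x N GEMM output, row-major
    std::vector<float> a_rowsum;  // length M, sum of each row of A over K
};

// CPU sketch of parallel split-K (assumption: A is reduced along K).
// Each slice accumulates partial results for its own K range into a
// workspace; a second pass sums the per-slice partials.
FusedResult split_k_gemm_with_a_reduction(const std::vector<float>& a,
                                          const std::vector<float>& b,
                                          int m, int n, int k, int slices) {
    const std::size_t M = m, N = n, K = k, S = slices;
    const std::size_t chunk = (K + S - 1) / S;  // K range per slice

    std::vector<float> ws_d(S * M * N, 0.0f);  // partial GEMM outputs
    std::vector<float> ws_r(S * M, 0.0f);      // partial reduction vectors

    for (std::size_t s = 0; s < S; ++s) {
        const std::size_t k_begin = s * chunk;
        const std::size_t k_end = std::min(K, k_begin + chunk);
        for (std::size_t i = 0; i < M; ++i) {
            for (std::size_t kk = k_begin; kk < k_end; ++kk) {
                const float a_ik = a[i * K + kk];
                ws_r[s * M + i] += a_ik;  // fused operand reduction
                for (std::size_t j = 0; j < N; ++j) {
                    ws_d[(s * M + i) * N + j] += a_ik * b[kk * N + j];
                }
            }
        }
    }

    // Final reduction across slices.
    FusedResult out{std::vector<float>(M * N, 0.0f),
                    std::vector<float>(M, 0.0f)};
    for (std::size_t s = 0; s < S; ++s) {
        for (std::size_t i = 0; i < M; ++i) {
            out.a_rowsum[i] += ws_r[s * M + i];
            for (std::size_t j = 0; j < N; ++j) {
                out.d[i * N + j] += ws_d[(s * M + i) * N + j];
            }
        }
    }
    return out;
}
```

With `slices = 1` this reduces to a plain GEMM plus a row sum of A, so I would expect the split-K run to match the single-slice run up to floating-point rounding.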
Could you please help me with this error? PS: If my inputs are TF32, can I use CUTLASS to fuse these two ops?
Thanks a lot!!