Hello! I wrote the following kernel config to support gather-GEMM-scatter fusion on SM70, but the result is incorrect. I have been debugging for a while and can't fix it.
Does CUTLASS support this fusion on SM70?
struct cutlass_tensorop_h884gemm_256x128_32x2_nn_align8 {
  using Gemm =
    cutlass::gemm::device::GemmUniversal<
      cutlass::half_t,                          // element A
      cutlass::layout::RowMajor,                // layout A
      cutlass::half_t,                          // element B
      cutlass::layout::RowMajor,                // layout B
      cutlass::half_t,                          // element C/D
      cutlass::layout::RowMajor,                // layout C/D
      cutlass::half_t,                          // accumulator
      cutlass::arch::OpClassTensorOp,
      cutlass::arch::Sm70,
      cutlass::gemm::GemmShape<256, 128, 32>,   // threadblock tile
      cutlass::gemm::GemmShape<64, 64, 32>,     // warp tile
      cutlass::gemm::GemmShape<8, 8, 4>,        // instruction shape
      cutlass::epilogue::thread::LinearCombination<
        cutlass::half_t,
        8,
        cutlass::half_t,
        cutlass::half_t
      >,
      cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<8>,
      2,                                        // stages
      8,                                        // alignment A
      8,                                        // alignment B
      cutlass::arch::OpMultiplyAdd,
      cutlass::ComplexTransform::kNone,
      cutlass::ComplexTransform::kNone,
      true,   // gather A
      false,  // gather B
      true    // scatter D
    >;
};
CUTLASS version: v2.11.0
Hardware: NVIDIA V100
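
A plain host-side reference can help narrow down whether a mismatch comes from the gather/scatter indexing or from the GEMM itself. The sketch below is not CUTLASS code. It assumes that gather A selects rows of the row-major A through an index array, that scatter D writes the resulting rows of the row-major D through a second index array, and that beta is 0 so C does not contribute. The function name `gather_gemm_scatter_ref` and all parameter names are hypothetical, and `float` is used instead of `half_t` to keep it self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference (not part of CUTLASS).
// Assumed semantics: for each i,
//   D[scatter_idx[i], :] = alpha * (A[gather_idx[i], :] * B)
// A is (rows_a x K), B is (K x N), D is (rows_d x N), all row-major.
// Rows of D that are never scattered to stay zero.
std::vector<float> gather_gemm_scatter_ref(
    const std::vector<float>& A,
    const std::vector<float>& B,
    const std::vector<int>& gather_idx,
    const std::vector<int>& scatter_idx,
    int K, int N, int rows_d, float alpha) {
  assert(gather_idx.size() == scatter_idx.size());
  std::vector<float> D(static_cast<std::size_t>(rows_d) * N, 0.0f);
  for (std::size_t i = 0; i < gather_idx.size(); ++i) {
    const int src = gather_idx[i];   // row of A to read
    const int dst = scatter_idx[i];  // row of D to write
    for (int j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (int k = 0; k < K; ++k) {
        acc += A[src * K + k] * B[k * N + j];
      }
      D[dst * N + j] = alpha * acc;
    }
  }
  return D;
}
```

Comparing the device output against such a reference element by element, within a half-precision tolerance, shows whether only the scattered rows differ or the whole product does.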