[QST] Is examples/36 available on sm 70? #840

@umiswing

Hello! I wrote the following kernel config to support gather-gemm-scatter fusion on sm 70, but the result is not correct. I have been debugging it for a while and can't fix it.
Does CUTLASS support this fusion on sm 70?

struct cutlass_tensorop_h884gemm_256x128_32x2_nn_align8 {
  using Gemm =
    cutlass::gemm::device::GemmUniversal<
      cutlass::half_t,                            // ElementA
      cutlass::layout::RowMajor,                  // LayoutA
      cutlass::half_t,                            // ElementB
      cutlass::layout::RowMajor,                  // LayoutB
      cutlass::half_t,                            // ElementC
      cutlass::layout::RowMajor,                  // LayoutC
      cutlass::half_t,                            // ElementAccumulator
      cutlass::arch::OpClassTensorOp,
      cutlass::arch::Sm70,
      cutlass::gemm::GemmShape<256, 128, 32>,     // threadblock tile
      cutlass::gemm::GemmShape<64, 64, 32>,       // warp tile
      cutlass::gemm::GemmShape<8, 8, 4>,          // instruction shape
      cutlass::epilogue::thread::LinearCombination<
        cutlass::half_t,
        8,
        cutlass::half_t,
        cutlass::half_t
      >,
      cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<8>,
      2,                                          // stages
      8,                                          // alignment of A
      8,                                          // alignment of B
      cutlass::arch::OpMultiplyAdd,
      cutlass::ComplexTransform::kNone,
      cutlass::ComplexTransform::kNone,
      true,   // gather a
      false,  // gather b
      true    // scatter d
    >;
};

CUTLASS version: v2.11.0
Hardware: NVIDIA V100
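
For checking the device output, a small host reference can help narrow down whether the gather, the GEMM, or the scatter step is where the result goes wrong. The sketch below is only an illustration and not part of CUTLASS: the function name reference_gather_gemm_scatter and its arguments are hypothetical, operands are float instead of half for simplicity, and it assumes row-major layouts with beta = 0, where row i of the compacted problem reads A[gather_idx[i], :] and writes D[scatter_idx[i], :].

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host reference (not a CUTLASS API).
// Assumed semantics: for each compacted row i,
//   D[scatter_idx[i], :] = alpha * A[gather_idx[i], :] * B
// A is (rows_a x K), B is (K x N), D is (rows_d x N), all row-major.
// Rows of D that are not listed in scatter_idx are left untouched.
void reference_gather_gemm_scatter(const std::vector<float>& A,
                                   const std::vector<float>& B,
                                   std::vector<float>& D,
                                   const std::vector<int>& gather_idx,
                                   const std::vector<int>& scatter_idx,
                                   int N, int K,
                                   float alpha = 1.0f) {
  const std::size_t M = gather_idx.size();
  for (std::size_t i = 0; i < M; ++i) {
    const std::size_t src = static_cast<std::size_t>(gather_idx[i]);
    const std::size_t dst = static_cast<std::size_t>(scatter_idx[i]);
    for (int n = 0; n < N; ++n) {
      float acc = 0.0f;
      for (int k = 0; k < K; ++k) {
        acc += A[src * K + k] * B[static_cast<std::size_t>(k) * N + n];
      }
      D[dst * N + n] = alpha * acc;
    }
  }
}
```

Comparing this reference element by element against the device result copied back to the host (with a tolerance suited to half precision) shows which rows differ, which in turn hints at whether the gather indices or the scatter indices are being ignored.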
