Is your feature request related to a problem? Please describe.
I would like to use FP8 sparse tensor cores to speed up FP8 GEMMs in LLMs (for example, Llama).
Describe the solution you'd like
FP8 sparse tensor core GEMM support for the following operand layout: A (dense, row-major) x B (sparse, possibly required to be column-major) = C (dense, row-major).
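To make the requested layout concrete, here is a minimal pure-Python sketch. It is not CUTLASS code. It assumes 2:4 structured sparsity along K for B and assumes the hardware path wants the sparse operand on the left; neither is stated above. FP8 is not emulated, small integer-valued floats stand in for it, and the helper names are hypothetical. The sketch only shows that A x B with sparse B can be computed as (B^T x A^T)^T, and a column-major B is already a row-major B^T in memory.

```python
import random


def prune_2_4_along_k(b):
    """Zero the 2 smallest-magnitude values in each group of 4 along K.

    b is a K x N matrix (list of rows); groups run down each column.
    The 2:4 pattern is an assumption made for this sketch.
    """
    k, n = len(b), len(b[0])
    assert k % 4 == 0, "K must be a multiple of 4 for 2:4 pruning"
    out = [row[:] for row in b]
    for col in range(n):
        for k0 in range(0, k, 4):
            by_magnitude = sorted(range(k0, k0 + 4), key=lambda i: abs(b[i][col]))
            for i in by_magnitude[:2]:  # drop the two smallest entries
                out[i][col] = 0.0
    return out


def matmul(x, y):
    """Plain reference matrix multiply on lists of rows."""
    inner = len(y)
    return [
        [sum(x[i][p] * y[p][j] for p in range(inner)) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def transpose(x):
    return [list(row) for row in zip(*x)]


random.seed(0)
M, K, N = 4, 8, 3  # hypothetical problem size

# Dense A (M x K) and 2:4-pruned B (K x N); integer-valued floats stand in for FP8.
A = [[float(random.randint(-4, 4)) for _ in range(K)] for _ in range(M)]
B = prune_2_4_along_k(
    [[float(random.randint(-4, 4)) for _ in range(N)] for _ in range(K)]
)

# Requested form: dense A times sparse B.
C = matmul(A, B)

# Equivalent form with the sparse operand on the left: C^T = B^T x A^T.
C_t = matmul(transpose(B), transpose(A))

assert transpose(C_t) == C
```

If that assumption holds, the request could in principle be served by a sparse-times-dense kernel that writes a column-major C^T, which is the same memory as a row-major C.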
Describe alternatives you've considered
Additional context