[FEA] FP8 sparse tensor cores support A(row+dense) x B(sparse) = C(row+dense) #2032

**Is your feature request related to a problem? Please describe.**
I would like to use FP8 sparse tensor cores to speed up FP8 GEMMs in LLMs (for example, Llama).

**Describe the solution you'd like**
Support for FP8 sparse tensor core GEMMs of the form A (row-major, dense) x B (sparse, possibly required to be column-major) = C (row-major, dense).
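
To make the requested shape concrete, here is a minimal pure-Python sketch. It is only an illustration, not CUTLASS code: plain floats stand in for FP8 values, a 2:4 structured-sparsity pattern along K is assumed for B, and the helper names (`prune_2_4_along_k`, `matmul`, `transpose`) are hypothetical. It also checks the identity C = (B^T x A^T)^T, which is one way a dense x sparse product could in principle be mapped onto hardware whose sparse operand sits on the left. Whether that mapping is usable for FP8 in CUTLASS is exactly what this request asks about.

```python
# Illustrative only: Python floats stand in for FP8, helper names are hypothetical.

def prune_2_4_along_k(b):
    """Zero all but the 2 largest-magnitude entries in each group of 4 along K.

    b is a K x N matrix (list of rows); K must be a multiple of 4.
    """
    k, n = len(b), len(b[0])
    assert k % 4 == 0, "K must be a multiple of 4 for 2:4 sparsity"
    out = [row[:] for row in b]
    for col in range(n):
        for g in range(0, k, 4):
            # Indices of this group, largest magnitude first.
            order = sorted(range(g, g + 4), key=lambda i: abs(b[i][col]), reverse=True)
            for i in order[2:]:
                out[i][col] = 0.0
    return out

def matmul(x, y):
    """Naive dense matrix product of x (M x K) and y (K x N)."""
    return [[sum(x[i][p] * y[p][j] for p in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]

def transpose(x):
    return [list(r) for r in zip(*x)]

# A: dense, M=2, K=4.  B: K=4, N=2, pruned to 2:4 along K.
A = [[1.0, 2.0, 3.0, 4.0],
     [0.5, -1.0, 2.0, 0.0]]
B = [[1.0, -4.0],
     [3.0, 0.5],
     [-2.0, 2.0],
     [0.25, 1.0]]

B_sparse = prune_2_4_along_k(B)

# Requested form: C = A (dense) x B (sparse).
C = matmul(A, B_sparse)

# Same result with the sparse operand on the left: C^T = B^T x A^T.
C_via_T = transpose(matmul(transpose(B_sparse), transpose(A)))
assert C == C_via_T
```

In this formulation a row-major C has the same memory layout as a column-major C^T, so the transposes would be layout reinterpretations rather than data movement. That is an assumption about how a kernel might be organized, not a statement about existing CUTLASS behavior.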
