Sparse decode MLA kernel optimization plan on Blackwell #120

@lw921014

Description

Currently, the sparse decode MLA kernel achieves 350 TFLOPS on Blackwell. Is there a plan to optimize it further? We are optimizing it ourselves and have reached 500 TFLOPS so far, and we are still working toward 1000 TFLOPS in the near future. If you are also working on this, could we collaborate to get it to 1000 TFLOPS?
