# cuda-core
Here are 2 public repositories matching this topic...
- **Decoding Attention**: specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference (see the sketch after this listing).

  Topics: gpu, cuda, inference, nvidia, mha, mla, multi-head-attention, gqa, mqa, llm, large-language-model, flash-attention, cuda-core, decoding-attention, flashinfer, flashmla

  Language: C++ · Updated Jun 11, 2025
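
In the decoding stage each new token attends with a single query vector against the cached keys and values, so the kernel is dominated by memory traffic and small dot products that map naturally onto plain CUDA cores rather than tensor-core GEMMs. Below is a minimal illustrative sketch of single-query attention for one head; the kernel name, shapes, and launch configuration are hypothetical and are not taken from the Decoding Attention project itself.

```cuda
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Single-query ("decoding") attention for one head, using only CUDA cores.
// Hypothetical shapes: q[d], K[n*d], V[n*d], out[d]; n = cached tokens, d = head dim.
__global__ void decode_attention_one_head(const float* q, const float* K,
                                          const float* V, float* out,
                                          int n, int d) {
    extern __shared__ float scores[];          // one attention score per cached token
    float scale = rsqrtf((float)d);

    // Each thread computes q . K[i] for a strided subset of key rows.
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        float s = 0.f;
        for (int k = 0; k < d; ++k) s += q[k] * K[i * d + k];
        scores[i] = s * scale;
    }
    __syncthreads();

    // Serial softmax by thread 0, for readability rather than performance.
    if (threadIdx.x == 0) {
        float m = -INFINITY;
        for (int i = 0; i < n; ++i) m = fmaxf(m, scores[i]);
        float sum = 0.f;
        for (int i = 0; i < n; ++i) { scores[i] = expf(scores[i] - m); sum += scores[i]; }
        for (int i = 0; i < n; ++i) scores[i] /= sum;
    }
    __syncthreads();

    // Each thread accumulates one or more output dimensions of softmax(scores) * V.
    for (int k = threadIdx.x; k < d; k += blockDim.x) {
        float acc = 0.f;
        for (int i = 0; i < n; ++i) acc += scores[i] * V[i * d + k];
        out[k] = acc;
    }
}

int main() {
    const int n = 128, d = 64;                 // hypothetical KV length and head dimension
    std::vector<float> hq(d, 0.01f), hK(n * d, 0.02f), hV(n * d, 0.03f), hout(d);

    float *q, *K, *V, *out;
    cudaMalloc(&q, d * sizeof(float));
    cudaMalloc(&K, n * d * sizeof(float));
    cudaMalloc(&V, n * d * sizeof(float));
    cudaMalloc(&out, d * sizeof(float));
    cudaMemcpy(q, hq.data(), d * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(K, hK.data(), n * d * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(V, hV.data(), n * d * sizeof(float), cudaMemcpyHostToDevice);

    // One block per head; dynamic shared memory holds the n scores.
    decode_attention_one_head<<<1, 128, n * sizeof(float)>>>(q, K, V, out, n, d);
    cudaMemcpy(hout.data(), out, d * sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[0] = %f\n", hout[0]);          // all V rows are equal, so every out[k] is 0.03
    cudaFree(q); cudaFree(K); cudaFree(V); cudaFree(out);
    return 0;
}
```

Production decode kernels typically parallelize across heads, split the KV cache across warps, and use an online softmax so scores never need to be fully materialized; the serial softmax above is kept only to make the computation easy to follow.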