xlite-dev

Develop ML/AI toolkits and ML/AI/CUDA Learning resources.

Pinned

  1. lite.ai.toolkit Public

    🛠 A lite C++ AI toolkit: 100+🎉 models (Stable-Diffusion, Face-Fusion, YOLO series, Det, Seg, Matting) with MNN, ORT and TRT.

    C++, 4.1k stars, 739 forks

  2. LeetCUDA Public

    📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥

    Cuda, 4k stars, 431 forks

  3. Awesome-LLM-Inference Public

    📚A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc.

    Python, 4k stars, 276 forks

  4. statistic-learning-R-note Public

    📒Statistical Learning Methods (Li Hang): notes from principles to implementation. 200-page PDF notes with detailed explanations of the math formulas, implemented in R.🎉

    Shell, 445 stars, 55 forks

  5. torchlm Public

    💎A high level pipeline for face landmarks detection: train, eval, inference (Python/C++) and 100+ data augmentations.

    Python, 257 stars, 24 forks

  6. ffpa-attn-mma Public

    📚FFPA(Split-D): Extend FlashAttention with Split-D for large headdim, O(1) GPU SRAM complexity, 1.8x~3x↑🎉 faster than SDPA EA.

    Cuda, 171 stars, 7 forks

Repositories

Showing 10 of 23 repositories
  • LeetCUDA Public

    📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥

    Cuda 4,006 GPL-3.0 431 3 0 Updated May 6, 2025
  • Awesome-LLM-Inference Public

    📚A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc.

    Python 3,957 GPL-3.0 276 2 0 Updated May 6, 2025
  • Awesome-Diffusion-Inference Public

    📖A curated list of Awesome Diffusion Inference Papers with codes: Sampling, Caching, Multi-GPUs, etc. 🎉🎉

    223 GPL-3.0 13 0 0 Updated May 5, 2025
  • lite.ai.toolkit Public

    🛠 A lite C++ AI toolkit: 100+🎉 models (Stable-Diffusion, Face-Fusion, YOLO series, Det, Seg, Matting) with MNN, ORT and TRT.

    C++ 4,072 GPL-3.0 739 1 0 Updated Apr 28, 2025
  • statistic-learning-R-note Public

    📒Statistical Learning Methods (Li Hang): notes from principles to implementation. 200-page PDF notes with detailed explanations of the math formulas, implemented in R.🎉

    Shell 445 GPL-3.0 55 2 0 Updated Apr 26, 2025
  • xlite-cli Public

    The cli version of lite.ai.toolkit

    C++ 1 0 0 0 Updated Apr 9, 2025
  • ffpa-attn-mma Public

    📚FFPA(Split-D): Extend FlashAttention with Split-D for large headdim, O(1) GPU SRAM complexity, 1.8x~3x↑🎉 faster than SDPA EA.

    Cuda 171 GPL-3.0 7 3 0 Updated Apr 6, 2025
  • hgemm-tensorcores-mma Public

    ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.

    Cuda 75 GPL-3.0 3 0 0 Updated Mar 30, 2025
  • .github Public
    1 0 0 0 Updated Mar 30, 2025
  • SageAttention Public Forked from thu-ml/SageAttention

    Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

    Cuda 0 Apache-2.0 103 0 0 Updated Mar 23, 2025