    Repositories list

    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      11k60k1.8k1.2k Updated Oct 13, 2025
    • Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
      Python
      2522.1k5639 Updated Oct 13, 2025
    • Community maintained hardware plugin for vLLM on Intel Gaudi
      Python
      5112162 Updated Oct 13, 2025
    • Community maintained hardware plugin for vLLM on Spyre
      Python
      2635517 Updated Oct 13, 2025
    • Intelligent Mixture-of-Models Router for Efficient LLM Inference
      Go
      2011.7k8417 Updated Oct 13, 2025
    • The vLLM XPU kernels for Intel GPU
      C++
      14905 Updated Oct 13, 2025
    • Community maintained hardware plugin for vLLM on Ascend
      Python
      4801.2k559179 Updated Oct 13, 2025
    • aibrix

      Public
      Cost-efficient and pluggable Infrastructure components for GenAI inference
      Go
      4664.3k23024 Updated Oct 13, 2025
    • guidellm

      Public
      Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
      Python
      886208428 Updated Oct 13, 2025
    • A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
      Python
      1058417 Updated Oct 10, 2025
    • ci-infra

      Public
      This repo hosts code for vLLM CI & Performance Benchmark infrastructure.
      HCL
      4122017 Updated Oct 10, 2025
    • HTML
      292000 Updated Oct 10, 2025
    • vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
      Python
      3041.8k8455 Updated Oct 9, 2025
    • Fast and memory-efficient exact attention
      Python
      2k96015 Updated Oct 7, 2025
    • recipes

      Public
      Common recipes to run vLLM
      Jupyter Notebook
      5516145 Updated Oct 6, 2025
    • Community maintained hardware plugin for vLLM on AWS Neuron
      Python
      01000 Updated Oct 1, 2025
    • FlashMLA

      Public
      C++
      881603 Updated Sep 29, 2025
    • DeepGEMM

      Public
      DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
      Cuda
      711000 Updated Sep 29, 2025
    • Python
      72320 Updated Aug 18, 2025
    • rfcs

      Public
      0100 Updated Jun 3, 2025
    • HTML
      7801 Updated Feb 7, 2025
    • media-kit

      Public
      vLLM Logo Assets
      2600 Updated Dec 12, 2024
    • vllm-nccl

      Public archive
      Manages vllm-nccl dependency
      Python
      31720 Updated Jun 3, 2024
    • dashboard

      Public
      vLLM performance dashboard
      Python
      73600 Updated Apr 26, 2024