    Repositories list

    • TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
      C++
      1.7k001 Updated Aug 16, 2025
    • kilocode

      Public
      Open Source AI coding assistant for planning, building, and fixing code. We're a superset of Roo, Cline, and our own features. Follow us: kilocode.ai/social
      TypeScript
      672000 Updated Aug 15, 2025
    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      9.4k000 Updated Aug 11, 2025
    • sglang

      Public
      SGLang is a fast serving framework for large language models and vision language models.
      Python
      2.6k000 Updated Aug 11, 2025
    • ocr-tools

      Public
      Python
      1410 Updated Aug 2, 2025
    • 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
      Python
      30k000 Updated Jul 29, 2025
    • olmocr

      Public
      Toolkit for linearizing PDFs for LLM datasets/training
      Python
      1k000 Updated Jul 9, 2025
    • Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model w/CPU ONNX and NVIDIA GPU PyTorch support, handling, and auto-stitching
      Python
      548000 Updated May 28, 2025
    • The Triton TensorRT-LLM Backend
      Python
      129000 Updated May 8, 2025
    • Sample Next.js ai chat app using Deep Infra inference and Vercel ai sdk
      TypeScript
      2100 Updated Mar 17, 2025
    • cutlass

      Public
      CUDA Templates for Linear Algebra Subroutines
      C++
      1.4k000 Updated Mar 15, 2025
    • Fast and memory-efficient exact attention
      Python
      1.9k000 Updated Feb 20, 2025
    • Zonos

      Public
      Python
      784000 Updated Feb 12, 2025
    • Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
      Python
      298000 Updated Oct 21, 2024
    • Model components of the Llama Stack APIs
      Python
      1.1k000 Updated Oct 10, 2024
    • Secure your NGINX locations with JWT
      Shell
      131000 Updated Jun 17, 2024
    • deepctl

      Public
      Command line tool for Deep Infra cloud ML inference service
      Rust
      33220 Updated Jun 10, 2024
    • 🦜🔗 Build context-aware reasoning applications 🦜🔗
      TypeScript
      2.7k000 Updated May 31, 2024
    • Official TypeScript wrapper for DeepInfra Inference API
      TypeScript
      31452 Updated May 13, 2024
    • A framework for few-shot evaluation of language models.
      Python
      2.6k000 Updated Apr 29, 2024
    • langchain

      Public
      ⚡ Building applications with LLMs through composability ⚡
      Python
      19k100 Updated Jan 22, 2024
    • litellm

      Public
      Call all LLM APIs using the OpenAI format. Use Azure, OpenAI, Cohere, Anthropic, Ollama, VLLM, Sagemaker, HuggingFace, Replicate (100+ LLMs)
      Python
      3.8k000 Updated Jan 8, 2024
    • Large Language Model Text Generation Inference
      Python
      1.2k906 Updated Dec 15, 2023
    • fetch-stream
      JavaScript
      1000 Updated Nov 6, 2023
    • A better API for making Event Source requests, with all the features of fetch()
      TypeScript
      170000 Updated Aug 18, 2023
    • cog

      Public
      Containers for machine learning
      Go
      627000 Updated Aug 1, 2023
    • A cog for running llama-2 using llama.cpp server
      Python
      0000 Updated Aug 1, 2023
    • NVIDIA GPU-based FAN controller for SUPERMICRO server
      Python
      3000 Updated Apr 25, 2023
    • Multilingual Automatic Speech Recognition with word-level timestamps and confidence
      Python
      194000 Updated Mar 7, 2023
    • Multilingual Sentence & Image Embeddings with BERT
      Python
      2.7k000 Updated Feb 28, 2023