    Repositories list

    • FP-Quant

      Public
      Python
      02311 Updated Aug 9, 2025
    • EvoPress

      Public
      Python
      22600 Updated Jul 30, 2025
    • qutlass

      Public
      QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
      C++
      26210 Updated Jul 15, 2025
    • Python
      0900 Updated Jun 30, 2025
    • QuEST

      Public
      Work in progress.
      Jupyter Notebook
      67020 Updated Jun 29, 2025
    • Quartet

      Public
      Jupyter Notebook
      67640 Updated Jun 27, 2025
    • Example of YOLOv8 pose detection (estimation) in the browser. It shows implementations powered by ONNX and TFJS, served through JavaScript without any frameworks. It demonstrates pose detection (estimation) on images as well as a live web camera,
      HTML
      3000 Updated Jun 13, 2025
    • MoE-Quant

      Public
      Code for data-aware compression of DeepSeek models
      Python
      64220 Updated Jun 10, 2025
    • Official implementation of Influence Distillation: https://www.arxiv.org/abs/2505.19051
      Python
      0310 Updated May 29, 2025
    • PanzaMail

      Public
      Python
      1929246 Updated Apr 8, 2025
    • HALO-anon

      Public
      0000 Updated Apr 1, 2025
    • torch_cgx

      Public
      PyTorch distributed backend extension with compression support
      C++
      01640 Updated Mar 24, 2025
    • gemm-int8

      Public
      High Performance Int8 GEMM Kernels for SM80 and later GPUs.
      Python
      0900 Updated Mar 11, 2025
    • DarwinLM

      Public
      Official PyTorch Implementation of Paper "DarwinLM: Evolutionary Structured Pruning of Large Language Models"
      Python
      31600 Updated Feb 21, 2025
    • Official Repository for "Scalable Mechanistic Neural Networks" (ICLR 2025)
      Python
      0200 Updated Feb 19, 2025
    • SPADE

      Public
      Code of SPADE: Sparsity Guided Debugging for Deep Neural Networks
      Jupyter Notebook
      3110 Updated Feb 18, 2025
    • HALO

      Public
      HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arxiv.org/abs/2501.02625
      Python
      01810 Updated Feb 17, 2025
    • gemm-fp8

      Public
      High Performance FP8 GEMM Kernels for SM89 and later GPUs.
      Cuda
      11600 Updated Jan 24, 2025
    • GridSearcher simplifies running grid searches for machine learning projects in Python, emphasizing parallel execution and GPU scheduling without dependencies on SLURM or other workload managers.
      Python
      0200 Updated Jan 23, 2025
    • MicroAdam

      Public
      This repository contains code for the MicroAdam paper.
      Python
      41910 Updated Dec 14, 2024
    • LLM training code for Databricks foundation models
      Python
      575001 Updated Nov 27, 2024
    • Python
      0200 Updated Nov 25, 2024
    • 0000 Updated Nov 20, 2024
    • LDAdam

      Public
      LDAdam - Adaptive Optimization from Low-Dimensional Gradient Statistics
      Python
      1700 Updated Nov 6, 2024
    • Boosting 4-bit inference kernels with 2:4 Sparsity
      Cuda
      58011 Updated Sep 4, 2024
    • marlin

      Public
      FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
      Python
      72873295 Updated Sep 4, 2024
    • sparsegpt

      Public
      Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
      Python
      109822180 Updated Aug 20, 2024
    • peft-rosa

      Public
      A fork of the PEFT library, supporting Robust Adaptation (RoSA)
      Python
      31410 Updated Aug 16, 2024
    • Python
      0000 Updated Jun 27, 2024
    • spops

      Public
      C++
      0820 Updated Jun 20, 2024