actypedef/MixedGemm

MixedGemm

MixedGemm is a mixed-precision GEMM with quantize and reorder kernels, running on Blackwell GPUs (RTX 5090).

We use CUTLASS to perform the mxfp4, mxfp6, and mxfp8 GEMMs.

In this branch, we quantize both weights and activations to mxfp4, mxfp6, and mxfp8 to achieve the best performance with tolerable accuracy loss.
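To make the quantization step concrete, here is a minimal pure-Python sketch of block-wise mxfp4 quantization as described in the OCP Microscaling (MX) format: 32 elements share one power-of-two scale, and each element is stored as FP4 (E2M1). This is a reference illustration only, not the CUDA kernel in this repo. The function names are hypothetical, ties are broken toward the smaller magnitude for simplicity, and the reorder step is not modeled. mxfp6 and mxfp8 follow the same scheme with a different element grid.

```python
import math

# Magnitudes representable by FP4 E2M1 (the sign is handled separately).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2      # exponent of the largest E2M1 value: 6.0 = 1.5 * 2**2
BLOCK_SIZE = 32    # number of elements sharing one scale in the MX formats


def quantize_block_mxfp4(block):
    """Return (shared_exponent, quantized_elements) for one block of floats."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Shared power-of-two scale, chosen so the largest element fits in E2M1.
    shared_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    shared_exp = max(-127, min(127, shared_exp))  # E8M0 exponent range
    scale = 2.0 ** shared_exp
    elems = []
    for v in block:
        mag = min(abs(v) / scale, E2M1_VALUES[-1])  # saturate at 6.0
        # Nearest grid point; ties go to the smaller magnitude in this sketch.
        q = min(E2M1_VALUES, key=lambda c: abs(c - mag))
        elems.append(math.copysign(q, v))
    return shared_exp, elems


def dequantize_block(shared_exp, elems):
    """Reconstruct approximate float values from a quantized block."""
    return [e * 2.0 ** shared_exp for e in elems]


if __name__ == "__main__":
    block = [12.0, -3.0, 0.7] + [0.0] * (BLOCK_SIZE - 3)
    exp, q = quantize_block_mxfp4(block)
    print("shared exponent:", exp)
    print("dequantized head:", dequantize_block(exp, q)[:3])
```

Because the scale is a power of two, values that already sit on the scaled E2M1 grid round-trip exactly, while others (such as 0.7 above) pick up a rounding error; that error is the accuracy loss the quantization trades for speed.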

CUDA Toolkit 12.8.1 is required.
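A quick way to check the toolkit requirement before building is to parse the output of `nvcc --version`. The sketch below is a hypothetical helper, not part of this repo. It assumes `nvcc` is on `PATH` and only compares the major and minor version, since the `release X.Y` field that `nvcc` prints does not include the patch level.

```python
import re
import shutil
import subprocess

REQUIRED = (12, 8)  # major, minor of the required CUDA Toolkit


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output, or None."""
    m = re.search(r"release (\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def installed_cuda_release():
    """Return the (major, minor) reported by nvcc, or None if unavailable."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    found = installed_cuda_release()
    if found is None:
        print("nvcc not found on PATH")
    elif found < REQUIRED:
        print("CUDA %d.%d found, but %d.%d is required" % (found + REQUIRED))
    else:
        print("CUDA %d.%d found" % found)
```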

Installation

  1. If you do not have CUDA Toolkit 12.8.1, install it first by following NVIDIA's CUDA Toolkit installation instructions. Make sure you are on an RTX 50 series or other Blackwell GPU.

  2. Clone this repo and CUTLASS (make sure Git and Conda are installed)

git clone https://github.com/actypedef/MixedGemm.git
git clone https://github.com/NVIDIA/cutlass.git
cd MixedGemm
git switch samedtype

  3. Prepare environment

sudo apt-get update
sudo apt-get install python3-dev

curl -s https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | gpg --dearmor - | sudo tee /etc/apt/trusted.gpg.d/kitware.gpg >/dev/null
sudo apt update
sudo apt install cmake

conda create -n mixedgemm python=3.10
conda activate mixedgemm
conda install pybind11
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

  4. Replace the following paths in CMakeLists.txt with your actual paths

CMAKE_PREFIX_PATH
torch_python PATHS
PYTHON_ROOT
CUTLASS_ROOT

  5. Make and run

bash remake.sh
python main.py
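
For the CMakeLists.txt step above, the following sketch prints candidate values for the four paths from inside the activated `mixedgemm` environment. It is a hypothetical helper, not part of this repo, and it rests on assumptions about what each variable means: that `PYTHON_ROOT` is the Conda environment prefix, that `CMAKE_PREFIX_PATH` should point at PyTorch's CMake package directory, that `torch_python PATHS` is PyTorch's `lib` directory, and that CUTLASS was cloned next to MixedGemm as in step 2. Check the results against the comments in CMakeLists.txt before using them.

```python
import os
import sys


def suggest_paths(cutlass_dir="../cutlass"):
    """Return a dict of candidate values for the CMakeLists.txt paths."""
    paths = {
        # Assumed: the active Conda environment's prefix.
        "PYTHON_ROOT": sys.prefix,
        # Assumed: CUTLASS cloned as a sibling of the MixedGemm directory.
        "CUTLASS_ROOT": os.path.abspath(cutlass_dir),
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; the torch-related entries are left out.
        return paths
    # Directory containing PyTorch's CMake package configuration.
    paths["CMAKE_PREFIX_PATH"] = torch.utils.cmake_prefix_path
    # Directory where PyTorch ships its shared libraries.
    paths["torch_python PATHS"] = os.path.join(
        os.path.dirname(torch.__file__), "lib"
    )
    return paths


if __name__ == "__main__":
    for name, value in suggest_paths().items():
        print(name, "=", value)
```

Run it from the MixedGemm directory so that the default `../cutlass` resolves to the CUTLASS checkout.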
