MixedGemm

MixedGemm is a mixed-precision GEMM with quantize and reorder kernels, targeting Blackwell GPUs (RTX 5090).

We use CUTLASS to perform the mxfp4, mxfp6, and mxfp8 GEMMs.

In this branch, we quantize weights and activations to mxfp4, mxfp6, and mxfp8 to achieve the best performance with tolerable accuracy loss.
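To make the quantization step more concrete, here is a minimal pure-Python sketch of block-scaled mxfp4 quantization. It is not the CUDA kernel from this repo. It assumes the usual microscaling layout (32 elements sharing one power-of-two scale, each element stored as FP4 E2M1), and the function name `quantize_block_mxfp4` is hypothetical. Ties are resolved toward the smaller magnitude for simplicity, which may differ from the rounding mode used on the GPU.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (sign is handled separately).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32  # assumed number of elements sharing one scale
E2M1_EMAX = 2    # exponent of the largest E2M1 value (6 = 1.5 * 2**2)


def quantize_block_mxfp4(block):
    """Return (shared_exponent, dequantized_values) for one block of floats."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) = e - 1.
    _, e = math.frexp(amax)
    shared_exp = (e - 1) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    out = []
    for x in block:
        # Scale into the E2M1 range and clamp to the largest magnitude.
        mag = min(abs(x) / scale, E2M1_VALUES[-1])
        # Nearest representable value; ties go to the smaller magnitude.
        nearest = min(E2M1_VALUES, key=lambda v: abs(v - mag))
        out.append(math.copysign(nearest * scale, x))
    return shared_exp, out


if __name__ == "__main__":
    exp, deq = quantize_block_mxfp4([6.0, 0.3, -7.0] + [0.0] * (BLOCK_SIZE - 3))
    print("shared exponent:", exp)
    print("first values:", deq[:3])
```

The gap between the input and the dequantized values is the accuracy loss that the wider mxfp6 and mxfp8 formats reduce.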

CUDA Toolkit 12.8.1 is required.

Installation

If you do not have CUDA Toolkit 12.8.1, install it first, and make sure you are on an RTX 50 Series or other Blackwell GPU.
Clone this repo and CUTLASS (make sure Git and Conda are installed).

git clone https://github.com/actypedef/MixedGemm.git
git clone https://github.com/NVIDIA/cutlass.git
cd MixedGemm
git switch samedtype

Prepare environment

sudo apt-get update
sudo apt-get install python3-dev

curl -s https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | gpg --dearmor - | sudo tee /etc/apt/trusted.gpg.d/kitware.gpg >/dev/null
sudo apt update
sudo apt install cmake

conda create -n mixedgemm python=3.10
conda activate mixedgemm
conda install pybind11
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
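Once the environment is created, a quick sanity check can confirm that PyTorch sees a Blackwell GPU and was built for the expected CUDA version. The sketch below is not part of this repo. `is_blackwell` and `check_environment` are hypothetical helpers, and the set of compute-capability majors treated as Blackwell (10 and 12, the latter covering the RTX 5090) is an assumption that may need extending for other parts.

```python
# Compute-capability majors assumed to correspond to Blackwell GPUs:
# 10 (data-center parts) and 12 (RTX 50 Series such as the RTX 5090).
BLACKWELL_MAJORS = {10, 12}


def is_blackwell(capability):
    """Return True if a (major, minor) compute capability looks like Blackwell."""
    major, _minor = capability
    return major in BLACKWELL_MAJORS


def check_environment():
    """Summarize the first CUDA device as seen by PyTorch."""
    import torch  # imported lazily so is_blackwell works without PyTorch

    if not torch.cuda.is_available():
        return "CUDA is not available to PyTorch"
    cap = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    status = "Blackwell" if is_blackwell(cap) else "not Blackwell"
    return (
        f"{name}: sm_{cap[0]}{cap[1]} ({status}), "
        f"PyTorch built for CUDA {torch.version.cuda}"
    )
```

Running `print(check_environment())` inside the `mixedgemm` environment should report the device name and the CUDA version PyTorch was built against.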

Replace the following paths in CMakeLists.txt with your actual paths:

CMAKE_PREFIX_PATH
torch_python PATHS
PYTHON_ROOT
CUTLASS_ROOT
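One way to find candidate values for these variables is to query the active Conda environment. The sketch below is a guess based on the variable names and should be checked against CMakeLists.txt. It assumes that PYTHON_ROOT is the environment prefix, that CMAKE_PREFIX_PATH points at PyTorch's bundled CMake package directory, that torch_python PATHS expects the directory holding PyTorch's shared libraries, and that CUTLASS was cloned next to MixedGemm as in the clone step. `suggest_cmake_paths` is a hypothetical helper, not part of this repo.

```python
import os
import sys


def suggest_cmake_paths():
    """Collect candidate values for the CMakeLists.txt path variables."""
    paths = {
        # Assumption: PYTHON_ROOT is the prefix of the active environment.
        "PYTHON_ROOT": sys.prefix,
        # Assumption: CUTLASS sits beside MixedGemm and this is run from
        # inside the MixedGemm directory.
        "CUTLASS_ROOT": os.path.abspath(os.path.join(os.pardir, "cutlass")),
    }
    try:
        import torch
        import torch.utils
    except ImportError:
        # PyTorch is not installed in this environment.
        paths["CMAKE_PREFIX_PATH"] = None
        paths["torch_python PATHS"] = None
    else:
        # Directory containing PyTorch's CMake package files.
        paths["CMAKE_PREFIX_PATH"] = torch.utils.cmake_prefix_path
        # Directory containing PyTorch's shared libraries.
        paths["torch_python PATHS"] = os.path.join(
            os.path.dirname(torch.__file__), "lib"
        )
    return paths


if __name__ == "__main__":
    for name, value in suggest_cmake_paths().items():
        print(f"{name} = {value}")
```

Run it from the MixedGemm directory with the `mixedgemm` environment active, then copy the printed values into CMakeLists.txt after verifying that each directory exists.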

Make and run

bash remake.sh
python main.py
