Official code for the ACM MM 2025 paper MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization (arXiv)
🔥🔥🔥 Welcome any PR or development for MQuant!
2025.08.07: 🔥🔥🔥 MQuant for Qwen2-VL has been released. Looking forward to your feedback!
2025.08.05: 🔥🔥🔥 MQuant for Intern-VL2 and MiniCPM-V has been released.
2025.08.04: 🔥🔥🔥 MQuant for Qwen-VL has been released.
2025.07.06: 🌟🌟🌟 MQuant has been accepted by ACM MM 2025. 🎉 Cheers!
- release the quantization code for Qwen2-VL
- release the quantization code for Intern-VL2, MiniCPM-V
- release the quantization code for Qwen-VL
- release the core code after the paper is accepted
- update acknowledgement
- release the paper link
- MQuant is the first full static quantization solution for multimodal large language models, applicable to 5 mainstream MLLMs.
- MQuant proposes Modality-Specific Static Quantization (MSQ) to significantly reduce Time-to-First-Token (TTFT) and Rotation Magnitude Suppression (RMS) to mitigate weight outliers.
- MQuant achieves near-floating-point accuracy (<1% degradation) while reducing inference latency by up to 30% on 5 mainstream MLLMs (Qwen-VL/Intern-VL/Qwen2-VL/GLM-4V/MiniCPM-V) under the W4A8 setting.
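The core idea behind MSQ above is that visual and text tokens have different activation ranges, so each modality gets its own pre-computed (static) scale instead of one shared dynamic scale. Below is a minimal NumPy sketch of that idea; the function names and calibration data are hypothetical illustrations, not the repo's actual API.

```python
import numpy as np

def static_scale(calib_acts, n_bits=8):
    """Pre-compute a static symmetric scale from calibration activations."""
    qmax = 2 ** (n_bits - 1) - 1
    return np.abs(calib_acts).max() / qmax

def quantize(x, scale, n_bits=8):
    """Symmetric static quantization: no per-token scale search at inference."""
    qmax = 2 ** (n_bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
# Hypothetical calibration sets: visual tokens have a much wider range.
text_calib = rng.normal(0, 1.0, (512, 64)).astype(np.float32)
vis_calib = rng.normal(0, 4.0, (512, 64)).astype(np.float32)

# Modality-specific static scales, computed once offline.
text_scale = static_scale(text_calib)
vis_scale = static_scale(vis_calib)

# At inference, each token reuses the static scale of its own modality.
x_text = rng.normal(0, 1.0, 64).astype(np.float32)
x_text_hat = dequantize(quantize(x_text, text_scale), text_scale)
```

A shared scale would be dominated by visual outliers, coarsening the grid for text tokens; keeping per-modality scales preserves their resolution while staying fully static.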
Any questions or suggestions are welcome! Jiangyong Yu jiangyongyufocus@gmail.com, Sifan Zhou sifanjay@gmail.com, Dawei Yang dawei.yang@houmo.ai.
Our implementation is based on QuaRot, GPTQ, and VLMEvalKit. Thanks for the great open-source work!
If you think our paper or code is helpful, please consider citing our work.
@inproceedings{yu2025mquant,
title={MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization},
author={JiangYong Yu and Sifan Zhou and Dawei Yang and Shuo Wang and Shuoyu Li and Xing Hu and Chen Xu and Zukang Xu and Changyong Shu and Zhihang Yuan},
booktitle={Proceedings of the 33rd ACM International Conference on Multimedia (MM '25)},
year={2025}
}
MQuant is released under the MIT license (see LICENSE).