⭐If this work is helpful for you, please help star this repo. Thanks!🤗
1️⃣ VAR exhibits scale and spatial redundancy, causing high GPU memory consumption.
2️⃣ The proposed method enables MVAR generation without relying on KV cache during inference.
- 2025-05-20: Our MVAR paper has been published on arXiv.
Our MVAR introduces a scale and spatial Markovian assumption: next-scale prediction conditions only on the adjacent preceding scale, and the attention of each token is restricted to a localized neighborhood of size k at the corresponding positions on that adjacent scale.
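The spatial part of this assumption can be illustrated with a minimal sketch: build a boolean attention mask in which each token at the current scale may attend only to a k×k neighborhood around its spatially corresponding position on the preceding scale. The function name, the nearest-position mapping, and all shapes below are illustrative assumptions, not the released implementation.

```python
import numpy as np

def markov_attention_mask(h_cur, w_cur, h_prev, w_prev, k=3):
    """Hypothetical sketch of a spatial-Markovian attention mask.

    Returns a (h_cur*w_cur, h_prev*w_prev) boolean array that is True
    where a current-scale token may attend to a preceding-scale token,
    i.e. within a k x k window around the corresponding position.
    """
    mask = np.zeros((h_cur * w_cur, h_prev * w_prev), dtype=bool)
    r = k // 2  # half-window radius
    for i in range(h_cur):
        for j in range(w_cur):
            # Map the current-scale position onto the adjacent
            # preceding scale (simple integer rescaling; assumed).
            ci = i * h_prev // h_cur
            cj = j * w_prev // w_cur
            # Allow attention only inside the local k x k window,
            # clipped at the scale boundaries.
            for pi in range(max(0, ci - r), min(h_prev, ci + r + 1)):
                for pj in range(max(0, cj - r), min(w_prev, cj + r + 1)):
                    mask[i * w_cur + j, pi * w_prev + pj] = True
    return mask

mask = markov_attention_mask(4, 4, 2, 2, k=3)
print(mask.shape)  # (16, 4)
```

Because each row of the mask holds at most k² True entries regardless of scale size, attention cost per token stays constant across scales, which is consistent with the reduced memory footprint reported above.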
- 📄 Paper available on arXiv
- 🧠 Codebase under preparation
- 🚀 Planned improvements and model refinement
Our MVAR model achieves a 3.0× reduction in GPU memory footprint compared to VAR. Detailed results can be found in the paper.
Please cite our work if it is useful for your research.
@article{zhang2025mvar,
title={MVAR: Visual Autoregressive Modeling with Scale and Spatial Markovian Conditioning},
author={Zhang, Jinhua and Long, Wei and Han, Minghao and You, Weiyi and Gu, Shuhang},
journal={arXiv preprint arXiv:2505.12742},
year={2025}
}
If you have any questions, feel free to reach out at jinhua.zjh@gmail.com.