Mengxuan Wu
(*) equal contribution
ArXiv Preprint (arXiv 2504.04787)
2025-04-07: We release the code of Dynamic Vision Mamba (DyVM).
2025-04-08: Paper released on arXiv.
Mamba-based vision models have gained extensive attention as a result of being computationally more efficient than attention-based models. However, spatial redundancy still exists in these models, in the form of token and block redundancy. For token redundancy, we analytically find that early token pruning methods will result in inconsistency between training and inference or introduce extra computation for inference. Therefore, we customize token pruning to fit the Mamba structure by rearranging the pruned sequence before feeding it into the next Mamba block. For block redundancy, we allow each image to select SSM blocks dynamically, based on an empirical observation that the inference speed of Mamba-based vision models is largely affected by the number of SSM blocks. Our proposed method, Dynamic Vision Mamba (DyVM), effectively reduces FLOPs with minor performance drops. We achieve a 35.2% reduction in FLOPs with only a 1.7% accuracy loss on Vim-S. DyVM also generalizes well across different Mamba vision model architectures and different vision tasks.
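The token-pruning idea above can be sketched in plain Python. This is a minimal illustration with assumed per-token importance scores, not the actual DyVM implementation (which operates on batched tensors inside the Mamba blocks): the key point is that the kept tokens are re-sorted back into their original sequence order before entering the next, order-sensitive Mamba block.

```python
def prune_and_rearrange(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring tokens, then restore their original
    order so the next (order-sensitive) Mamba block scans a sequence
    consistent with the unpruned one. Illustrative sketch only."""
    n = len(tokens)
    k = max(1, int(n * keep_ratio))
    # Rank positions by importance score and keep the top-k (unordered).
    topk = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    # Re-sort the kept positions: the pruned sequence must preserve
    # the original scan order before it is fed into the next block.
    kept_positions = sorted(topk)
    return [tokens[i] for i in kept_positions], kept_positions

# Toy example: 6 "tokens" with hand-picked importance scores.
tokens = ["t0", "t1", "t2", "t3", "t4", "t5"]
scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]
kept, pos = prune_and_rearrange(tokens, scores, keep_ratio=0.5)
print(kept)  # ['t0', 't2', 't4'] -- top-3 by score, in original order
print(pos)   # [0, 2, 4]
```

Without the final re-sort, the pruned sequence would be ordered by score rather than by position, which would change the scan order seen by the next block.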
We use the same environment as the original Vim codebase.
# Python 3.10.13
conda create -n dyvm python=3.10.13
conda activate dyvm
# torch 2.1.1 + cu118
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
# Requirements: vim_requirements.txt
pip install -r vim_requirements.txt
# Install causal_conv1d and mamba
pip install -e causal_conv1d  # note: an unquoted `>=1.1.0` would be interpreted as a shell redirection
pip install -e mamba-1p1p1
Note that you must install the modified `mamba` package under the `mamba-1p1p1` directory, since we have made some modifications to the original `mamba` codebase.
If you cannot install the `mamba` package we provide, you can first install the original `mamba` from its release. Then, replace the `mamba-ssm` directory in your conda environment with the `mamba-ssm` directory under `mamba-1p1p1`:
rm -rf /path/to/your/conda/env/dyvm/lib/python3.10/site-packages/mamba-ssm/
cp -r mamba-1p1p1/mamba-ssm /path/to/your/conda/env/dyvm/lib/python3.10/site-packages/
We provide all training scripts used in our experiments in `vim/scripts/`. Change `--data-path` to the path of your ImageNet dataset and run the script to train your model. An example:
cd vim
bash scripts/vim-t.sh
By adding the `--eval` flag to the training script and changing `--model-path` to the path of your trained model, you can evaluate the model directly after training. An example evaluation script is provided in `vim/scripts/eval.sh`.
cd vim
bash scripts/eval.sh
We provide the code for each ablation study in the `Ablation` directory. You can follow the scripts in each study's `vim/scripts/` directory to reproduce the results.
| Model | #Param. | Top-1 Acc. | Download Link |
|---|---|---|---|
| DyVM-Vim-S | 27M | 78.8 | Google Drive |
| DyVM-Vim-B | 101M | 79.9 | Google Drive |
This project is based on Vim (paper, code), Mamba (paper, code), Causal-Conv1d (code), and DeiT (paper, code). Thanks for their wonderful work.
If you find DyVM useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.
@misc{wu2025dynamicvisionmamba,
title={Dynamic Vision Mamba},
author={Mengxuan Wu and Zekai Li and Zhiyuan Liang and Moyang Li and Xuanlei Zhao and Samir Khaki and Zheng Zhu and Xiaojiang Peng and Konstantinos N. Plataniotis and Kai Wang and Wangbo Zhao and Yang You},
year={2025},
eprint={2504.04787},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.04787},
}