MedM-VL is a modular, LLaVA-based codebase for medical LVLMs, supporting flexible customization of encoders, connectors, and LLMs.
MedM-VL focuses on small-scale medical LVLMs, designed for direct deployment in real-world medical scenarios or efficient fine-tuning on downstream tasks.
- [2025.04.10]: The model weights (v1.0) have been uploaded to Hugging Face.
- [2025.04.06]: The technical report has been released on arXiv.
- [2024.12.19]: The complete code has been released on GitHub.
MedM-VL (v1.0, single-image input; more details on Hugging Face):
- shiym2000/MedM-VL-2D-3B-en: trained on 2D medical images and English medical texts.
- shiym2000/MedM-VL-CT-Chest-3B-en: trained on 3D chest CT volumes and English medical texts.
```shell
# 1. clone and navigate
git clone https://github.com/MSIIP/MedM-VL.git
cd MedM-VL

# 2. create a conda environment, activate it, and install packages
conda create -n medm python=3.10
conda activate medm
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```
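After installation, a quick sanity check can confirm that the activated environment resolves the expected interpreter (the `nvcc` check is only relevant if the `flash-attn` build fails; its absence is normal on CPU-only machines):

```shell
# confirm the conda env's Python is the one on PATH (expected: 3.10.x)
python -c "import sys; print('python', sys.version.split()[0])"
# flash-attn compiles against CUDA at build time; check for nvcc if needed
command -v nvcc >/dev/null && nvcc --version || echo "nvcc not found"
```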
If any parameters are unclear during usage, please refer to Parameter Interpretation.
```shell
# For 2D medical LVLMs
# 1. pre-train (annotation format: docs/example_2d_pretrain.json)
bash scripts/train/MedM-VL-2D/pretrain_en.sh
# 2. fine-tune (annotation format: docs/example_2d_finetune.json)
bash scripts/train/MedM-VL-2D/finetune_en.sh

# For 3D medical LVLMs
# 1. pre-train (annotation format: docs/example_3d_pretrain.json)
bash scripts/train/MedM-VL-CT-Chest/pretrain_en.sh
# 2. fine-tune (annotation format: docs/example_3d_finetune.json)
bash scripts/train/MedM-VL-CT-Chest/finetune_en.sh

# Note: the annotation file format is identical for pre-training and
# fine-tuning; the former uses image-text pairs, while the latter uses
# instruction-tuning data.
```
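The annotation files follow a LLaVA-style conversation schema. The sketch below is a hypothetical example: the field names and the image path are assumptions for illustration, and the authoritative format is the one in docs/example_2d_finetune.json.

```shell
# Hypothetical LLaVA-style annotation entry; the real schema is defined by
# docs/example_2d_finetune.json, and the image path is a placeholder.
cat > example_finetune_sketch.json <<'EOF'
[
  {
    "image": "images/chest_xray_0001.png",
    "conversations": [
      {"from": "human", "value": "<image>\nDescribe the findings in this chest X-ray."},
      {"from": "gpt", "value": "The lungs are clear, with no focal consolidation or effusion."}
    ]
  }
]
EOF
# check that the sketch is well-formed JSON
python -m json.tool example_finetune_sketch.json > /dev/null && echo "well-formed JSON"
```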
```shell
# For 2D medical LVLMs
# 1. download weights from Hugging Face
pip install -U huggingface_hub
huggingface-cli download --resume-download shiym2000/MedM-VL-2D-3B-en --local-dir work_dirs/MedM-VL-2D-3B-en
# 2. fine-tune using LoRA (annotation format: docs/example_2d_finetune.json)
bash scripts/train/finetune_2d.sh

# For 3D medical LVLMs
# 1. download weights from Hugging Face
pip install -U huggingface_hub
huggingface-cli download --resume-download shiym2000/MedM-VL-CT-Chest-3B-en --local-dir work_dirs/MedM-VL-CT-Chest-3B-en
# 2. fine-tune using LoRA (annotation format: docs/example_3d_finetune.json)
bash scripts/train/finetune_3d.sh

# You can choose full or LoRA fine-tuning based on available GPU memory.
```
```shell
# For 2D medical LVLMs
# inference (annotation format: docs/example_2d_inference.json)
bash scripts/eval/inference_2d.sh

# For 3D medical LVLMs
# inference (annotation format: docs/example_3d_inference.json)
bash scripts/eval/inference_3d.sh

# Compared to `finetune.json`, `conversations` in `inference.json` lacks
# the final response, which will be generated by the model.
```
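Concretely, an inference entry mirrors a fine-tuning entry but drops the closing model turn. The sketch below is hypothetical (field names and path are assumptions; docs/example_2d_inference.json defines the real format):

```shell
# Hypothetical inference annotation entry: identical to the fine-tuning
# format except the final "gpt" turn is omitted and generated by the model.
cat > example_inference_sketch.json <<'EOF'
[
  {
    "image": "images/chest_xray_0001.png",
    "conversations": [
      {"from": "human", "value": "<image>\nDescribe the findings in this chest X-ray."}
    ]
  }
]
EOF
```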
```shell
# Launch a Gradio demo locally.
bash scripts/playground.sh
```
Supported module types: Encoder | Connector | LLM.
@article{shi2025medm,
title={MedM-VL: What Makes a Good Medical LVLM?},
author={Shi, Yiming and Yang, Shaoshuai and Zhu, Xun and Wang, Haoyu and Li, Miao and Wu, Ji},
journal={arXiv preprint arXiv:2504.04323},
year={2025}
}
We would like to express our gratitude to the following resources:
- TinyLLaVA_Factory - An open-source modular codebase for small-scale large multimodal models (LMMs).