MedRegA is an interpretable bilingual generalist model for diverse biomedical tasks, distinguished by its outstanding ability to leverage regional information. It can perceive images from 8 modalities covering almost all body parts, showcasing significant versatility.
💡 We establish Region-Centric tasks with a large-scale dataset, MedRegInstruct, in which each sample is paired with coordinates of body structures or lesions.
💡 Based on the proposed dataset, we develop MedRegA, a Region-Aware medical MLLM that serves as a bilingual generalist medical AI system capable of both image-level and region-level medical vision-language tasks.
- Release the model.
- Release the demo code.
- Release the evaluation code.
- Release the training code.
- Release the data.
Please refer to InternVL Installation to set up the environment.
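For reference, a minimal sketch of an InternVL-style setup is shown below; the environment name `medrega` and the presence of a compatible `requirements.txt` are assumptions, so follow the linked guide for the authoritative steps.

```shell
# Hypothetical setup following the InternVL installation guide;
# the environment name and requirements file are assumptions.
conda create -n medrega python=3.9 -y
conda activate medrega
pip install -r requirements.txt
```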
Run the demo:

```shell
torchrun --nproc-per-node=1 src/demo.py
```
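For programmatic use, the sketch below assumes MedRegA can be loaded with Hugging Face `transformers` and inherits the `chat` interface of its InternVL base via `trust_remote_code`; the checkpoint path, image file, prompt, and 448x448 preprocessing are illustrative assumptions, not the released demo code.

```python
# Hypothetical inference sketch, assuming MedRegA inherits the InternVL
# `chat` interface; the checkpoint path, image, and prompt are placeholders.
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModel, AutoTokenizer

path = "path/to/MedRegA-checkpoint"  # hypothetical local checkpoint
model = AutoModel.from_pretrained(
    path, torch_dtype=torch.bfloat16, trust_remote_code=True
).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)

# Resize to the 448x448 InternVL input size; normalize with ImageNet stats.
preprocess = transforms.Compose([
    transforms.Resize((448, 448)),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
pixel_values = preprocess(Image.open("scan.png").convert("RGB"))
pixel_values = pixel_values.unsqueeze(0).to(torch.bfloat16).cuda()

question = "<image>\nPlease describe the findings in this image."
response = model.chat(tokenizer, pixel_values, question,
                      generation_config=dict(max_new_tokens=512))
print(response)
```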
Training details can be found in the slurm scripts for multi-node, multi-GPU training; a launch sketch is shown below. Before training, organize the dataset following MedRegInstruct and register it in meta_file.json (an example entry follows the launch script).
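As a rough illustration, a multi-node launch might look like the following; the resource numbers and the `src/train.py` entry point are assumptions, so defer to the released slurm scripts.

```shell
#!/bin/bash
# Hypothetical multi-node launch sketch; resource numbers and the training
# entry point (src/train.py) are assumptions, not the released scripts.
#SBATCH --job-name=medrega-train
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:8

# Use the first allocated node as the rendezvous host for torchrun.
MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)

srun torchrun \
  --nnodes="$SLURM_NNODES" \
  --nproc-per-node=8 \
  --rdzv-backend=c10d \
  --rdzv-endpoint="${MASTER_ADDR}:29500" \
  src/train.py
```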
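If MedRegA follows the meta-file schema of its InternVL base, an entry in meta_file.json might look like the sketch below; the dataset name, paths, and length are all placeholders (JSON does not allow comments, so the hedging lives here).

```json
{
  "your_dataset_name": {
    "root": "path/to/images/",
    "annotation": "path/to/annotations.jsonl",
    "data_augment": false,
    "repeat_time": 1,
    "length": 10000
  }
}
```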
@inproceedings{wang2025interpretable,
title={Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks},
author={Lehan Wang and Haonan Wang and Honglong Yang and Jiaji Mao and Zehong Yang and Jun Shen and Xiaomeng Li},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=YuHQTo6G9S}
}
Our code builds upon InternVL. We thank the authors for releasing their code.