We introduce UniBiomed, the first universal foundation model for grounded biomedical image interpretation, capable of generating accurate diagnostic findings while simultaneously segmenting the corresponding biomedical targets. UniBiomed is based on a novel integration of a Multi-modal Large Language Model (MLLM) and the Segment Anything Model (SAM), which effectively unifies diverse biomedical tasks under universal training to advance grounded interpretation.
git clone https://github.com/Luffy03/UniBiomed
cd UniBiomed
conda create -n UniBiomed python=3.10
conda activate UniBiomed
conda install pytorch==2.3.1 torchvision==0.18.1 pytorch-cuda=12.1 cuda -c pytorch -c "nvidia/label/cuda-12.1.0" -c "nvidia/label/cuda-12.1.1"
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.3/index.html
pip install -r requirements.txt
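After installation, you can quickly sanity-check the environment. A minimal sketch (it only verifies that the pinned PyTorch/CUDA/mmcv stack imports correctly; the expected versions simply mirror the commands above):

import torch
import mmcv

# The versions should match the pinned packages above
# (torch 2.3.1 built against CUDA 12.1, mmcv 2.1.0).
print('torch:', torch.__version__)
print('cuda available:', torch.cuda.is_available())
print('mmcv:', mmcv.__version__)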
You need to download the sam2-hiera-large checkpoint and place it under the 'pretrained' path:
./ # project root
pretrained/
├── sam2_hiera_large.pt
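If you prefer to fetch the checkpoint programmatically, one option is to pull it from the Hugging Face Hub. A hedged sketch (it assumes the checkpoint is mirrored under facebook/sam2-hiera-large; otherwise download sam2_hiera_large.pt manually from the official SAM 2 release):

from huggingface_hub import hf_hub_download

# Assumption: the SAM 2 checkpoint is mirrored at facebook/sam2-hiera-large.
# If not, download sam2_hiera_large.pt manually and place it in ./pretrained.
ckpt = hf_hub_download(
    repo_id='facebook/sam2-hiera-large',
    filename='sam2_hiera_large.pt',
    local_dir='./pretrained',
)
print('checkpoint saved to:', ckpt)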
Our curated datasets are available on Hugging Face. Some of the datasets must be downloaded and processed from their original links. The datasets are organized as follows:
./ # project root
data/Biomed
├── CoCaHis
│   ├── train
│   ├── train_mask
│   ├── test
│   ├── test_mask
│   ├── train.json
│   └── test.json
├── 3D
│   ├── CHAOS
│   └── ...
├── MedTrinity
├── MSD
├── ...
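To verify that a 2D dataset is laid out as expected before training, a small check like the following can help (a minimal sketch; CoCaHis is used as the example and the folder/file names mirror the tree above):

import os

# Expected layout for a 2D dataset such as CoCaHis (see the tree above).
root = './data/Biomed/CoCaHis'
expected = ['train', 'train_mask', 'test', 'test_mask', 'train.json', 'test.json']

for name in expected:
    path = os.path.join(root, name)
    print(f'{path}: {"found" if os.path.exists(path) else "MISSING"}')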
Quick start: a demo script is available at example.py, and some example inputs are placed in './examples'.
import argparse
import torch
from transformers import (AutoModel, AutoTokenizer,
                          BitsAndBytesConfig, CLIPImageProcessor,
                          GenerationConfig)
def parse_args():
    parser = argparse.ArgumentParser(description='UniBiomed')
    parser.add_argument('--model_path', default='Luffy503/UniBiomed')
    args = parser.parse_args()
    return args
args = parse_args()
# load model
model = AutoModel.from_pretrained(
    args.model_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(
    args.model_path,
    trust_remote_code=True,
)
# define data input: an image and a text instruction
data_dict = {}
image, text = None, None  # replace with a loaded image and an instruction string
data_dict['image'] = image
data_dict['text'] = text
# output
pred_dict = model.predict_forward(**data_dict, tokenizer=tokenizer)
# text description
prediction = pred_dict['prediction']
# segmentation mask
mask = pred_dict['prediction_masks'][0][0]
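For instance, the inputs can be filled in and the outputs saved as follows (a hedged sketch that reuses the model and tokenizer loaded above; the image path and instruction text are illustrative, and the mask is assumed to be a binary HxW array; see './examples' and example.py for the exact prompt format):

import numpy as np
from PIL import Image

# Illustrative inputs (replace with your own data; see './examples').
image = Image.open('./examples/example.png').convert('RGB')  # hypothetical path
text = 'Please segment the target lesion in this image.'     # illustrative instruction

pred_dict = model.predict_forward(image=image, text=text, tokenizer=tokenizer)
print(pred_dict['prediction'])  # generated text description

# Save the first predicted segmentation mask (assumed binary HxW array).
mask = pred_dict['prediction_masks'][0][0]
Image.fromarray((np.asarray(mask) * 255).astype(np.uint8)).save('./mask.png')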
Run the following command for training (8*H800 GPUs).
bash tools/dist.sh train projects/unibiomed/configs/biomed.py 8
After training, you need to convert the checkpoint into a Hugging Face model for evaluation. Replace '$your_model$' with the real checkpoint name. The converted model will be saved to './save_hf'.
PYTHONPATH=. python projects/unibiomed/hf/convert_to_hf.py projects/unibiomed/configs/biomed.py --pth-model ./work_dirs/biomed/$your_model$.pth --save-path ./save_hf
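The converted model in './save_hf' can then be loaded exactly like the released checkpoint in the quick-start snippet above. A minimal sketch:

import torch
from transformers import AutoModel, AutoTokenizer

# Load the locally converted checkpoint instead of 'Luffy503/UniBiomed'.
model = AutoModel.from_pretrained(
    './save_hf',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('./save_hf', trust_remote_code=True)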
You can use our trained model on Hugging Face for evaluation.
For segmentation (replace '$datasetname' with the dataset name):
PYTHONPATH=. python demo/demo_seg2D.py --val_folder ./data/Biomed/$datasetname --work-dir ./val_results/$datasetname --model_path Luffy503/UniBiomed
For grounded disease recognition:
PYTHONPATH=. python demo/demo_disease.py --data_path ./data/Biomed/Disease/$datasetname --model_path Luffy503/UniBiomed --save_dir ./val_results/Grounded_disease/$datasetname
# eval metrics
python demo/eval_utils/metrics_grounded_disease.py --root ./data/Biomed/Disease/$datasetname --prediction_dir_path ./val_results/Grounded_disease/$datasetname
# or one for all
bash demo_disease.sh
For region understanding:
PYTHONPATH=. python demo/demo_RegionCap.py --data_path ./data/Biomed/Disease/$datasetname --model_path Luffy503/UniBiomed --save_dir ./val_results/region_understand/$datasetname
For MedTrinity report generation:
PYTHONPATH=. python demo/demo_Medtrinity.py --model_path Luffy503/UniBiomed
# eval metrics
python demo/eval_utils/metrics_medtrinity.py --root ./data/Biomed/MedTrinity --gt_json_path train.json --prediction_dir_path ./val_results/MedTrinity
For RadGenome grounded report generation:
PYTHONPATH=. python demo/demo_GRG.py --model_path Luffy503/UniBiomed --save_dir ./val_results/Grounded_Report_Generation/RadGenome
# eval metrics
python demo/eval_utils/metrics_grg.py --root ./data/Biomed/RadGenome --prediction_dir_path ./val_results/Grounded_Report_Generation/RadGenome
Our work is built upon the great work Sa2VA, and we highly appreciate their efforts. We also thank RadGenome, BiomedParse, VoCo, and MedTrinity for providing data preprocessing toolkits.
If you find this repo useful for your research, please consider citing the paper as follows:
@article{wu2025unibiomed,
  title={UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation},
  author={Wu, Linshan and Nie, Yuxiang and He, Sunan and Zhuang, Jiaxin and Luo, Luyang and Mahboobani, Neeraj and Vardhanabhuti, Varut and Chan, Ronald Cheong Kin and Peng, Yifan and Rajpurkar, Pranav and Chen, Hao},
  journal={arXiv preprint arXiv:2504.21336},
  year={2025}
}