Enhancing Multi-task Learning Capability of Medical Generalist Foundation Model via Image-centric Multi-annotation Data
Xun Zhu, Fanbin Mo, Zheng Zhang, Jiaxi Wang, Yiming Shi, Ming Wu, Chuang Zhang, Miao Li, Ji Wu
Accepted by the 33rd ACM International Conference on Multimedia (ACM MM 2025)
Dataset
Dataset statistics
IMAX comprises 47,600 unique X-ray images and 354,595 data entries, distributed across tasks as follows: 100,901 for VQA, 54,684 for calculation, 51,045 for REC, 51,045 for REG, 45,715 for report generation, 45,186 for multi-label classification, and 6,019 for multi-class classification. We split IMAX into training and test sets at a ratio of 4:1, allocating 38,077 images and 284,017 data entries to training.
On average, DMAX has 1.25 tasks per image and 2.09 training data entries per image.
On average, IMAX has 4.10 tasks per image and 7.46 training data entries per image.
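
The IMAX figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical and not part of any released code, and it assumes that the reported 7.46 is the number of training entries divided by the number of training images. The tasks-per-image averages cannot be recomputed from these totals alone, so they are not checked here.

```python
# Sanity check of the IMAX statistics quoted above.
# All names here are hypothetical; only the numbers come from the text.

TASK_ENTRIES = {
    "VQA": 100_901,
    "calculation": 54_684,
    "REC": 51_045,
    "REG": 51_045,
    "report generation": 45_715,
    "multi-label classification": 45_186,
    "multi-class classification": 6_019,
}

TOTAL_IMAGES = 47_600
TOTAL_ENTRIES = 354_595
TRAIN_IMAGES = 38_077
TRAIN_ENTRIES = 284_017


def summarize():
    """Recompute the derived quantities from the reported counts."""
    return {
        # Should equal the reported total number of data entries.
        "entries_sum": sum(TASK_ENTRIES.values()),
        # A 4:1 train/test split corresponds to a share of about 0.8.
        "train_image_share": TRAIN_IMAGES / TOTAL_IMAGES,
        "train_entry_share": TRAIN_ENTRIES / TOTAL_ENTRIES,
        # Assumed definition of "train data entries per image".
        "train_entries_per_image": TRAIN_ENTRIES / TRAIN_IMAGES,
    }


if __name__ == "__main__":
    stats = summarize()
    print(f"sum of per-task entries: {stats['entries_sum']:,}")
    print(f"train image share:       {stats['train_image_share']:.4f}")
    print(f"train entry share:       {stats['train_entry_share']:.4f}")
    print(f"train entries per image: {stats['train_entries_per_image']:.2f}")
```

Under these assumptions, the per-task counts sum to the reported total, both training shares are close to 0.8, and the entries-per-image ratio rounds to 7.46.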