Official repository of Expert-Controlled Classifier-Free Guidance for Reliable Medical Visual Question Answering.

ecoxial2007/Expert-CFG

Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models

💡Overview

Expert-Controlled Classifier-Free Guidance (Expert-CFG) is a training-free, expert-in-the-loop framework designed to align medical vision-language models (MedVLMs) with clinical expertise. It integrates three components: token-level uncertainty estimation, BiomedCLIP-based medical multimodal Retrieval-Augmented Generation (RAG), and interactive expert guidance through revisions and highlights.
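To make the two core ideas concrete, here is a minimal sketch of token-level uncertainty and classifier-free guidance on a toy vocabulary. It is not the repository's implementation: the entropy-based uncertainty score, the threshold `ENTROPY_THRESHOLD`, the guidance scale, and all logit values are assumptions chosen for illustration, and the guidance rule is the standard classifier-free guidance formula rather than anything confirmed by this README.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)

def cfg_logits(cond, uncond, scale):
    """Standard classifier-free guidance: move logits toward the conditioned branch."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

# Hypothetical logits for a 3-token vocabulary.
uncond = [1.0, 1.0, 1.0]   # without expert guidance: maximally uncertain
cond = [3.0, 1.0, 0.0]     # with an expert highlight in the context
ENTROPY_THRESHOLD = 0.8    # hypothetical cutoff for flagging a token for review

needs_review = token_entropy(uncond) > ENTROPY_THRESHOLD
guided = cfg_logits(cond, uncond, scale=1.5)
```

In this toy case the unguided distribution is uniform, so its entropy exceeds the threshold and the token would be flagged; after guidance the distribution concentrates on the first token and the entropy drops.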

🔨Setup

🔨Installation

conda create -n expert_cfg python=3.10 -y
conda activate expert_cfg
pip install -r requirements.txt

🔨Pre-trained weights

Baseline Model:

Download each set of weights to the current directory, then merge the Phi3V-Med LoRA with Phi-3-vision-128k-instruct and the Phi3.5V-Med LoRA with Phi-3.5-vision-instruct.
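For readers unfamiliar with what merging a LoRA adapter involves, the sketch below shows the general arithmetic: the low-rank update `B @ A`, scaled by `alpha / rank`, is added to the base weight matrix. This README does not show the repository's own merge procedure, so the function names and toy matrices here are hypothetical; in practice a library such as Hugging Face PEFT performs this step on real checkpoints.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

# Toy 2x2 base weight with a rank-1 adapter (all values are illustrative).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]     # shape (rank, in_features)
lora_b = [[0.5], [0.0]]   # shape (out_features, rank)

merged = merge_lora(weight, lora_a, lora_b, alpha=2.0, rank=1)
```

After merging, the adapter no longer needs to be loaded separately, because its effect is folded into the base weights.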

Medical LoRA:

Links to our fine-tuned Phi3V-Med and Phi3.5V-Med LoRA weights:

Demo

torchrun --nproc_per_node=1 demo.py \
    --bf16 \
    --use_lora \
    --input_json 'examples/input_queries.json' \
    --img_root 'examples/images' \
    --save_path 'examples/results.json' \
    --output_dir './lora_weights/logs_phi35_pubmed_instruct' 

Medical Image & Text Encoder for RAG (optional):

Download BiomedCLIP and place it in ./src/backbone/BiomedCLIP.

BiomedCLIP links:

Note: Downloading weights directly from Hugging Face may run into network issues. To make modification easier, we have converted the original .bin file to PyTorch's .pth format. We recommend using the Baiduyun version.
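As background on how an image and text encoder typically supports RAG, the following sketch ranks a small bank of pre-extracted feature vectors by cosine similarity to a query feature and returns the top matches. It is an illustration under stated assumptions, not the repository's retrieval code: the 3-dimensional vectors, the captions, and the `retrieve` helper are all hypothetical stand-ins for real encoder features.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query, bank, k=2):
    """Return the k (caption, score) pairs from bank most similar to query."""
    scored = [(caption, cosine(query, feat)) for caption, feat in bank]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

# Hypothetical feature bank: (caption, feature vector) pairs.
bank = [
    ("caption A", [1.0, 0.0, 0.0]),
    ("caption B", [0.0, 1.0, 0.0]),
    ("caption C", [0.7, 0.7, 0.0]),
]
query = [0.9, 0.1, 0.0]   # hypothetical feature of the query image

top = retrieve(query, bank, k=2)
```

The retrieved captions would then be supplied to the language model as additional context. Working from pre-extracted features means the encoder only has to run on the query at inference time.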

📑Data Preparation

Our data mainly comes from the publicly available, free online Pathology Education Informational Resource (PEIR) Digital Library. We test our model on:

Medical Alignment and Instruction Tuning:

Prepare BiomedCLIP Pre-extracted Image Features

Note: We recommend using our pre-extracted BiomedCLIP features. The original images are also available through the links below:

| Dataset | Pre-extracted Features & Original Images |
| --- | --- |
| PEIR | Baiduyun (rename zzz to zip) |
| PEIR BiomedCLIP features & keywords & GPT-3.5 rewritten captions | Baiduyun |
| PathVQA | Baiduyun |
| Slake | Baiduyun |
| RADVQA | Baiduyun |

📝Acknowledgements

We also build on the excellent Phi-3CookBook, HuatuoVision, and BiomedCLIP repositories, along with other repositories specific to the baselines and datasets we examined (see the paper).

📝Citation

If you find this paper useful, please consider starring 🌟 this repo and citing 📑 our paper:

@misc{liang2025uncertaintydriven,
    title={Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models},
    author={Xiao Liang and Di Wang and Zhicheng Jiao and Ronghan Li and Pengfei Yang and Quan Wang and Tat-Seng Chua},
    year={2025},
    eprint={2507.09209},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
