- [June/15/2025] 📰 Med-PRM preprint is now available on arXiv.
- [June/15/2025] 🎉 We’re excited to present Med-PRM, a new retrieval-augmented medical process reward model and the first 8B-parameter framework to exceed 80% accuracy on MedQA.
- Med-PRM is a novel framework designed to enhance clinical decision-making by addressing errors in the reasoning process. It uses retrieval-augmented generation (RAG) to verify each reasoning step against established medical knowledge bases, with the goal of making medical diagnoses more accurate and reliable. Med-PRM is not intended to replace the expertise of healthcare professionals but to augment it: automating the verification of reasoning steps lets them focus on critical thinking and patient care.
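The step-by-step verification described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the Med-PRM implementation: `StepVerdict`, `verify_steps`, and `toy_overlap_scorer` are hypothetical names, and the word-overlap scorer is a stand-in for the trained reward model, which this sketch does not call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepVerdict:
    step: str
    score: float  # in [0, 1]; higher means better supported by the documents


# Hypothetical scorer signature: (question, documents, previous_steps, step) -> score
StepScorer = Callable[[str, List[str], List[str], str], float]


def verify_steps(
    question: str,
    steps: List[str],
    documents: List[str],
    score_step: StepScorer,
) -> List[StepVerdict]:
    """Score each reasoning step in order, given the retrieved documents
    and the steps that came before it."""
    verdicts = []
    for i, step in enumerate(steps):
        previous_steps = steps[:i]
        score = score_step(question, documents, previous_steps, step)
        verdicts.append(StepVerdict(step=step, score=score))
    return verdicts


def toy_overlap_scorer(
    question: str, documents: List[str], previous_steps: List[str], step: str
) -> float:
    """Placeholder scorer (assumption): fraction of the step's words that
    also appear in the retrieved documents. A real PRM would replace this."""
    doc_words = set(" ".join(documents).lower().split())
    words = step.lower().split()
    if not words:
        return 0.0
    return sum(w in doc_words for w in words) / len(words)


if __name__ == "__main__":
    docs = ["metformin is first-line therapy for type 2 diabetes"]
    reasoning = ["metformin is first-line therapy", "insulin is always required"]
    for verdict in verify_steps("Which drug first?", reasoning, docs, toy_overlap_scorer):
        print(f"{verdict.score:.2f}  {verdict.step}")
```

The point of the structure is that each step is judged with both the retrieved evidence and the preceding steps in view, so an early unsupported step can be flagged on its own rather than only through the final answer.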
Policy Model | Reward Model | Policy Base Model | Reward Base Model | Policy Training Method | Reward Training Method | MedQA-4 Accuracy (%) |
---|---|---|---|---|---|---|
Llama-3.1-8B-Instruct | Med PRM Reward v1.0 | Llama 3.1 8B IT | Llama 3.1 8B IT | - | SFT | 78.24 |
Med PRM Policy v1.0 | Med PRM Reward v1.0 | Llama 3.1 8B IT | Llama 3.1 8B IT | Rejection Sampling | SFT | 79.18 |
Llama-3-8B-UltraMedical | Med PRM Reward v1.0 | Llama 3.0 8B IT | Llama 3.1 8B IT | SFT | SFT | 79.87 |
llama-3-meerkat-8b-v1.0 | Med PRM Reward v1.0 | Llama 3.0 8B IT | Llama 3.1 8B IT | SFT | SFT | 80.35 |
- Prepare Data: Execute the data preparation script: `python python/0_preparing.py`
- Score with PRM: Run a quick test for Med-PRM (test set was sampled by Llama-3.1-8B-Instruct): `bash scripts/4_scoring_PRM.sh`
- Train the Model: If desired, train the model (data already downloaded in step 5): `bash scripts/2_training.sh`
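For the scoring step above, one common way to use a process reward model is to score several sampled solutions per question and keep the best one. The sketch below illustrates that idea only; it is not taken from `scripts/4_scoring_PRM.sh`. The candidate format, the helper names `solution_score` and `select_best`, and the choice of the minimum step score as the aggregate are all assumptions made for illustration.

```python
from typing import Dict, List


def solution_score(step_scores: List[float]) -> float:
    """Aggregate per-step scores into one solution-level score.

    Assumption: a solution is only as strong as its weakest step, so the
    minimum is used. Other aggregates (mean, product, last step) are possible.
    """
    if not step_scores:
        return float("-inf")
    return min(step_scores)


def select_best(candidates: List[Dict]) -> Dict:
    """Return the candidate whose aggregated step score is highest."""
    return max(candidates, key=lambda c: solution_score(c["step_scores"]))


if __name__ == "__main__":
    # Hypothetical sampled solutions for one question, with made-up step scores.
    sampled = [
        {"answer": "A", "step_scores": [0.9, 0.2, 0.8]},
        {"answer": "B", "step_scores": [0.7, 0.6, 0.7]},
        {"answer": "C", "step_scores": [0.95, 0.5, 0.4]},
    ]
    print(select_best(sampled)["answer"])
```

With the minimum as the aggregate, candidate "B" wins even though "A" and "C" contain higher individual step scores, because each of them has one weakly supported step.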
Feel free to reach out to jhyun0414@hanyang.ac.kr or jisohn@ethz.ch with any questions.
BibTeX Citation: If you use Med-PRM in your research, please cite it using the following BibTeX entry:
@misc{medprm2025,
title={Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards},
author={Jaehoon Yun and Jiwoong Sohn and Jungwoo Park and Hyunjae Kim and Xiangru Tang and Daniel Shao and Yong Hoe Koo and Ko Minhyeok and Qingyu Chen and Mark Gerstein and Michael Moor and Jaewoo Kang},
author+an = {1=first; 2=first; 3=first; 11=last; 12=last},
year={2025},
url={https://med-prm.github.io/},
eprint={2506.11474},
archivePrefix={arXiv},
primaryClass={cs.CL}
}