Med-PRM: Medical Reasoning Models with Step-wise Guideline-verified Process Rewards

News

[June/15/2025] 📰 Med-PRM preprint is now available on arXiv.
[June/15/2025] 🎉 We're excited to present Med-PRM, a new retrieval-augmented medical process reward model and the first 8B-parameter framework to exceed 80% accuracy on MedQA.

📖 Overview

Med-PRM is a framework designed to enhance clinical decision-making by addressing errors in the reasoning process. It uses retrieval-augmented generation (RAG) to verify each reasoning step against established medical knowledge bases, with the aim of making medical diagnoses more accurate and reliable. Med-PRM is not intended to replace the expertise of healthcare professionals but to augment it: automating the verification of reasoning steps lets them focus on critical thinking and patient care.
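
The step-wise verification described above can be illustrated with a minimal sketch. It is not the Med-PRM implementation: the names `Candidate`, `score_candidate`, `select_best`, `toy_retrieve`, and `toy_step_reward` are hypothetical, the reward function is a keyword-overlap stand-in for a trained reward model, and aggregating step scores with `min` is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set

# Hypothetical signatures for illustration; not taken from the Med-PRM code base.
Retriever = Callable[[str], Sequence[str]]
StepReward = Callable[[str, Sequence[str], Sequence[str], str], float]


@dataclass
class Candidate:
    """One sampled solution: its reasoning steps and its final answer."""
    steps: List[str]
    answer: str


def score_candidate(question: str, candidate: Candidate,
                    retrieve: Retriever, step_reward: StepReward) -> float:
    """Score every step against retrieved documents, then aggregate."""
    docs = retrieve(question)
    step_scores = [
        step_reward(question, docs, candidate.steps[:i], step)
        for i, step in enumerate(candidate.steps)
    ]
    # Assumption: a chain is only as reliable as its weakest step.
    return min(step_scores) if step_scores else 0.0


def select_best(question: str, candidates: Sequence[Candidate],
                retrieve: Retriever, step_reward: StepReward) -> Candidate:
    """Pick the sampled solution with the highest aggregated step score."""
    return max(candidates,
               key=lambda c: score_candidate(question, c, retrieve, step_reward))


def _long_words(text: str) -> Set[str]:
    """Lower-cased words longer than five characters, punctuation stripped."""
    return {w.strip(".,").lower() for w in text.split() if len(w.strip(".,")) > 5}


def toy_retrieve(question: str) -> List[str]:
    """Stand-in retriever that always returns one fixed guideline sentence."""
    return ["Bacterial pharyngitis is treated with antibiotics."]


def toy_step_reward(question: str, docs: Sequence[str],
                    previous_steps: Sequence[str], step: str) -> float:
    """Stand-in reward: high if the step shares a long word with the documents."""
    evidence = set().union(*(_long_words(d) for d in docs))
    return 0.9 if _long_words(step) & evidence else 0.2
```

In a real setup, `toy_retrieve` and `toy_step_reward` would be replaced by a document retriever and a trained process reward model; only the control flow (retrieve, score each step in context, aggregate, select) is meant to carry over.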

Scoreboard

| Policy Model | Reward Model | Policy Base Model | Reward Base Model | Policy Training Method | Reward Training Method | MedQA-4 accuracy (%) |
|---|---|---|---|---|---|---|
| Llama-3.1-8B-Instruct | Med PRM Reward v1.0 | Llama 3.1 8B IT | Llama 3.1 8B IT | - | SFT | 78.24 |
| Med PRM Policy v1.0 | Med PRM Reward v1.0 | Llama 3.1 8B IT | Llama 3.1 8B IT | Rejection Sampling | SFT | 79.18 |
| Llama-3-8B-UltraMedical | Med PRM Reward v1.0 | Llama 3.0 8B IT | Llama 3.1 8B IT | SFT | SFT | 79.87 |
| llama-3-meerkat-8b-v1.0 | Med PRM Reward v1.0 | Llama 3.0 8B IT | Llama 3.1 8B IT | SFT | SFT | 80.35 |

🔄 Experiment

Prepare Data: Execute the data preparation script:
```
python python/0_preparing.py
```
Score with PRM: Run a quick test for Med-PRM (test set was sampled by Llama-3.1-8B-Instruct):
```
bash scripts/4_scoring_PRM.sh
```
Train the Model: If desired, train the model (the data is already downloaded by the data preparation step above):
```
bash scripts/2_training.sh
```
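
The steps above can also be chained from a small driver script. The following is only a sketch: it assumes the commands are run from the repository root in the order this README lists them, and the names `STEPS`, `plan`, and `run` are hypothetical helpers, not part of the repository. Training is kept optional, as in the steps above.

```python
import subprocess
from typing import List

# Commands copied from the steps above; the ordering follows this README.
STEPS = [
    ("prepare data", ["python", "python/0_preparing.py"], False),
    ("score with PRM", ["bash", "scripts/4_scoring_PRM.sh"], False),
    ("train the model", ["bash", "scripts/2_training.sh"], True),  # optional step
]


def plan(include_training: bool = False) -> List[List[str]]:
    """Return the commands to run, skipping optional steps unless requested."""
    return [cmd for _, cmd, optional in STEPS if include_training or not optional]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only shows what would be executed.
    run(plan(include_training=False), dry_run=True)
```

Calling `run(plan(include_training=True), dry_run=False)` would execute all three steps, stopping at the first one that exits with a non-zero status.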

Contact

Feel free to reach out to jhyun0414@hanyang.ac.kr or jisohn@ethz.ch

BibTeX Citation: If you use Med-PRM in your research, please cite it using the following BibTeX entry:

@misc{medprm2025,
  title={Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards}, 
  author={Jaehoon Yun and Jiwoong Sohn and Jungwoo Park and Hyunjae Kim and Xiangru Tang and Daniel Shao and Yong Hoe Koo and Ko Minhyeok and Qingyu Chen and Mark Gerstein and Michael Moor and Jaewoo Kang},
  author+an = {1=first; 2=first; 3=first; 11=last; 12=last},
  year={2025},
  url={https://med-prm.github.io/},
  eprint={2506.11474},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

