Dataset • Paper • Code & Docs
2,168 notes • 8,665 decompositions • 987,266 entailment pairs • 1,036 human-labeled examples
FactEHR is a benchmark dataset designed to evaluate the ability of large language models (LLMs) to perform factual reasoning over clinical notes. It includes:
- 2,168 deidentified notes from multiple publicly available datasets
- 8,665 LLM-generated fact decompositions
- 987,266 entailment pairs for evaluating the precision and recall of the decomposed facts
- 1,036 expert-annotated examples for evaluation
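To illustrate how entailment pairs can be turned into precision and recall scores, here is a minimal sketch. It assumes that a "precision" pair asks whether the source note entails a generated fact, and that a "recall" pair asks whether the generated facts entail a sentence from the note. The `EntailmentPair` class, its field names, and the `fact_scores` helper are hypothetical and are not taken from the FactEHR release; consult the dataset documentation for the actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class EntailmentPair:
    """One premise/hypothesis pair (hypothetical schema, not the dataset's)."""
    premise: str
    hypothesis: str
    direction: str   # assumed to be "precision" or "recall"
    entailed: bool   # entailment verdict from a model or a human annotator


def fact_scores(pairs: List[EntailmentPair]) -> Tuple[Optional[float], Optional[float]]:
    """Return (precision, recall) as the fraction of entailed pairs per direction."""

    def rate(direction: str) -> Optional[float]:
        subset = [p for p in pairs if p.direction == direction]
        if not subset:
            return None  # no pairs for this direction, so the score is undefined
        return sum(p.entailed for p in subset) / len(subset)

    return rate("precision"), rate("recall")


# Toy data for illustration only; these are not real dataset records.
pairs = [
    EntailmentPair("note text", "Patient has diabetes.", "precision", True),
    EntailmentPair("note text", "Patient is on insulin.", "precision", False),
    EntailmentPair("fact list", "History of type 2 diabetes.", "recall", True),
    EntailmentPair("fact list", "Metformin 500 mg daily.", "recall", True),
    EntailmentPair("fact list", "No known allergies.", "recall", True),
    EntailmentPair("fact list", "Follow up in 2 weeks.", "recall", False),
]

precision, recall = fact_scores(pairs)
print(f"fact precision = {precision:.2f}, fact recall = {recall:.2f}")
```

In this toy example, one of the two precision pairs and three of the four recall pairs are entailed, so the script reports a fact precision of 0.50 and a fact recall of 0.75.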
If you use FactEHR in your research, please cite:
@article{munnangi2024factehr,
title = {Assessing the Limitations of Large Language Models in Clinical Fact Decomposition},
author = {Monica Munnangi and Akshay Swaminathan and Jason Alan Fries and Jenelle Jindal and Sanjana Narayanan and Ivan Lopez and Lucia Tu and Philip Chung and Jesutofunmi A. Omiye and Mehr Kashyap and Nigam Shah},
journal = {arXiv preprint arXiv:2412.12422},
year = {2024},
eprint = {2412.12422},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
note = {v1, 17 Dec 2024}
}