This repository contains expert-annotated hallucination labels for summaries of ACI-bench clinical conversations, intended for evaluating hallucination detection in medical text summarization.
The Natural Hallucination (NH) dataset contains expert annotations of hallucinations in clinical summaries, focusing on SOAP notes generated from the ACI-bench collection of clinical conversations.
Expert clinical scribes labeled statements with one of four categories, distributed as follows:
- No Error: 12,365
- Hallucination: 106
- Inference: 87
- Misunderstanding: 72
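The distribution above is heavily imbalanced. The short sketch below, using only the counts reported in this README, computes each label's share and the overall fraction of statements carrying any error label; the function names are illustrative and not part of the dataset release.

```python
# Label counts as reported in this README.
LABEL_COUNTS = {
    "No Error": 12_365,
    "Hallucination": 106,
    "Inference": 87,
    "Misunderstanding": 72,
}


def label_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each label's fraction of all annotated statements."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def error_rate(counts: dict[str, int], negative_label: str = "No Error") -> float:
    """Return the fraction of statements carrying any error label."""
    total = sum(counts.values())
    return (total - counts[negative_label]) / total


if __name__ == "__main__":
    for label, share in label_shares(LABEL_COUNTS).items():
        print(f"{label:<17} {share:6.2%}")
    print(f"Any error: {error_rate(LABEL_COUNTS):.2%}")
```

With these counts there are 12,630 statements in total, of which 265 carry an error label, so the script prints an error rate of 2.10%. Plain accuracy is therefore a weak metric for detectors evaluated on this data.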
The 265 errors (all labels other than No Error) were also classified by severity:
- Low Severity: 138
- High Severity: 87
- Not Medically Relevant (NMR): 40
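The severity totals can be cross-checked against the error-type totals. The sketch below, using only the counts from this README, confirms that both breakdowns sum to the same number, which is consistent with each error receiving one severity level, and reports each level's share; the helper names are hypothetical.

```python
# Counts as reported in this README.
ERROR_TYPE_COUNTS = {"Hallucination": 106, "Inference": 87, "Misunderstanding": 72}
SEVERITY_COUNTS = {"Low": 138, "High": 87, "NMR": 40}


def check_totals(error_types: dict[str, int], severities: dict[str, int]) -> int:
    """Return the shared total, or raise if the two breakdowns disagree."""
    n_types = sum(error_types.values())
    n_severity = sum(severities.values())
    if n_types != n_severity:
        raise ValueError(f"error types sum to {n_types}, severities to {n_severity}")
    return n_types


def severity_shares(severities: dict[str, int]) -> dict[str, float]:
    """Return each severity level's fraction of all errors."""
    total = sum(severities.values())
    return {level: n / total for level, n in severities.items()}


if __name__ == "__main__":
    print("Total errors:", check_totals(ERROR_TYPE_COUNTS, SEVERITY_COUNTS))
    for level, share in severity_shares(SEVERITY_COUNTS).items():
        print(f"{level:<4} {share:.1%}")
```

Both breakdowns total 265, and high-severity errors make up 32.8% of them.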
Errors in the following content categories are marked as high severity:
- Diagnosis
- Exam Findings
- Lab Testing and Imaging
- Medical History
- Symptoms
- Treatment Plan
Age & Sex errors are considered low severity.
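The severity rule above can be expressed as a small lookup. This is a minimal sketch, assuming the category strings appear exactly as listed here; `severity_for_category` is a hypothetical helper, and it returns `None` for any category whose severity this README does not state (for example, it does not determine which errors are labeled NMR).

```python
from typing import Optional

# Content categories this README lists as high severity.
HIGH_SEVERITY_CATEGORIES = frozenset({
    "Diagnosis",
    "Exam Findings",
    "Lab Testing and Imaging",
    "Medical History",
    "Symptoms",
    "Treatment Plan",
})

# Content categories this README lists as low severity.
LOW_SEVERITY_CATEGORIES = frozenset({"Age & Sex"})


def severity_for_category(category: str) -> Optional[str]:
    """Map a content category to a severity label.

    Returns None for categories whose severity is not stated in this README.
    """
    if category in HIGH_SEVERITY_CATEGORIES:
        return "High"
    if category in LOW_SEVERITY_CATEGORIES:
        return "Low"
    return None
```

Returning `None` rather than guessing a default keeps unlisted categories visible, so they can be checked against the released severity labels instead of being silently assigned.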
The released dataset contains:
- Original ACI-bench conversation transcripts
- Expert annotations of factual errors marked by category
- Severity labels for each error
- Aggregated error scores per subject
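This README does not specify how the per-subject scores are computed. As a purely hypothetical illustration, the sketch below counts errors per subject by severity from flattened annotation records. The `Annotation` record and its field names are assumptions for this example, not the release's actual schema.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Annotation:
    """Hypothetical flattened record; field names are illustrative only."""
    subject_id: str
    label: str               # "No Error", "Hallucination", "Inference", or "Misunderstanding"
    severity: Optional[str]  # "Low", "High", "NMR", or None when label is "No Error"


def error_counts_by_subject(rows: Iterable[Annotation]) -> dict[str, dict[str, int]]:
    """Count errors per subject, broken down by severity level."""
    counts: defaultdict[str, Counter] = defaultdict(Counter)
    for row in rows:
        per_subject = counts[row.subject_id]  # also registers error-free subjects
        if row.label != "No Error":
            per_subject[row.severity or "Unspecified"] += 1
    return {sid: dict(c) for sid, c in counts.items()}
```

Error-free subjects map to an empty dictionary, so they remain distinguishable from subjects that are missing from the data.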
The annotations can be used to:
- Evaluate hallucination detection methods
- Analyze different types of factual errors in clinical summarization
- Compare high- and low-severity errors in medical text generation
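For the first use case, a common setup treats any label other than No Error as a positive. The sketch below, assuming statement-level gold labels and a detector that outputs one boolean per statement, computes precision, recall, and F1, plus recall broken down by gold severity so that missed high-severity errors stand out. The function names and input format are assumptions, not an evaluation protocol defined by this dataset.

```python
from collections import Counter
from typing import Optional, Sequence


def binary_prf(
    gold: Sequence[str],
    predicted_error: Sequence[bool],
    negative_label: str = "No Error",
) -> tuple[float, float, float]:
    """Precision, recall, and F1 with any non-negative gold label as positive."""
    if len(gold) != len(predicted_error):
        raise ValueError("gold and predictions must have the same length")
    tp = fp = fn = 0
    for label, flagged in zip(gold, predicted_error):
        is_error = label != negative_label
        if flagged and is_error:
            tp += 1
        elif flagged:
            fp += 1
        elif is_error:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def recall_by_severity(
    severities: Sequence[Optional[str]], predicted_error: Sequence[bool]
) -> dict[str, float]:
    """Recall per gold severity level; None marks statements without an error."""
    if len(severities) != len(predicted_error):
        raise ValueError("severities and predictions must have the same length")
    hits: Counter = Counter()
    totals: Counter = Counter()
    for level, flagged in zip(severities, predicted_error):
        if level is None:
            continue
        totals[level] += 1
        if flagged:
            hits[level] += 1
    return {level: hits[level] / totals[level] for level in totals}
```

For a ready-made alternative to `binary_prf`, scikit-learn's `precision_recall_fscore_support` computes the same quantities from binary label arrays.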
If you use this dataset, please cite:
BN, S., Shing, H.-C., Xu, L., Strong, M., Burnsky, J., Ofor, J., Mason, J. R., Chen, S., Srinivasan, S., Shivade, C., Moriarty, J., & Cohen, J. P. (2025). Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization. Interspeech 2025.
@inproceedings{BN2024fact,
title={Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization},
author={BN, Suhas and Shing, Han-Chin and Xu, Lei and Strong, Mitch and Burnsky, Jon and Ofor, Jessica and Mason, Jordan R and Chen, Susan and Srinivasan, Sundararajan and Shivade, Chaitanya and Moriarty, Jack and Cohen, Joseph Paul},
booktitle={Interspeech},
year={2025},
organization={ISCA}
}
This release contains the expert annotations on the ACI-bench summaries but not the LLM outputs themselves, which could not be made public due to licensing restrictions.