amazon-science/acibench-hallucination-annotations

Natural Hallucination Dataset - ACI-bench Clinical Note Hallucination Annotations

This repository contains expert-annotated hallucination labels from the ACI-bench dataset for evaluating hallucination detection in medical text summarization.

Dataset Overview

The Natural Hallucination (NH) dataset contains expert annotations of hallucinations in clinical summaries, focused on SOAP notes from the ACI-bench collection of clinical conversations.

Annotation Categories & Counts

Expert clinical scribes assigned each statement to one of four categories, with the following distribution:

  • No Error: 12,365
  • Hallucination: 106
  • Inference: 87
  • Misunderstanding: 72
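
The counts above imply a heavily imbalanced label distribution, which matters when choosing evaluation metrics. A minimal sketch that turns the published counts into shares (the variable and function names are illustrative and not part of the release):

```python
# Counts copied from the list above.
counts = {
    "No Error": 12365,
    "Hallucination": 106,
    "Inference": 87,
    "Misunderstanding": 72,
}

total = sum(counts.values())
error_total = total - counts["No Error"]


def share(label: str) -> float:
    """Fraction of all annotated statements that carry the given label."""
    return counts[label] / total


for label in counts:
    print(f"{label:<17} {counts[label]:>6,}  {share(label):.2%}")
print(f"Any error: {error_total} of {total:,} statements ({error_total / total:.2%})")
```

Roughly 2% of statements carry any error label, so plain accuracy is a weak measure here: a detector that never flags anything would still score about 98%.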

Error Severity Distribution

Each of the 265 errors (every statement not labeled No Error) was also classified by severity:

  • Low Severity: 138
  • High Severity: 87
  • Not Medically Relevant (NMR): 40
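
As a quick consistency check, the severity counts should add up to the number of statements labeled with an error category. The sketch below verifies this from the published figures (dictionary names are illustrative):

```python
# Error-category counts and severity counts, copied from this README.
error_counts = {"Hallucination": 106, "Inference": 87, "Misunderstanding": 72}
severity_counts = {
    "Low Severity": 138,
    "High Severity": 87,
    "Not Medically Relevant (NMR)": 40,
}

errors_by_category = sum(error_counts.values())
errors_by_severity = sum(severity_counts.values())

# Both views should describe the same set of erroneous statements.
assert errors_by_category == errors_by_severity

for level, n in severity_counts.items():
    print(f"{level:<30} {n:>4}  {n / errors_by_severity:.1%}")
```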

High Severity Categories

Errors in the following content categories are marked as high severity:

  • Diagnosis
  • Exam Findings
  • Lab Testing and Imaging
  • Medical History
  • Symptoms
  • Treatment Plan

Age & Sex errors are considered low severity.
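
The rule above can be sketched as a small lookup. This is an illustration only: the exact label strings used in the released files are an assumption (including treating "Age & Sex" as a single label), and the function name `severity_for` is hypothetical. Categories not mentioned in this README return `None` rather than a guessed severity.

```python
from typing import Optional

# Category names as written in this README; the strings in the released
# files may differ (assumption).
HIGH_SEVERITY_CATEGORIES = frozenset({
    "Diagnosis",
    "Exam Findings",
    "Lab Testing and Imaging",
    "Medical History",
    "Symptoms",
    "Treatment Plan",
})
# Assumed to be stored as one combined label.
LOW_SEVERITY_CATEGORIES = frozenset({"Age & Sex"})


def severity_for(category: str) -> Optional[str]:
    """Return "high" or "low" for a listed category, else None."""
    if category in HIGH_SEVERITY_CATEGORIES:
        return "high"
    if category in LOW_SEVERITY_CATEGORIES:
        return "low"
    return None  # Not covered by the rule stated in this README.


print(severity_for("Diagnosis"))
print(severity_for("Age & Sex"))
```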

Dataset Format

The released dataset contains:

  • Original ACI-bench conversation transcripts
  • Expert annotations of factual errors marked by category
  • Severity labels for each error
  • Aggregated error scores per subject

Usage

The annotations can be used to:

  • Evaluate hallucination detection methods
  • Analyze different types of factual errors in clinical summarization
  • Study high vs low severity errors in medical text generation
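
For the first use case, one possible setup is to score a detector's per-statement flags against the expert labels. The sketch below assumes the three error categories are collapsed into a single positive class, which is a simplification and not necessarily the protocol used in the paper. The function name and the toy inputs are illustrative and are not drawn from the dataset.

```python
from typing import Sequence, Tuple

# Labels treated as "error" for binary scoring (assumption: all three
# non-"No Error" categories count as positives).
ERROR_LABELS = frozenset({"Hallucination", "Inference", "Misunderstanding"})


def precision_recall_f1(
    gold: Sequence[str], flagged: Sequence[bool]
) -> Tuple[float, float, float]:
    """Score binary detector flags against expert category labels."""
    if len(gold) != len(flagged):
        raise ValueError("gold and flagged must have the same length")
    tp = fp = fn = 0
    for label, flag in zip(gold, flagged):
        is_error = label in ERROR_LABELS
        if flag and is_error:
            tp += 1
        elif flag and not is_error:
            fp += 1
        elif not flag and is_error:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example with made-up labels and detector outputs.
gold = ["No Error", "Hallucination", "Inference", "No Error", "Misunderstanding"]
flagged = [False, True, False, True, True]
print(precision_recall_f1(gold, flagged))
```

Precision and recall on the error class are more informative than accuracy here because most statements are labeled No Error.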

Citation

If you use this dataset, please cite:

Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization. BN, S., Shing, H.-C., Xu, L., Strong, M., Burnsky, J., Ofor, J., Mason, J. R., Chen, S., Srinivasan, S., Shivade, C., Moriarty, J., & Cohen, J. P. Interspeech 2025

@inproceedings{BN2024fact,
  title={Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization},
  author={BN, Suhas and Shing, Han-Chin and Xu, Lei and Strong, Mitch and Burnsky, Jon and Ofor, Jessica and Mason, Jordan R and Chen, Susan and Srinivasan, Sundararajan and Shivade, Chaitanya and Moriarty, Jack and Cohen, Joseph Paul},
  booktitle={Interspeech},
  year={2025},
  organization={ISCA}
}

Note

This release contains only the expert annotations on the ACI-bench summaries. The LLM outputs could not be made public because of licensing restrictions.