
# 🧠 Med-NOTA: Evaluating "None of the other Answers" in Medical QA

A simple tool for analyzing how well language models handle "None of the other Answers" (NOTA) options in medical question answering, especially under Chain-of-Thought (CoT) reasoning.


## 📌 What This Does

This project investigates whether large language models (LLMs) such as GPT, Claude, and DeepSeek-R1 can reliably identify when none of the answer choices in a medical multiple-choice question is correct. It compares model performance with and without the need to recognize NOTA.
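As a concrete illustration of the setup, a NOTA condition can be created by replacing the gold answer of a standard multiple-choice item with a "None of the other answers" option, so that NOTA becomes the correct choice. The sketch below is an assumption about this general technique, not this repo's actual code; the field names (`question`, `options`, `answer_idx`) follow common MedQA formats.

```python
# Illustrative sketch: build a NOTA variant of a multiple-choice item by
# replacing the gold answer with "None of the other answers", which makes
# NOTA the correct choice. Field names are assumptions, not this repo's schema.
import copy

def make_nota_variant(item: dict) -> dict:
    variant = copy.deepcopy(item)
    gold = variant["answer_idx"]                      # e.g. "B"
    variant["options"][gold] = "None of the other answers"
    return variant

example = {
    "question": "Deficiency of which vitamin causes scurvy?",
    "options": {"A": "Vitamin A", "B": "Vitamin C",
                "C": "Vitamin D", "D": "Vitamin K"},
    "answer_idx": "B",
}
print(make_nota_variant(example)["options"])
```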


## 🚀 Quick Start

### 1. Set up your environment

```bash
conda env create -f environment.yaml
conda activate cot-eval
```

### 2. Configure your API key

Before running any experiments, add your API key to the config file at `scripts/config.py`. Then add the model endpoints in `scripts/src/medqa_nato.py`.
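For illustration only, a minimal `scripts/config.py` might look like the sketch below. The names `API_KEY` and `MODEL_ENDPOINTS`, and the endpoint URLs, are assumptions for this sketch, not the repo's actual interface.

```python
# scripts/config.py -- minimal sketch; all names here are illustrative
# assumptions, not the repository's actual configuration.
import os

# Read the key from the environment rather than hard-coding it in the file.
API_KEY = os.environ.get("API_KEY", "")

# Hypothetical mapping from model name to chat-completions endpoint.
MODEL_ENDPOINTS = {
    "gpt-4o": "https://api.openai.com/v1/chat/completions",
    "deepseek-r1": "https://api.deepseek.com/v1/chat/completions",
}
```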

### 3. Process the data

```bash
cd scripts/data
python3 load_data.py
```

### 4. Run the NOTA experiments

```bash
cd ../src
python3 medqa_nato.py
```

### 5. Analyze the results

```bash
python3 nota_accuracy_stats.py
```

## 📊 What the Analysis Shows

- ✅ Accuracy comparisons between regular CoT and NOTA conditions
- 📈 Confidence intervals for model performance
- 🧪 P-values for statistical significance testing (one way to compute these is sketched below)
- 🔍 Question-level insights: which questions showed the biggest drops in accuracy
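As a rough sketch of how such statistics can be computed from per-condition correctness counts, one standard approach is a Wilson score interval for each accuracy and a two-proportion z-test for the difference. This is an assumption about the method, not the repo's actual `nota_accuracy_stats.py`; the counts in the example are made up.

```python
# Sketch (assumption): computing an accuracy confidence interval and a
# p-value for the CoT-vs-NOTA accuracy gap using only the standard library.
import math

def wilson_ci(correct: int, n: int, z: float = 1.96) -> tuple:
    """95% Wilson score interval for an observed accuracy."""
    p = correct / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

def two_prop_z_test(c1: int, n1: int, c2: int, n2: int) -> float:
    """Two-sided p-value for a difference in proportions (normal approximation)."""
    p1, p2 = c1 / n1, c2 / n2
    pooled = (c1 + c2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical counts: regular CoT 82/100 correct vs. NOTA condition 61/100.
print(wilson_ci(82, 100))                 # CI for the regular-CoT accuracy
print(two_prop_z_test(82, 100, 61, 100))  # p-value for the accuracy drop
```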
