This repository contains the code and resources for a study investigating the effectiveness of a low-cost prompt engineering strategy for Emergency Severity Index (ESI) classification using Large Language Models (LLMs).
The goal of this project is to provide a transparent and reproducible framework for evaluating the performance of LLMs in clinical triage tasks, with a rigorous focus on preventing data leakage to simulate a realistic decision-making scenario.
Overcrowding in emergency departments is a global challenge. This study explores how prompt engineering, a low-cost alternative to fine-tuning, can be used to guide LLMs to achieve high performance in patient classification, aligning with the clinical reasoning of experts.
The main script (`llm-esi-triage.py`) runs an experiment on a validated subset of the MIETIC dataset, generates ESI predictions, and produces a complete set of performance reports and artifacts for analysis.
Our experiments demonstrate strong performance using prompt engineering for ESI classification:
| Model | Accuracy | Quadratic Kappa | F1-Score (Weighted) |
|---|---|---|---|
| GPT-4.1 | 86.1% | 0.948 | 0.857 |
| GPT-5 | 88.9% | 0.961 | 0.890 |
*Results based on 36 validated cases from the MIETIC dataset.*
Key Findings:
- Both models achieved high quadratic kappa scores (>0.94), indicating excellent agreement with expert classifications
- GPT-5 shows modest improvement over GPT-4.1 across all metrics
- Perfect classification achieved for ESI-4 and ESI-5 categories with GPT-5
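The quadratic-weighted kappa reported above can be computed with scikit-learn. The sketch below uses invented toy labels, not the study's actual predictions, purely to illustrate the metric:

```python
# Quadratic-weighted Cohen's kappa, as used for the agreement scores above.
# The ESI labels here are toy data for illustration only.
from sklearn.metrics import cohen_kappa_score

# Hypothetical expert labels and model predictions (ESI levels 1-5)
y_true = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
y_pred = [1, 2, 3, 3, 3, 2, 4, 4, 5, 5]

# weights="quadratic" penalizes large ordinal disagreements more heavily,
# which suits ESI's ordered severity scale
kappa = cohen_kappa_score(y_true, y_pred, weights="quadratic")
print(f"Quadratic weighted kappa: {kappa:.3f}")
```

Quadratic weighting is the conventional choice for ordinal scales like ESI, since misclassifying an ESI-1 patient as ESI-5 is far more costly than confusing adjacent levels.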
A draft of the accompanying paper for this research, "High-Performance Emergency Triage Classification Using Cost-Effective Prompt Engineering," is available for viewing. As a work-in-progress, feedback is welcome.
Read the Paper Draft on Google Docs
- Methodological Rigor: Implements a comprehensive exclusion list to prevent data leakage of clinical outcomes, ensuring a fair evaluation.
- Full Reproducibility: Uses random seeds and saves all configurations, metrics, and prompts in metadata files for each run.
- Comprehensive Reporting: Generates multiple artifacts for each experiment, including detailed logs, raw predictions, metrics in JSON format, a human-readable summary report, and a confusion matrix visualization.
- Robust Code: Structured with software best practices, including dataclass configuration, professional logging, and error handling.
This experiment uses the MIMIC-IV-Ext Triage Instruction Corpus (MIETIC), publicly available on the PhysioNet platform.
- Source: https://physionet.org/content/mietic/1.0.0/
- File Used: `MIETIC-validate-samples.csv`
The script automatically filters the dataset to use only the 36 cases where the `Final Decision` field was validated as 'RETAIN' by experts, ensuring a high-quality ground truth.
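That filtering step can be sketched in pandas as follows. This assumes the CSV exposes a `Final Decision` column as described above; the exact column name in the released file may differ:

```python
# Sketch of the expert-validation filter, assuming a "Final Decision"
# column in MIETIC-validate-samples.csv. Column naming is an assumption.
import pandas as pd

def load_validated_cases(csv_path: str) -> pd.DataFrame:
    """Keep only the rows experts validated as 'RETAIN'."""
    df = pd.read_csv(csv_path)
    return df[df["Final Decision"] == "RETAIN"].reset_index(drop=True)
```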
Follow the steps below to replicate the experiment.
- Python 3.8 or higher
- An OpenAI account with API access
1. Clone the repository:

   ```bash
   git clone https://github.com/DiegoZoracKy/research-llm-esi-triage.git
   cd research-llm-esi-triage
   ```

2. Create a virtual environment (recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Set up your OpenAI API key: the script requires `OPENAI_API_KEY` to be set as an environment variable.

   ```bash
   export OPENAI_API_KEY="your_key_here"
   ```
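A script that depends on this variable typically checks for it at startup and fails fast with a clear message. A minimal sketch of such a check (the actual script's startup logic may differ):

```python
# Minimal fail-fast check for the API key environment variable.
# This is an illustrative sketch, not the script's actual implementation.
import os
import sys

def require_api_key() -> str:
    """Return OPENAI_API_KEY, or exit with a helpful message if unset."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        sys.exit("OPENAI_API_KEY is not set; export it before running the script.")
    return key
```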
With everything set up, run the script from your terminal:

```bash
python llm-esi-triage.py
```
The script will create a new directory inside the `results/` folder for each run, containing all the generated artifacts.
For each run, a new folder will be created, for example: `results/20250730_231450_gpt-4.1_ESI_vv4.5/`. Inside it, you will find:
- `predictions.csv`: The most detailed file, with a row for each patient, including the exact prompt sent, the raw LLM response, the extracted prediction, and the actual value.
- `metrics.json`: All performance metrics (accuracy, Kappa, F1-score, etc.) in a structured JSON format.
- `metadata.json`: The complete "recipe" for the experiment, including the configuration used and the prompt templates.
- `summary_report.txt`: A human-readable summary of the results, ideal for a quick analysis.
- `confusion_matrix.png`: A visualization of the confusion matrix, ready to be used in presentations or the paper.
- `confusion_matrix.csv`: The data for the confusion matrix in CSV format.
- `experiment.log`: A detailed log of the entire script execution, useful for debugging.
- `error_cases.csv`: If any errors occur, this file lists the cases that failed, to facilitate analysis.
This project is licensed under the MIT License. See the `LICENSE` file for more details.