
LLM Performance in ESI Triage Classification

This repository contains the code and resources for a study investigating the effectiveness of a low-cost prompt engineering strategy for Emergency Severity Index (ESI) classification using Large Language Models (LLMs).

The goal of this project is to provide a transparent and reproducible framework for evaluating the performance of LLMs in clinical triage tasks, with a rigorous focus on preventing data leakage to simulate a realistic decision-making scenario.

📜 About the Project

Overcrowding in emergency departments is a global challenge. This study explores how prompt engineering, a low-cost alternative to fine-tuning, can be used to guide LLMs to achieve high performance in patient classification, aligning with the clinical reasoning of experts.

The main script (llm-esi-triage.py) runs an experiment on a validated subset of the MIETIC dataset, generates ESI predictions, and produces a complete set of performance reports and artifacts for analysis.
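As a purely illustrative sketch (the prompt wording and the `extract_esi` helper below are hypothetical, not the repository's actual code), the core of such an experiment is a prompt template filled in per patient plus a parser for the model's free-text reply:

```python
import re
from typing import Optional

# Hypothetical prompt template; the real templates used in the study
# are saved in each run's metadata.json.
PROMPT_TEMPLATE = (
    "You are an emergency department triage nurse. Assign an Emergency "
    "Severity Index level (1-5) to the case below. Answer with 'ESI-N'.\n\n"
    "Case: {case}"
)

def extract_esi(reply: str) -> Optional[int]:
    """Pull the first ESI level (1-5) out of a free-text LLM reply."""
    match = re.search(r"ESI[-\s]?([1-5])", reply)
    return int(match.group(1)) if match else None
```

A prediction is then obtained by sending `PROMPT_TEMPLATE.format(case=...)` to the model and running `extract_esi` on the reply; a `None` result would be logged as an error case.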

📊 Results Summary

Our experiments demonstrate strong performance using prompt engineering for ESI classification:

| Model   | Accuracy | Quadratic Kappa | F1-Score (Weighted) |
|---------|----------|-----------------|---------------------|
| GPT-4.1 | 86.1%    | 0.948           | 0.857               |
| GPT-5   | 88.9%    | 0.961           | 0.890               |

Results based on 36 validated cases from the MIETIC dataset.

Key Findings:

  • Both models achieved high quadratic kappa scores (>0.94), indicating excellent agreement with expert classifications
  • GPT-5 shows modest improvement over GPT-4.1 across all metrics
  • Perfect classification achieved for ESI-4 and ESI-5 categories with GPT-5
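For readers unfamiliar with the quadratic kappa metric: it weights each disagreement by the squared distance between predicted and true level, so confusing ESI-1 with ESI-5 is penalized far more than confusing adjacent levels. A minimal pure-Python sketch, equivalent to scikit-learn's `cohen_kappa_score(..., weights="quadratic")`:

```python
from collections import Counter

def quadratic_kappa(y_true, y_pred, levels=(1, 2, 3, 4, 5)):
    """Cohen's kappa with quadratic weights: disagreements are penalized
    by the squared distance between predicted and true ESI level."""
    n = len(y_true)
    k = len(levels)
    idx = {lvl: i for i, lvl in enumerate(levels)}
    # Observed confusion counts and the marginal label histograms.
    obs = Counter((idx[t], idx[p]) for t, p in zip(y_true, y_pred))
    hist_true = Counter(idx[t] for t in y_true)
    hist_pred = Counter(idx[p] for p in y_pred)
    w = lambda i, j: (i - j) ** 2 / (k - 1) ** 2  # quadratic weight
    observed = sum(w(i, j) * obs[(i, j)] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * hist_true[i] * hist_pred[j] / n
                   for i in range(k) for j in range(k))
    return 1 - observed / expected
```

Perfect agreement yields 1.0; agreement no better than chance yields 0.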

📄 Paper Draft

A draft of the accompanying paper for this research, "High-Performance Emergency Triage Classification Using Cost-Effective Prompt Engineering," is available for viewing. As a work-in-progress, feedback is welcome.

Read the Paper Draft on Google Docs

✨ Key Features

  • Methodological Rigor: Implements a comprehensive exclusion list to prevent data leakage of clinical outcomes, ensuring a fair evaluation.
  • Full Reproducibility: Uses random seeds and saves all configurations, metrics, and prompts in metadata files for each run.
  • Comprehensive Reporting: Generates multiple artifacts for each experiment, including detailed logs, raw predictions, metrics in JSON format, a human-readable summary report, and a confusion matrix visualization.
  • Robust Code: Structured with software best practices, including dataclass configuration, professional logging, and error handling.
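By way of illustration, the leakage-prevention idea reduces to stripping outcome fields from a case record before it ever reaches the prompt. The field names below are hypothetical examples; the actual exclusion list is defined in llm-esi-triage.py:

```python
# Hypothetical outcome fields that would leak the answer if shown to the
# model; the repository's real exclusion list is more comprehensive.
LEAKAGE_FIELDS = {"disposition", "admission", "icu_transfer", "esi", "acuity"}

def sanitize_case(case: dict) -> dict:
    """Return a copy of the case record with outcome fields removed."""
    return {k: v for k, v in case.items() if k.lower() not in LEAKAGE_FIELDS}
```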

📊 Dataset

This experiment uses the MIMIC-IV-Ext Triage Instruction Corpus (MIETIC), publicly available on the PhysioNet platform.

The script automatically filters the dataset to use only the 36 cases where the Final Decision was validated as 'RETAIN' by experts, ensuring a high-quality ground truth.
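Assuming the dataset is loaded as a pandas DataFrame with a `Final Decision` column (as described above), the filtering step amounts to:

```python
import pandas as pd

def select_validated(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose expert 'Final Decision' is 'RETAIN'."""
    return df[df["Final Decision"] == "RETAIN"].reset_index(drop=True)
```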

🚀 Getting Started

Follow the steps below to replicate the experiment.

✅ Prerequisites

  • Python 3.8 or higher
  • An OpenAI account with API access

⚙️ Installation

  1. Clone the repository:

    git clone https://github.com/DiegoZoracKy/research-llm-esi-triage.git
    cd research-llm-esi-triage
  2. Create a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install the dependencies:

    pip install -r requirements.txt
  4. Set up your OpenAI API Key: The script requires the OPENAI_API_KEY to be set as an environment variable.

    export OPENAI_API_KEY="your_key_here"

▶️ Running the Experiment

With everything set up, run the script from your terminal:

python llm-esi-triage.py

The script will create a new directory inside the results/ folder for each run, containing all the generated artifacts.

📂 Understanding the Results

For each run, a new folder will be created, for example: results/20250730_231450_gpt-4.1_ESI_vv4.5/. Inside it, you will find:

  • predictions.csv: The most detailed file, with a row for each patient, including the exact prompt sent, the raw LLM response, the extracted prediction, and the ground-truth ESI level.
  • metrics.json: All performance metrics (accuracy, quadratic kappa, weighted F1-score, etc.) in a structured JSON format.
  • metadata.json: The complete "recipe" for the experiment, including the configuration used and the prompt templates.
  • summary_report.txt: A human-readable summary of the results, ideal for a quick analysis.
  • confusion_matrix.png: A visualization of the confusion matrix, ready to be used in presentations or the paper.
  • confusion_matrix.csv: The data for the confusion matrix in CSV format.
  • experiment.log: A detailed log of the entire script execution, useful for debugging.
  • error_cases.csv: If any errors occur, this file will contain the cases that failed, to facilitate analysis.
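For downstream analysis, the per-run artifacts can be loaded with the standard library alone. A small sketch (the file names match the list above; the `load_run` helper itself is hypothetical):

```python
import csv
import json
from pathlib import Path

def load_run(run_dir):
    """Load the metrics and per-patient predictions from one results folder."""
    run = Path(run_dir)
    metrics = json.loads((run / "metrics.json").read_text())
    with (run / "predictions.csv").open(newline="") as f:
        predictions = list(csv.DictReader(f))
    return metrics, predictions
```

This makes it easy to, say, compare `metrics.json` across runs or re-score `predictions.csv` with a different metric.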

📄 License

This project is licensed under the MIT License. See the LICENSE file for more details.
