som-shahlab/opt-paradox

🔬 Optimization Paradox in Multi-Agent Systems

This repo contains the code for the paper "The Optimization Paradox in Clinical AI Multi-Agent Systems". It demonstrates how optimizing individual components can catastrophically undermine overall system performance in multi-agent clinical AI systems. The framework evaluates both single-agent and multi-agent workflows on real patient cases from the MIMIC-CDM dataset using multiple LLM families.

It currently supports 8 different LLM families and provides comprehensive evaluation metrics including diagnostic accuracy, process adherence, and cost efficiency.

📖 Table of Contents

  1. 🚀 Quick Start
  2. 📊 What This Does
  3. 🏥 Key Finding
  4. 📈 Results & Evaluation
  5. 🔧 Supported Models
  6. 📋 Requirements
  7. 📚 Citation
  8. 📧 Issues

🚀 Quick Start

  1. Install dependencies
conda env create -f environment.yaml
conda activate clinagent_env
  2. Configure APIs
cp config.example.yaml config.yaml
# Edit config.yaml with your API keys
  3. Run evaluation
# Single agent
python3 run_single_agent.py --model_id_main gpt --dataset_type val

# Multi-agent 
python3 run_multi_agent.py --model_id_info gemini --model_id_diagnosis gpt --dataset_type val
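
To compare several pairings of information-gathering and diagnosis models, the multi-agent command above can be generated in a loop. The following is a minimal sketch, assuming `run_multi_agent.py` accepts the flags exactly as shown. Only `gpt` and `gemini` appear in this README as model IDs, so the list is limited to those two. `build_multi_agent_cmd` and `all_pairings` are hypothetical helpers, not part of this repo, and the script only prints the commands rather than running them.

```python
import itertools
import shlex

# Model IDs that appear in the commands above; other valid IDs are not
# documented here, so none are assumed.
MODEL_IDS = ["gpt", "gemini"]


def build_multi_agent_cmd(info_model, diagnosis_model, dataset_type="val"):
    """Return the argv list for one multi-agent run (hypothetical helper)."""
    return [
        "python3", "run_multi_agent.py",
        "--model_id_info", info_model,
        "--model_id_diagnosis", diagnosis_model,
        "--dataset_type", dataset_type,
    ]


def all_pairings(model_ids):
    """Build one command per (info, diagnosis) pairing, same-model pairs included."""
    return [
        build_multi_agent_cmd(info, diag)
        for info, diag in itertools.product(model_ids, repeat=2)
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in all_pairings(MODEL_IDS):
        print(shlex.join(cmd))
```

To actually launch the runs, each argv list could be passed to `subprocess.run(cmd, check=True)` from the repo root.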

📊 What This Does

Tests clinical reasoning on 2,400 real patient cases across 4 abdominal conditions, using three system configurations:

  • Single-agent: One model handles everything
  • Multi-agent: Specialized models for information gathering, interpretation, and diagnosis
  • Best-of-Breed: Top-performing components combined (spoiler: performs worst!)
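
The difference between these configurations can be sketched as function composition. This is an illustrative sketch only, not the repo's implementation: the `Stage` type, `run_single_agent`, `run_multi_agent`, and the toy stages are all hypothetical names, and each real stage would wrap an LLM call rather than a string formatter.

```python
from typing import Callable, Dict

# Each stage maps a text input to a text output. In a real run these would
# wrap LLM calls; here they are hypothetical stand-ins.
Stage = Callable[[str], str]


def run_single_agent(agent: Stage, case: str) -> str:
    """One model handles the whole case and returns a diagnosis."""
    return agent(case)


def run_multi_agent(stages: Dict[str, Stage], case: str) -> str:
    """Pass the case through information gathering, interpretation, and diagnosis."""
    findings = stages["information_gathering"](case)
    interpretation = stages["interpretation"](findings)
    return stages["diagnosis"](interpretation)


if __name__ == "__main__":
    # Toy stages that only tag the text so the data flow is visible.
    toy_stages = {
        "information_gathering": lambda case: f"findings({case})",
        "interpretation": lambda findings: f"interpretation({findings})",
        "diagnosis": lambda interp: f"diagnosis({interp})",
    }
    print(run_multi_agent(toy_stages, "case-001"))
```

In this picture, a Best-of-Breed system is the multi-agent layout with each entry of `stages` chosen because it scored best on its own. Each stage consumes the previous stage's output, so replacing one component changes what every downstream stage receives.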

🏥 Key Finding

The Best-of-Breed system, built from individually optimal components, achieved only 67.7% diagnostic accuracy versus 77.4% for a well-integrated multi-agent system, even though its process metrics were superior.
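
The size of this gap can be worked out directly from the two reported accuracies. The sketch below uses only the figures stated above; `accuracy_gap` is a hypothetical helper written for this illustration.

```python
# Accuracies as reported in the Key Finding above (percent).
BEST_OF_BREED_ACC = 67.7
INTEGRATED_ACC = 77.4


def accuracy_gap(integrated: float, best_of_breed: float) -> tuple:
    """Return (absolute gap in percentage points, relative drop in percent)."""
    absolute = integrated - best_of_breed
    relative = absolute / integrated * 100
    return round(absolute, 1), round(relative, 1)


if __name__ == "__main__":
    absolute, relative = accuracy_gap(INTEGRATED_ACC, BEST_OF_BREED_ACC)
    print(f"{absolute} percentage points lower ({relative}% relative drop)")
```

The absolute gap is 9.7 percentage points, which is roughly a 12.5% relative drop from the integrated system's accuracy.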

📈 Results & Evaluation

python3 run_evals.py --log_dir logs/<experiment_name>

Results include diagnostic accuracy, process adherence, and cost metrics.
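
When several experiments have been run, the evaluation command can be generated once per log directory. This is a minimal sketch, assuming each experiment writes to its own subdirectory of `logs/`, as the `logs/<experiment_name>` pattern above suggests. `build_eval_cmds` is a hypothetical helper, not part of this repo, and the commands are printed rather than executed.

```python
from pathlib import Path


def build_eval_cmds(log_root="logs"):
    """Return one run_evals.py argv list per experiment directory under log_root."""
    root = Path(log_root)
    if not root.is_dir():
        # Nothing to evaluate if the log root does not exist yet.
        return []
    return [
        ["python3", "run_evals.py", "--log_dir", str(exp_dir)]
        for exp_dir in sorted(root.iterdir())
        if exp_dir.is_dir()  # skip stray files next to the experiment folders
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in build_eval_cmds():
        print(" ".join(cmd))
```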

🔧 Supported Models

Azure OpenAI, Claude, Gemini, Llama, o3-mini, DeepSeek

📋 Requirements

📚 Citation

(Placeholder for future publication citation.)


📧 Issues

Please report problems by opening an issue on this GitHub repository.

