This project investigates and mitigates language bias in clinical notes, focusing on how machine learning models can detect and perpetuate implicit biases—especially racial and gender-related—in medical documentation. Inspired by recent works such as "Write It Like You See It" and "Hurtful Words", our approach combines the strengths of BERT-based language models for bias detection and GPT-based models for text debiasing.
This was developed as part of the NLP_AIML course project.
Machine learning models trained on clinical data often absorb and reinforce systemic biases. For example, even when race is redacted, models can infer it from implicit textual cues, leading to unfair clinical recommendations.
Our goal is twofold:
- Detect whether a piece of clinical text contains latent bias using transformer-based models.
- Debias these texts to ensure fairer downstream predictions using autoregressive LLMs (GPT-style).
Tech stack:

- Python 3.9+
- Transformers (HuggingFace)
- PyTorch
- BERT (ClinicalBERT, SciBERT)
- GPT-3.5 / GPT-4 API (for generative debiasing)
- scikit-learn (for evaluation and metrics)
- spaCy / NLTK (for preprocessing)
Bias detection:

- Clinical notes sourced from MIMIC-III and Columbia University Medical Center datasets.
- Notes redacted for explicit race indicators (e.g., "Black", "White", "Caucasian").
- Classifiers ranging from logistic regression and XGBoost to fine-tuned ClinicalBERT and SciBERT predict whether a note implies a specific racial identity (a scoring sketch follows this list).
- These models achieved high AUC even on redacted notes, showing that they pick up implicit cues rather than explicit labels.
- A panel of physicians given the same task performed no better than chance, underscoring how non-obvious these cues are.
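A minimal sketch of the scoring step, assuming a fine-tuned ClinicalBERT checkpoint saved under a hypothetical `models/clinicalbert-race` directory and the HuggingFace `transformers` API; the repository's actual interface may differ:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical path to a ClinicalBERT classifier fine-tuned on redacted notes;
# adjust to wherever the fine-tuned checkpoint is saved.
MODEL_DIR = "models/clinicalbert-race"

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
model.eval()

def bias_score(note: str) -> float:
    """Probability that a redacted note still implies a specific racial identity."""
    inputs = tokenizer(note, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

print(bias_score("54-year-old patient with poorly controlled diabetes, family at bedside."))
```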
Debiasing:

- Prompts crafted for GPT-based models to rewrite biased clinical notes while preserving clinical meaning; a prompting sketch follows this list.
- Target: reduce associations between demographic proxies and clinical descriptions.
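A hedged sketch of this prompting step, using the official `openai` Python client; the system prompt and model name below are illustrative choices, not necessarily what `debiasing/gpt_rewriter.py` uses:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative instruction; the project's actual prompt wording may differ.
SYSTEM_PROMPT = (
    "You rewrite clinical notes. Remove wording that acts as a proxy for race, "
    "gender, or socioeconomic status, while preserving every clinical finding, "
    "medication, and plan."
)

def rewrite_note(note: str, model: str = "gpt-4") -> str:
    """Ask a GPT model for a debiased rewrite of one clinical note."""
    response = client.chat.completions.create(
        model=model,
        temperature=0.0,  # deterministic rewrites make manual review easier
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": note},
        ],
    )
    return response.choices[0].message.content
```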
Results:

- Detection accuracy: the ClinicalBERT ensemble reached an AUC of ~0.83.
- Bias examples:
  - Words like "bruising", "paleness", or "family support" skew predictions along racial lines.
- Post-debiasing evaluation:
  - Reduced racial and gender associations.
  - Core clinical content maintained, as verified by cosine similarity (see the sketch below) and manual checks.
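The content-preservation check mentioned above can be approximated with scikit-learn alone; a sketch comparing an original note with its rewrite via TF-IDF cosine similarity (the project may also use embedding-based similarity, so treat this as illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def content_similarity(original: str, rewritten: str) -> float:
    """Cosine similarity between TF-IDF vectors of the original and debiased note."""
    vectors = TfidfVectorizer().fit_transform([original, rewritten])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])

# Low-similarity rewrites can be routed to manual review; any cutoff value is a
# project-specific choice, not one prescribed here.
```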
Project structure:

```
bias-clinical-nlp/
│
├── data/                       # Raw and redacted clinical notes
├── models/                     # Fine-tuned BERT and XGBoost classifiers
├── debiasing/
│   └── gpt_rewriter.py         # Prompts GPT to generate unbiased text
├── notebooks/
│   ├── bias_detection.ipynb
│   └── debiasing_pipeline.ipynb
├── utils/
│   └── preprocessing.py        # Tokenization, redaction, filtering
│
├── requirements.txt
└── README.md
```
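`utils/preprocessing.py` covers tokenization, redaction, and filtering. A minimal sketch of the redaction step, using an illustrative keyword list (the project's actual term list is assumed to be more comprehensive):

```python
import re

# Illustrative subset of explicit race/ethnicity terms removed before training.
RACE_TERMS = ["black", "white", "caucasian", "african american", "hispanic", "asian"]
RACE_PATTERN = re.compile(r"\b(?:" + "|".join(map(re.escape, RACE_TERMS)) + r")\b", re.IGNORECASE)

def redact_race(note: str, placeholder: str = "[REDACTED]") -> str:
    """Replace explicit race indicators so classifiers only see implicit cues."""
    return RACE_PATTERN.sub(placeholder, note)

print(redact_race("Pleasant Caucasian male in no acute distress."))
# -> "Pleasant [REDACTED] male in no acute distress."
```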
Quick start:

- Install dependencies:
  pip install -r requirements.txt
- Run the detection pipeline (Jupyter notebook):
  jupyter notebook notebooks/bias_detection.ipynb
- Run GPT-based debiasing:
  python debiasing/gpt_rewriter.py --input data/redacted_notes.txt
- Evaluate the debiased output (see the evaluation sketch below):
  jupyter notebook notebooks/debiasing_pipeline.ipynb
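One hedged way to quantify the effect of debiasing is to re-score the rewritten notes with the detection classifier and compare ROC AUC before and after; a scikit-learn sketch with placeholder values standing in for real labels and model scores:

```python
from sklearn.metrics import roc_auc_score

# Placeholder values for illustration only; in practice the labels come from a
# held-out split and the scores from the detection classifier.
labels        = [1, 0, 1, 0, 1, 0]                      # self-reported race, binarized
scores_before = [0.91, 0.22, 0.78, 0.35, 0.80, 0.40]    # detector scores on original notes
scores_after  = [0.55, 0.48, 0.60, 0.41, 0.52, 0.47]    # detector scores on debiased rewrites

print("AUC before debiasing:", roc_auc_score(labels, scores_before))
print("AUC after debiasing: ", roc_auc_score(labels, scores_after))
```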
References:

- Adam, H. et al. "Write It Like You See It: Detectable Differences in Clinical Notes by Race Lead to Differential Model Recommendations." AIES 2022.
- Zhang, H. et al. "Hurtful Words: Quantifying Biases in Clinical Contextual Word Embeddings." CHIL 2020.
Ethics:

- All patient data was de-identified.
- GPT outputs were carefully reviewed to avoid clinical misinterpretation.
- This work does not replace clinical judgment and is meant to highlight algorithmic bias, not to automate care.
Future work:

- Explore RLHF for better control in text debiasing.
- Extend bias detection beyond race (e.g., gender, insurance status, language).
- Integrate into clinical decision support tools with real-time debiasing layers.
MIT License © 2025 Your Name