Source code for our ACL 2025 paper Opt-Out: Investigating Entity-Level Unlearning for Large Language Models via Optimal Transport.
This codebase implements various unlearning methods to make language models "forget" specific entities while preserving their general capabilities.
To install requirements:
```bash
conda create -n optout python=3.12.9
conda activate optout
pip install -r requirements.txt
```
We provide the ELUDe (Entity-Level Unlearning Dataset) on Hugging Face: https://huggingface.co/datasets/6rightjade/ELUDe
ELUDe is a comprehensive machine unlearning dataset focused on the removal of entire entities from large language models (LLMs). The dataset includes:
- 20 real-world target entities (the entities listed below)
- 144 unique neighboring entities from Wikipedia
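To inspect the data, you can load ELUDe with the Hugging Face `datasets` library. This is a minimal sketch; the exact configuration and split names are assumptions, so check the dataset card on the Hub.

```python
from datasets import load_dataset

# Load ELUDe from the Hub; split/config names are assumptions, so consult
# https://huggingface.co/datasets/6rightjade/ELUDe for the exact layout.
dataset = load_dataset("6rightjade/ELUDe")

print(dataset)            # available splits
split = list(dataset.keys())[0]
print(dataset[split][0])  # one sample record
```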
The codebase supports unlearning for 20 different entities:
- Donald_Trump
- Elizabeth_II
- Barack_Obama
- Cristiano_Ronaldo
- Michael_Jackson
- Elon_Musk
- Lady_Gaga
- Adolf_Hitler
- Eminem
- Lionel_Messi
- Justin_Bieber
- Freddie_Mercury
- Kim_Kardashian
- Johnny_Depp
- Steve_Jobs
- Dwayne_Johnson
- Michael_Jordan
- Taylor_Swift
- Stephen_Hawking
- Kanye_West
The codebase implements the following core methods:

- `original` - The original performance of the model
- `icu` - In-Context Unlearning: Prompting baseline (Guardrail)
- `ga` - Gradient Ascent: Uses gradient ascent for unlearning
- `dpo` - Direct Preference Optimization: Uses DPO for unlearning
- `npo` - Negative Preference Optimization: Uses NPO for unlearning
- `idk` - I Don't Know: Makes the model respond with "I don't know"
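For intuition, the training-based methods differ mainly in the loss applied to the forget set. The sketch below illustrates the `ga` and `npo` objectives for a Hugging Face-style causal LM; it is a generic illustration under assumed conventions (batch-averaged log-likelihoods, `beta=0.1`), not the exact code in `trainer.py`.

```python
import torch
import torch.nn.functional as F

def forget_losses(model, ref_model, batch, beta=0.1):
    """Illustrative forget-set objectives (a sketch, not the repo's trainer).

    `batch` is assumed to contain `input_ids`, `attention_mask`, and
    `labels` for standard causal-LM training; `ref_model` is a frozen
    copy of the initial model.
    """
    nll = model(**batch).loss  # mean negative log-likelihood on forget data

    # Gradient Ascent (`ga`): maximize the NLL by minimizing its negation.
    ga_loss = -nll

    # NPO (`npo`): penalize forget sequences relative to the reference
    # model; batch-mean NLLs are used here for brevity, whereas the exact
    # objective averages over per-sequence log-ratios.
    with torch.no_grad():
        ref_nll = ref_model(**batch).loss
    npo_loss = -(2.0 / beta) * F.logsigmoid(beta * (nll - ref_nll))

    return ga_loss, npo_loss
```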
You can combine core methods (except `original` and `icu`) with the following modifiers:

- `+rt` - Retain Data: Includes neighboring entity data to preserve nearby knowledge
- `+wd` - World Data: Uses Alpaca GPT-4 data for maintaining general knowledge (we use the Alpaca GPT-4 data from here)
- `+ot` - Optimal Transport: Adds Wasserstein regularization for better unlearning (a rough sketch follows the examples below)

Example combinations:

- `npo+rt+wd+ot` - NPO with retain data, world data, and optimal transport (Opt-Out)
- `dpo+rt+wd` - DPO with retain and world data
- `ga+rt` - Gradient ascent with retain data only
- `idk+wd` - IDK method with world data only
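As a rough illustration of the `+ot` modifier, the sketch below computes a first-order (Wasserstein-1) penalty between the current and initial parameter values, treating each flattened tensor as a 1-D empirical distribution. The paper's actual regularizer may be formulated differently; `init_params` and the weighting are assumptions.

```python
import torch

def wasserstein1_param_reg(model, init_params):
    """Sketch of a Wasserstein-1 penalty between current and initial
    parameters. For 1-D empirical distributions, W1 reduces to the mean
    absolute difference between sorted samples.
    """
    reg = 0.0
    for p, p0 in zip(model.parameters(), init_params):
        a, _ = torch.sort(p.flatten())
        b, _ = torch.sort(p0.flatten())  # p0: frozen copy of the initial value
        reg = reg + (a - b).abs().mean()
    return reg

# Assumed usage: add the penalty to the unlearning loss with a weight
# `lam` (hypothetical hyperparameter):
# loss = npo_loss + lam * wasserstein1_param_reg(model, init_params)
```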
Use the training script to fine-tune models for entity unlearning:
```bash
bash scripts/train.sh
```
Run evaluation on trained models:
```bash
bash scripts/eval.sh
```
```
Opt-Out/
├── run.py          # Main training/evaluation script
├── trainer.py      # Custom trainer implementation
├── model.py        # Model loading utilities
├── dataset.py      # Data loading and processing
├── evaluator.py    # Evaluation logic
├── scripts/        # Execution scripts
│   ├── train.sh    # Training script
│   └── eval.sh     # Evaluation script
├── data/           # External data
```
If you use this codebase, please cite our paper:
```bibtex
@article{choi2025optout,
  title={Opt-Out: Investigating Entity-Level Unlearning for Large Language Models via Optimal Transport},
  author={Choi, Minseok and Rim, Daniel and Lee, Dohyun and Choo, Jaegul},
  journal={arXiv preprint arXiv:2406.12329},
  year={2025}
}
```