42 changes: 24 additions & 18 deletions README.md
@@ -26,15 +26,20 @@ We invite the LLM unlearning community to collaborate by adding new benchmarks,

### 📢 Updates

#### [May 12, 2025]

- **Another benchmark!** We now support running the [`WMDP`](https://wmdp.ai/) benchmark with its `Zephyr` task model.
- **More evaluations!** The [`lm-evaluation-harness`](https://github.com/EleutherAI/lm-evaluation-harness) toolkit has been integrated into OpenUnlearning, enabling WMDP evaluations and support for popular general LLM benchmarks, including MMLU, GSM8K, and others.

<details>
<summary><b>Older Updates</b></summary>

#### [Apr 6, 2025]
🚨🚨 **IMPORTANT:** 🚨🚨 Be sure to run `python setup_data.py` immediately after merging the latest version. This is required to refresh the downloaded eval log files and ensure they're compatible with the latest evaluation metrics.
- **More Metrics!** Added 6 Membership Inference Attacks (MIA) (LOSS, ZLib, Reference, GradNorm, MinK, and MinK++), along with Extraction Strength (ES) and Exact Memorization (EM) as additional evaluation metrics.
- **More TOFU Evaluations!** Now includes a holdout set and supports MIA attack-based evaluation. You can now compute MUSE's privleak on TOFU.
- **More Documentation!** [`docs/links.md`](docs/links.md) contains resources for each of the implemented features and other useful LLM unlearning resources.



#### [Mar 27, 2025]
- **More Documentation: easy contributions and the leaderboard functionality**: We've updated the documentation to make contributing new unlearning methods and benchmarks much easier. Users can document additions better and also update a leaderboard with their results. See [this section](#-how-to-contribute) for details.
@@ -56,11 +61,11 @@ We provide several variants for each of the components in the unlearning pipeline

| **Component** | **Available Options** |
|------------------------|----------------------|
| **Benchmarks** | [TOFU](https://arxiv.org/abs/2401.06121), [MUSE](https://muse-bench.github.io/), [WMDP](https://www.wmdp.ai/) |
| **Unlearning Methods** | GradAscent, GradDiff, NPO, SimNPO, DPO, RMU |
| **Evaluation Metrics** | Verbatim Probability, Verbatim ROUGE, Knowledge QA-ROUGE, Model Utility, Forget Quality, TruthRatio, Extraction Strength, Exact Memorization, 6 MIA attacks, [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) |
| **Datasets** | MUSE-News (BBC), MUSE-Books (Harry Potter), TOFU (different splits) |
| **Model Families** | TOFU: LLaMA-3.2, LLaMA-3.1, LLaMA-2; MUSE: LLaMA-2; Additional: Phi-3.5, Phi-1.5, Gemma, Zephyr |

---

@@ -89,13 +94,15 @@ We provide several variants for each of the components in the unlearning pipeline
# Environment setup
conda create -n unlearning python=3.11
conda activate unlearning
pip install .[lm_eval]
pip install --no-build-isolation flash-attn==2.6.3

# Data setup
python setup_data.py --eval # saves/eval now contains evaluation results of the uploaded models
# This downloads log files with evaluation results (including retain model logs)
# into `saves/eval`, used for evaluating unlearning across supported benchmarks.
# Additional datasets (e.g., WMDP) are supported; run the command below to see options:
# python setup_data.py --help
```

---
@@ -202,14 +209,13 @@ If you use OpenUnlearning in your research, please cite OpenUnlearning and the benchmarks
booktitle={First Conference on Language Modeling},
year={2024}
}
@inproceedings{
shi2025muse,
title={{MUSE}: Machine Unlearning Six-Way Evaluation for Language Models},
author={Weijia Shi and Jaechan Lee and Yangsibo Huang and Sadhika Malladi and Jieyu Zhao and Ari Holtzman and Daogao Liu and Luke Zettlemoyer and Noah A. Smith and Chiyuan Zhang},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=TArmA033BU}
}
```
</details>
9 changes: 9 additions & 0 deletions configs/data/datasets/WMDP_forget.yaml
@@ -0,0 +1,9 @@
WMDP_forget:
handler: PretrainingDataset
args:
hf_args:
path: "text"
data_files: "data/wmdp/wmdp-corpora/cyber-forget-corpus.jsonl"
split: "train"
text_key: "text"
max_length: 512
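
For reference, a minimal sketch of how these `hf_args` plausibly translate into a Hugging Face `datasets` call (an assumption about how the `PretrainingDataset` handler consumes them; tokenization and truncation to `max_length` happen downstream):

```python
from datasets import load_dataset

# Mirrors the hf_args above: the "text" builder reads each line of the corpus
# file into a "text" column, which text_key then selects.
ds = load_dataset(
    "text",
    data_files="data/wmdp/wmdp-corpora/cyber-forget-corpus.jsonl",
    split="train",
)
print(ds[0]["text"][:200])  # first 200 characters of the first corpus line
```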
9 changes: 9 additions & 0 deletions configs/data/datasets/WMDP_retain.yaml
@@ -0,0 +1,9 @@
WMDP_retain:
handler: PretrainingDataset
args:
hf_args:
path: "text"
data_files: "data/wmdp/wmdp-corpora/cyber-retain-corpus.jsonl"
split: "train"
text_key: "text"
max_length: 512
20 changes: 20 additions & 0 deletions configs/eval/lm_eval.yaml
@@ -0,0 +1,20 @@
# @package eval.lm_eval
# NOTE: the line above is not an ordinary comment; it sets the package for this config. See https://hydra.cc/docs/upgrades/0.11_to_1.0/adding_a_package_directive/

handler: LMEvalEvaluator
output_dir: ${paths.output_dir} # set to default eval directory
overwrite: false

# Define evaluation tasks here
tasks:
- mmlu
# - task: gsm8k
# dataset_path: gsm8k
# # define the entire task config.
# # ^ Example: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/gsm8k/gsm8k.yaml


simple_evaluate_args:
batch_size: 16
system_instruction: null
apply_chat_template: false
1 change: 1 addition & 0 deletions configs/eval/muse.yaml
@@ -15,6 +15,7 @@ defaults:
# - mia_reference
# - mia_zlib
# - mia_gradnorm
# - forget_gibberish

handler: MUSEEvaluator
output_dir: ${paths.output_dir} # set to default eval directory
20 changes: 20 additions & 0 deletions configs/eval/muse_metrics/forget_gibberish.yaml
@@ -0,0 +1,20 @@
# @package eval.muse.metrics.forget_gibberish
defaults:
- .@pre_compute.forget_verbmem_ROUGE: forget_verbmem_ROUGE

pre_compute:
forget_verbmem_ROUGE:
access_key: text

handler: classifier_prob
batch_size: 32
max_length: 512
class_id: 0
text_key: generation
device: cuda

classifier_model_args:
pretrained_model_name_or_path: "madhurjindal/autonlp-Gibberish-Detector-492513457"

classifier_tokenization_args:
pretrained_model_name_or_path: "madhurjindal/autonlp-Gibberish-Detector-492513457"
1 change: 1 addition & 0 deletions configs/eval/tofu.yaml
@@ -17,6 +17,7 @@ defaults: # include all defined metrics files
# - mia_zlib
# - mia_gradnorm
# - mia_reference # set reference model path appropriately
# - forget_Q_A_gibberish

handler: TOFUEvaluator
output_dir: ${paths.output_dir} # set to default eval directory
20 changes: 20 additions & 0 deletions configs/eval/tofu_metrics/forget_Q_A_gibberish.yaml
@@ -0,0 +1,20 @@
# @package eval.tofu.metrics.forget_Q_A_gibberish
defaults:
- .@pre_compute.forget_Q_A_ROUGE: forget_Q_A_ROUGE

pre_compute:
forget_Q_A_ROUGE:
access_key: text

handler: classifier_prob
batch_size: 32
max_length: 512
class_id: 0
text_key: generation
device: cuda

classifier_model_args:
pretrained_model_name_or_path: "madhurjindal/autonlp-Gibberish-Detector-492513457"

classifier_tokenization_args:
pretrained_model_name_or_path: "madhurjindal/autonlp-Gibberish-Detector-492513457"
15 changes: 15 additions & 0 deletions configs/experiment/eval/wmdp/default.yaml
@@ -0,0 +1,15 @@
# @package _global_

defaults:
- override /model: zephyr-7b-beta
- override /eval: lm_eval

data_split: cyber

eval:
lm_eval:
tasks:
- wmdp_${data_split}
- mmlu

task_name: ???
58 changes: 58 additions & 0 deletions configs/experiment/unlearn/wmdp/default.yaml
@@ -0,0 +1,58 @@
# @package _global_

defaults:
- override /model: zephyr-7b-beta
- override /trainer: RMU
- override /data: unlearn
- override /data/datasets@data.forget: WMDP_forget
- override /data/datasets@data.retain: WMDP_retain
- override /eval: lm_eval

data_split: cyber

data:
anchor: forget
forget:
WMDP_forget:
args:
hf_args:
data_files: data/wmdp/wmdp-corpora/${data_split}-forget-corpus.jsonl
retain:
WMDP_retain:
args:
hf_args:
data_files: data/wmdp/wmdp-corpora/${data_split}-retain-corpus.jsonl

eval:
lm_eval:
tasks:
- wmdp_${data_split}
- mmlu


collator:
DataCollatorForSupervisedDataset:
args:
padding_side: left # Usually left, but for Mistral and Zephyr it's right (https://github.com/hongshi97/CAD/issues/2)

trainer:
args:
per_device_train_batch_size: 1
gradient_accumulation_steps: 16
learning_rate: 5e-5
eval_strategy: steps
eval_steps: 0.5
max_steps: 80
lr_scheduler_type: constant

method_args:
# These parameters depend heavily on the model and dataset. Tune them carefully for unlearning to work.
gamma: 1.0
steering_coeff: 2
retain_loss_type: EMBED_DIFF
alpha: 1
module_regex: model\.layers\.7
trainable_params_regex:
- model\.layers\.(5|6|7)\.mlp\.down_proj\.weight # If you want to update only these weights (as done in https://github.com/centerforaisafety/wmdp/blob/bc5e1ba0367ea826caeeeaa50656336a1e87acfb/rmu/unlearn.py#L26)

task_name: ???
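
As a rough illustration of how `module_regex` and `trainable_params_regex` restrict the RMU update to specific layers (the repo's trainer implements its own version of this; the snippet below is only a sketch of the idea):

```python
import re
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

trainable_params_regex = [r"model\.layers\.(5|6|7)\.mlp\.down_proj\.weight"]
module_regex = r"model\.layers\.7"

# Freeze everything except parameters whose full names match one of the regexes.
for name, param in model.named_parameters():
    param.requires_grad = any(re.fullmatch(p, name) for p in trainable_params_regex)

# Select the module whose hidden states the RMU forget/retain losses are computed on.
target_module = next(m for n, m in model.named_modules() if re.fullmatch(module_regex, n))
```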
15 changes: 15 additions & 0 deletions configs/model/zephyr-7b-beta.yaml
@@ -0,0 +1,15 @@
model_args:
pretrained_model_name_or_path: "HuggingFaceH4/zephyr-7b-beta"
attn_implementation: 'flash_attention_2'
torch_dtype: bfloat16
tokenizer_args:
pretrained_model_name_or_path: "HuggingFaceH4/zephyr-7b-beta"
template_args:
apply_chat_template: True
system_prompt: You are a helpful assistant.
system_prompt_with_special_tokens: "<|system|>\nYou are a helpful assistant.</s>\n"
user_start_tag: "<|user|>\n"
user_end_tag: "</s>"
asst_start_tag: "<|assistant|>\n"
asst_end_tag: "</s>"
date_string: 10 Apr 2025
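
A toy illustration of how these tags compose into a Zephyr-style prompt (the example question is made up, and the repo's templating code may assemble things differently, e.g., via the tokenizer's chat template):

```python
# Tag values copied from the config above.
system = "<|system|>\nYou are a helpful assistant.</s>\n"
user_start, user_end = "<|user|>\n", "</s>"
asst_start = "<|assistant|>\n"

question = "Who is Harry Potter?"
prompt = f"{system}{user_start}{question}{user_end}\n{asst_start}"
print(prompt)
```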
30 changes: 30 additions & 0 deletions docs/evaluation.md
@@ -240,3 +240,33 @@ metrics: {} # lists a mapping from each evaluation metric listed above to its config
output_dir: ${paths.output_dir} # set to default eval directory
forget_split: forget10
```

## lm-evaluation-harness

To evaluate model capabilities after unlearning, we support running [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/main) via our custom evaluator: [LMEvalEvaluator](../src/evals/lm_eval.py).
All evaluation tasks should be defined under the `tasks` key in [lm_eval.yaml](../configs/eval/lm_eval.yaml).

```yaml
# @package eval.lm_eval
# NOTE: the line above is not an ordinary comment; it sets the package for this config. See https://hydra.cc/docs/upgrades/0.11_to_1.0/adding_a_package_directive/

handler: LMEvalEvaluator
output_dir: ${paths.output_dir} # set to default eval directory
overwrite: false

# Define evaluation tasks here
tasks:
- mmlu
- wmdp_cyber
- task: gsm8k
dataset_path: gsm8k
# define the entire task config.
# ^ Example: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/gsm8k/gsm8k.yaml



simple_evaluate_args:
batch_size: 16
system_instruction: null
apply_chat_template: false
```
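
Under the hood, these settings map onto lm-eval-harness's `simple_evaluate` API. A rough sketch of the equivalent direct call (assuming lm-eval 0.4.x; `LMEvalEvaluator` wires in the unlearned model and the config values for you, and the model id below is just an example):

```python
import lm_eval
from lm_eval.models.huggingface import HFLM
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceH4/zephyr-7b-beta"  # or a path to an unlearned checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

results = lm_eval.simple_evaluate(
    model=HFLM(pretrained=model, tokenizer=tokenizer),
    tasks=["mmlu", "wmdp_cyber"],
    batch_size=16,
    system_instruction=None,
    apply_chat_template=False,
)
print(results["results"])
```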
16 changes: 10 additions & 6 deletions docs/links.md
@@ -5,12 +5,14 @@ Links to research papers and resources corresponding to implemented features in OpenUnlearning
---

## 📌 Table of Contents
- [🔗 Links and References](#-links-and-references)
- [📌 Table of Contents](#-table-of-contents)
- [📗 Implemented Methods](#-implemented-methods)
- [📘 Benchmarks](#-benchmarks)
- [📙 Evaluation Metrics](#-evaluation-metrics)
- [🌐 Useful Links](#-useful-links)
- [📚 Surveys](#-surveys)
- [🐙 Other GitHub Repositories](#-other-github-repositories)

---

@@ -32,6 +34,7 @@ Links to research papers and resources corresponding to implemented features in OpenUnlearning
|-----------|----------|
| TOFU | Paper [📄](https://arxiv.org/abs/2401.06121) |
| MUSE | Paper [📄](https://arxiv.org/abs/2407.06460) |
| WMDP | Paper [📄](https://arxiv.org/abs/2403.03218) |

---

@@ -45,6 +48,7 @@ Links to research papers and resources corresponding to implemented features in OpenUnlearning
| Forget Quality, Truth Ratio, Model Utility | TOFU ([📄](https://arxiv.org/abs/2401.06121)) |
| Extraction Strength (ES) | Carlini et al., 2021 ([📄](https://www.usenix.org/conference/usenixsecurity21/presentation/carlini-extracting)), used for unlearning in Wang et al., 2025 ([📄](https://openreview.net/pdf?id=wUtCieKuQU)) |
| Exact Memorization (EM) | Tirumala et al., 2022 ([📄](https://proceedings.neurips.cc/paper_files/paper/2022/hash/fa0509f4dab6807e2cb465715bf2d249-Abstract-Conference.html)), used for unlearning in Wang et al., 2025 ([📄](https://openreview.net/pdf?id=wUtCieKuQU)) |
| lm-evaluation-harness | [💻](https://github.com/EleutherAI/lm-evaluation-harness/tree/main) |

---

3 changes: 3 additions & 0 deletions setup.py
@@ -17,6 +17,9 @@
packages=find_packages(),
install_requires=requirements, # Uses requirements.txt
extras_require={
"lm-eval": [
"lm-eval==0.4.8",
], # Install using `pip install .[lm-eval]`
"dev": [
"pre-commit==4.0.1",
"ruff==0.6.9",