locuslab · molereddy · Apr 7, 2025 · Mar 1, 2025 · Mar 2, 2025 · Mar 2, 2025
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -6,4 +6,4 @@ repos:
     -   id: ruff
         args: [check, --fix, scripts, src, setup.py, setup_data.py]
     -   id: ruff
-        args: [format, scripts, src, setup.py setup_data.py]
+        args: [format, --check, scripts, src, setup.py, setup_data.py]
diff --git a/README.md b/README.md
@@ -21,18 +21,37 @@
 
 ## 📖 Overview
 
-We provide efficient and streamlined implementations of the TOFU, MUSE unlearning benchmarks while supporting 6 unlearning methods, 3+ datasets, 6+ evaluation metrics, and 6+ LLM architectures. Each of these can be easily extended to incorporate more variants.
+We provide efficient and streamlined implementations of the TOFU, MUSE unlearning benchmarks while supporting 6 unlearning methods, 3+ datasets, 9+ evaluation metrics, and 6+ LLM architectures. Each of these can be easily extended to incorporate more variants.
 
 We invite the LLM unlearning community to collaborate by adding new benchmarks, unlearning methods, datasets and evaluation metrics here to expand OpenUnlearning's features, gain feedback from wider usage and drive progress in the field.
 
+---
+
 ### 📢 Updates
 
-#### [Mar 27, 2025]  
-- **Easier contributions, leaderboard and reproducibility**: We've updated the documentation to make contributing new unlearning methods and benchmarks much easier. Users can document additions better and also update a leaderboard with their results. See [this section](#-how-to-contribute) for details.
+#### [Apr 6, 2025]
+- **More Metrics!** Added 6 Membership Inference Attacks (MIA) (LOSS, ZLib, Reference, GradNorm, MinK, and MinK++), along with Extraction Strength (ES) and Exact Memorization (EM) as additional evaluation metrics.
+- **More TOFU Evaluations!** Now includes a holdout set and supports MIA attack-based evaluation. You can now compute MUSE's privleak on TOFU.
+- **More Documentation!** [`docs/links.md`](docs/links.md) contains resources for each of the implemented features and other useful LLM unlearning resources.
+
+
+<details>
+<summary><b>Older Updates</b></summary>
+
+#### [Mar 27, 2025]
+- **More Documentation: easy contributions and the leaderboard functionality**: We've updated the documentation to make contributing new unlearning methods and benchmarks much easier. Users can document additions better and also update a leaderboard with their results. See [this section](#-how-to-contribute) for details.
+
+#### [Mar 9, 2025]
+- **More Methods!** Added support for [RMU](https://arxiv.org/abs/2403.03218) (representation-engineering based unlearning).
 
 #### [Feb 27, 2025]  
 ⚠️ **Repository Update**: This repo replaces the original TOFU codebase at [`github.com/locuslab/tofu`](https://github.com/locuslab/tofu), which is no longer maintained.
 
+</details>
+
+
+---
+
 ## 🗃️ Available Components
 
 We provide several variants for each of the components in the unlearning pipeline.
@@ -41,7 +60,7 @@ We provide several variants for each of the components in the unlearning pipelin
 |------------------------|----------------------|
 | **Benchmarks**        | [TOFU](https://arxiv.org/abs/2401.06121), [MUSE](https://muse-bench.github.io/) |
 | **Unlearning Methods** | GradAscent, GradDiff, NPO, SimNPO, DPO, RMU |
-| **Evaluation Metrics** | Verbatim Probability, Verbatim ROUGE, QA-ROUGE, MIA Attacks, TruthRatio, Model Utility |
+| **Evaluation Metrics** | Verbatim Probability, Verbatim ROUGE, Knowledge QA-ROUGE, Model Utility, Forget Quality, TruthRatio, Extraction Strength, Exact Memorization, 6 MIA attacks |
 | **Datasets**          | MUSE-News (BBC), MUSE-Books (Harry Potter), TOFU (different splits) |
 | **Model Families**    | TOFU: LLaMA-3.2, LLaMA-3.1, LLaMA-2; MUSE: LLaMA-2; Additional: Phi-3.5, Phi-1.5, Gemma |
 
@@ -77,14 +96,15 @@ pip install --no-build-isolation flash-attn==2.6.3
 
 # data setup
 python setup_data.py  # saves/eval now contains evaluation results of the uploaded models
-# Downloads log files with metric eval results (incl retain model logs) from the models used in the supported benchmarks.
+# Downloads log files with metric eval results (incl retain model logs) from the models 
+# used in the supported benchmarks.
 ```
 
 ---
 
 ### 🔄 Updated TOFU benchmark
 
-We've updated Open-Unlearning's TOFU benchmark target models to use a wider variety of newer architectures with sizes varying from 1B to 8B. These include LLaMA 3.2 1B, LLaMA 3.2 3B, LLaMA 3.1 8B, and the original LLaMA-2 7B from [the old version of TOFU](github.com/locuslab/tofu). 
+We've updated Open-Unlearning's TOFU benchmark target models to use a wider variety of newer architectures with sizes varying from 1B to 8B. These include LLaMA 3.2 1B, LLaMA 3.2 3B, LLaMA 3.1 8B, and the original LLaMA-2 7B (re-created) target models from [the old version of TOFU](https://github.com/locuslab/tofu). 
 
 For each architecture, we have finetuned with four different splits of the TOFU datasets: `full`, `retain90`, `retain95`, `retain99`, for a total of 16 finetuned models. The first serves as the target (base model for unlearning) and the rest are retain models used to measure performance against for each forget split. These models are on [HuggingFace](`https://huggingface.co/collections/open-unlearning/tofu-new-models-67bcf636334ea81727573a9f0`) and the paths to these models can be set in the experimental configs or in command-line overrides.
 
@@ -112,15 +132,18 @@ python src/train.py --config-name=unlearn.yaml experiment=unlearn/tofu/default \
 An example command for launching a TOFU evaluation process on `forget10` split:
 
 ```bash
+model=Llama-3.2-1B-Instruct
 python src/eval.py --config-name=eval.yaml experiment=eval/tofu/default \
-  model=Llama-3.2-1B-Instruct \
-  model.model_args.pretrained_model_name_or_path=open-unlearning/tofu_Llama-3.2-1B-Instruct_full \
+  model=${model} \
+  model.model_args.pretrained_model_name_or_path=open-unlearning/tofu_${model}_full \
+  retain_logs_path=saves/eval/tofu_${model}_retain90/TOFU_EVAL.json \
   task_name=SAMPLE_EVAL
 ```
 
 - `experiment`- Path to the evaluation configuration [`configs/experiment/eval/tofu/default.yaml`](configs/experiment/eval/tofu/default.yaml).
 - `model`- Sets up the model and tokenizer configs for the `Llama-3.2-1B-Instruct` model.
 - `model.model_args.pretrained_model_name_or_path`- Overrides the default experiment config to evaluate a model from a HuggingFace ID (can use a local model checkpoint path as well).
+- `retain_logs_path`- Sets the path to the reference model's eval logs, which are needed to compute reference-model-based metrics such as `forget_quality` in TOFU.
 
 For more details about creating and running evaluations, refer [`docs/evaluation.md`](docs/evaluation.md).
 
@@ -153,7 +176,8 @@ For more in-depth information on specific aspects of the framework, refer to the
 | [`docs/experiments.md`](docs/experiments.md)     | Guide on running experiments in various configurations and settings, including distributed training, fine-tuning, and overriding arguments. |
 | [`docs/hydra.md`](docs/hydra.md)                 | Explanation of the Hydra features used in configuration management for experiments.                                  |
 | [`community/leaderboard.md`](community/leaderboard.md)             | Reference results from various unlearning methods run using this framework on TOFU and MUSE benchmarks.              |
-| [`docs/repro.md`](docs/repro.md) (deprecated)            | Results are provided solely for reproducibility purposes, without any parameter tuning.             |
+| [`docs/links.md`](docs/links.md)             | Links to the research papers and other sources behind each implemented feature.              |
+| [`docs/repro.md`](docs/repro.md)            | Results are provided solely for reproducibility purposes, without any parameter tuning.             |
 ---
 
 ## 🔗 Support & Contributors
@@ -197,9 +221,15 @@ If you use OpenUnlearning in your research, please cite OpenUnlearning and the b
 ### 🤝 Acknowledgements
 
 - This repo is inspired from [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory). 
-- The [TOFU](https://github.com/locuslab/tofu) and [MUSE](https://github.com/jaechan-repo/muse_bench) benchmarks served as the foundation for our re-implementation. 
+- The [TOFU](https://github.com/locuslab/tofu) and [MUSE](https://github.com/swj0419/muse_bench) benchmarks served as the foundation for our re-implementation. 
 
 ---
 
 ### 📄 License
 This project is licensed under the MIT License. See the [`LICENSE`](LICENSE) file for details.
+
+---
+
+### Star History
+
+[![Star History Chart](https://api.star-history.com/svg?repos=locuslab/open-unlearning&type=Date)](https://www.star-history.com/#locuslab/open-unlearning&Date)
diff --git a/community/leaderboard.md b/community/leaderboard.md
@@ -8,48 +8,35 @@ We encourage the community to develop new methods, optimize them for specific be
 
 To implement a new method, refer to our [contributing guide](../docs/contributing.md).  
 
-> **Note:** The [results.md](../docs/results.md) file is maintained for reproducibility purposes. However, we encourage contributors to update the leaderboard table instead of the reproducibility table. We will continue refining and tuning baseline methods to keep the leaderboard up to date.
+> [!NOTE]
+> The [results.md](../docs/results.md) file is maintained for reproducibility purposes. However, we encourage contributors to update the leaderboard table instead of the reproducibility table. We will continue refining and tuning baseline methods to keep the leaderboard up to date.
 
 
-### TOFU unlearning on the `Llama-3.2-1B-Instruct` architecture
+### TOFU unlearning on the `Llama-2-7b-chat-hf` architecture
 
 <div style="overflow-x: auto; max-width: 100%;">
 <table class="dataframe">
   <thead>
     <tr>
       <th>Method</th>
-      <th style="text-align: center;" colspan="2" halign="left">forget01</th>
-      <th style="text-align: center;" colspan="2" halign="left">forget05</th>
       <th style="text-align: center;" colspan="2" halign="left">forget10</th>
     </tr>
     <tr>
       <th></th>
       <th>forget_quality</th>
       <th>model_utility</th>
-      <th>forget_quality</th>
-      <th>model_utility</th>
-      <th>forget_quality</th>
-      <th>model_utility</th>
     </tr>
   </thead>
   <tbody>
     <tr>
       <th>Finetuned</th>
-      <td>0.01</td>
-      <td>0.60</td>
-      <td>2.96e-13</td>
-      <td>0.6</td>
-      <td>8.08e-22</td>
-      <td>0.6</td>
+      <td>4.35e-25</td>
+      <td>0.63</td>
     </tr>
     <tr>
       <th>Retain</th>
       <td>1.0</td>
-      <td>0.60</td>
-      <td>1.0</td>
-      <td>0.6</td>
-      <td>1.0</td>
-      <td>0.59</td>
+      <td>0.61</td>
     </tr>
     <tr>
       <td colspan="20"> </td>
@@ -70,37 +57,23 @@ To implement a new method, refer to our [contributing guide](../docs/contributin
   <thead>
     <tr>
       <th>Method</th>
-      <th style="text-align: center;" colspan="2" halign="left">forget01</th>
-      <th style="text-align: center;" colspan="2" halign="left">forget05</th>
       <th style="text-align: center;" colspan="2" halign="left">forget10</th>
     </tr>
     <tr>
       <th></th>
       <th>forget_quality</th>
       <th>model_utility</th>
-      <th>forget_quality</th>
-      <th>model_utility</th>
-      <th>forget_quality</th>
-      <th>model_utility</th>
     </tr>
   </thead>
   <tbody>
     <tr>
       <th>Finetuned</th>
-      <td>0.01</td>
-      <td>0.60</td>
-      <td>2.96e-13</td>
-      <td>0.6</td>
-      <td>8.08e-22</td>
+      <td>1.66e-21</td>
       <td>0.6</td>
     </tr>
     <tr>
       <th>Retain</th>
       <td>1.0</td>
-      <td>0.60</td>
-      <td>1.0</td>
-      <td>0.6</td>
-      <td>1.0</td>
       <td>0.59</td>
     </tr>
     <tr>
@@ -143,7 +116,7 @@ To implement a new method, refer to our [contributing guide](../docs/contributin
       <td>0.64</td>
       <td>0.58</td>
       <td>-99.81</td>
-      <td>0.55</td>
+      <td>0.56</td>
       <td>0.47</td>
       <td>1.0</td>
       <td>-57.26</td>
@@ -152,7 +125,7 @@ To implement a new method, refer to our [contributing guide](../docs/contributin
     <tr>
       <th>Retain</th>
       <td>0.33</td>
-      <td>0.21</td>
+      <td>0.20</td>
       <td>0</td>
       <td>0.56</td>
       <td>0.3</td>

diff --git a/configs/data/datasets/MUSE_MIA.yaml b/configs/data/datasets/MUSE_MIA.yaml
@@ -0,0 +1,22 @@
+MUSE_MIA_holdout:
+  access_key: holdout
+  handler: CompletionDataset
+  args:
+    hf_args:
+      path: "muse-bench/MUSE-News"
+      name: "privleak"
+      split: "holdout"
+    prefix_key: "prompt" # doesn't exist in dataset
+    text_key: "text"
+    max_length: 2048
+MUSE_MIA_forget:
+  access_key: forget
+  handler: CompletionDataset
+  args:
+    hf_args:
+      path: "muse-bench/MUSE-News"
+      name: "privleak"
+      split: "forget"
+    prefix_key: "prompt" # doesn't exist in dataset
+    text_key: "text"
+    max_length: 2048
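Review note: the two entries above expose a `forget` split and a `holdout` split under matching access keys, which suggests the MIA metrics compare per-example attack scores between the two. A minimal sketch of that comparison follows, assuming each attack produces one scalar score per example and that a higher score means "more likely a training member". The function name `mia_auc` is hypothetical and is not the repository's handler.

```python
from typing import Sequence


def mia_auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """AUC for separating members (forget split) from non-members (holdout split).

    Computed as the probability that a randomly chosen member scores higher
    than a randomly chosen non-member, with ties counted as one half.
    Assumption: a higher score means "more likely a member".
    """
    if len(member_scores) == 0 or len(nonmember_scores) == 0:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))
```

An AUC near 0.5 means the attack cannot tell the forget examples from the holdout examples. The pairwise loop is quadratic, which is fine for illustration but would likely be replaced by a rank-based routine on large splits.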
diff --git a/configs/data/datasets/MUSE_forget_privleak.yaml b/configs/data/datasets/MUSE_forget_privleak.yaml
diff --git a/configs/data/datasets/MUSE_holdout_privleak.yaml b/configs/data/datasets/MUSE_holdout_privleak.yaml
diff --git a/configs/data/datasets/MUSE_retain_privleak.yaml b/configs/data/datasets/MUSE_retain_privleak.yaml
diff --git a/configs/data/datasets/TOFU_MIA.yaml b/configs/data/datasets/TOFU_MIA.yaml
@@ -0,0 +1,22 @@
+TOFU_QA_forget:
+  access_key: forget
+  handler: QADataset
+  args:
+    hf_args:
+      name: "forget10"
+      split: "train"
+      path: "locuslab/TOFU"
+    question_key: "question"
+    answer_key: "answer"
+    max_length: 512
+TOFU_QA_holdout:
+  access_key: holdout
+  handler: QADataset
+  args:
+    hf_args:
+      name: "holdout10"
+      path: "locuslab/TOFU"
+      split: "train"
+    question_key: "question"
+    answer_key: "answer"
+    max_length: 512
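Review note: with a `forget10` and a `holdout10` split available, MUSE's privleak can be computed on TOFU, as the README update says. As I read the MUSE paper, PrivLeak is the relative difference between the MIA AUC of the unlearned model and that of the retain model. The sketch below encodes only that formula. The function name and the percent scaling are assumptions, and the repository's handler may use a different AUC convention or reference value.

```python
def privleak(auc_unlearned: float, auc_retain: float) -> float:
    """Relative MIA-AUC gap between unlearned and retain models, in percent.

    A value near 0 means the unlearned model leaks about as much membership
    signal as a model that never saw the forget set.
    """
    if auc_retain == 0:
        raise ValueError("auc_retain must be non-zero")
    return (auc_unlearned - auc_retain) / auc_retain * 100.0
```

Under this convention, large positive or negative values both indicate a mismatch with the retain model, so the target is zero rather than a maximum or minimum.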
diff --git a/configs/eval.yaml b/configs/eval.yaml
@@ -13,4 +13,5 @@ model:
     device_map: cuda
 
 mode: eval
-task_name: ???
+task_name: ???
+seed: 0
diff --git a/configs/eval/muse.yaml b/configs/eval/muse.yaml
@@ -7,6 +7,14 @@ defaults:
     - retain_knowmem_ROUGE
     - forget_verbmem_ROUGE
     - privleak
+    - extraction_strength
+    # - exact_memorization
+    # - mia_min_k_plus_plus
+    # - mia_min_k
+    # - mia_loss
+    # - mia_reference
+    # - mia_zlib
+    # - mia_gradnorm
 
 handler: MUSEEvaluator
 output_dir: ${paths.output_dir} # set to default eval directory
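Review note: for readers unfamiliar with the `mia_min_k` entry, the Min-K% attack scores a sequence by averaging only its least likely tokens. The sketch below assumes per-token log-probabilities are already available. `min_k_score` is a hypothetical helper, and the default `k=0.4` is only an illustrative choice. The repository's handler may negate the score or pick the token count differently.

```python
from typing import Sequence


def min_k_score(token_logprobs: Sequence[float], k: float = 0.4) -> float:
    """Mean of the lowest k-fraction of token log-probabilities.

    Training members tend to have fewer very unlikely tokens, so a higher
    (less negative) value suggests membership.
    """
    if len(token_logprobs) == 0:
        raise ValueError("token_logprobs must be non-empty")
    if not 0.0 < k <= 1.0:
        raise ValueError("k must be in (0, 1]")
    # Keep at least one token so very short sequences still get a score.
    n_keep = max(1, int(k * len(token_logprobs)))
    lowest = sorted(token_logprobs)[:n_keep]
    return sum(lowest) / n_keep
```

With `k=1.0` the score reduces to the mean token log-probability, which ties it back to the plain LOSS attack.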

diff --git a/...se_metrics/forget_minKpc_neg_logprob.yaml → ...eval/muse_metrics/exact_memorization.yaml b/...se_metrics/forget_minKpc_neg_logprob.yaml → ...eval/muse_metrics/exact_memorization.yaml
@@ -1,13 +1,12 @@
-# @package eval.muse.metrics.forget_minKpc_neg_logprob
+# @package eval.muse.metrics.exact_memorization
 defaults:
-  - ../../data/datasets@datasets: MUSE_forget_privleak
+  - ../../data/datasets@datasets: MUSE_forget_verbmem
   - ../../collator@collators: DataCollatorForSupervisedDatasetwithIndex
-handler: minKpc_negative_logprob
-batch_size: 8
-percentile_K: 40
 
+handler: exact_memorization
+batch_size: 8
 datasets:
-  MUSE_forget_privleak:
+  MUSE_forget_verbmem:
     args:
       hf_args:
         path: muse-bench/MUSE-${eval.muse.data_split}
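Review note: this config only wires the `exact_memorization` handler to the verbmem forget data, so a short illustration of the metric may help. The sketch assumes Exact Memorization is the token-level accuracy of the model's teacher-forced greedy predictions against the reference continuation, which is how I understand the metric. The function below is hypothetical, operates on token ids that have already been aligned, and is not the repository's code.

```python
from typing import Sequence


def exact_memorization(predicted_ids: Sequence[int], target_ids: Sequence[int]) -> float:
    """Fraction of positions where the predicted token equals the target token.

    Assumption: predicted_ids[i] is the model's argmax prediction for position i
    given the true tokens before it (teacher forcing).
    """
    if len(predicted_ids) != len(target_ids):
        raise ValueError("predictions and targets must have the same length")
    if len(target_ids) == 0:
        raise ValueError("target_ids must be non-empty")
    matches = sum(1 for p, t in zip(predicted_ids, target_ids) if p == t)
    return matches / len(target_ids)
```

A score of 1.0 means every token of the forget text is reproduced exactly. Lower values after unlearning indicate less verbatim recall.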
diff --git a/configs/eval/muse_metrics/extraction_strength.yaml b/configs/eval/muse_metrics/extraction_strength.yaml
@@ -0,0 +1,12 @@
+# @package eval.muse.metrics.extraction_strength
+defaults:
+  - ../../data/datasets@datasets: MUSE_forget_verbmem
+  - ../../collator@collators: DataCollatorForSupervisedDatasetwithIndex
+
+handler: extraction_strength
+batch_size: 8
+datasets:
+  MUSE_forget_verbmem:
+    args:
+      hf_args:
+        path: muse-bench/MUSE-${eval.muse.data_split}
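Review note: Extraction Strength is commonly described as one minus the smallest fraction of a sequence that must be supplied as a prefix before the model reproduces the rest. The sketch below approximates this with teacher-forced greedy predictions over aligned token ids. That approximation, and the name `extraction_strength` as a standalone function, are assumptions rather than a description of the handler configured above.

```python
from typing import Sequence


def extraction_strength(predicted_ids: Sequence[int], target_ids: Sequence[int]) -> float:
    """1 - (shortest prefix fraction after which all remaining tokens match).

    Returns 1.0 when the whole sequence is reproduced and 0.0 when even the
    final token is wrong.
    """
    n = len(target_ids)
    if n == 0:
        raise ValueError("target_ids must be non-empty")
    if len(predicted_ids) != n:
        raise ValueError("predictions and targets must have the same length")
    for k in range(n):
        # The suffix starting at k is "extracted" if every remaining token matches.
        if list(predicted_ids[k:]) == list(target_ids[k:]):
            return 1.0 - k / n
    return 0.0
```

Unlike token-level accuracy, this score is sensitive to where the mismatches occur: a single wrong token near the end of the sequence drops it sharply.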
diff --git a/configs/eval/muse_metrics/holdout_minKpc_neg_logprob.yaml b/configs/eval/muse_metrics/holdout_minKpc_neg_logprob.yaml