diff --git a/README.md b/README.md
index 3138535..989b16e 100644
--- a/README.md
+++ b/README.md
@@ -21,7 +21,7 @@
 
 ## 📖 Overview
 
-We provide efficient and streamlined implementations of the TOFU, MUSE unlearning benchmarks while supporting 6 unlearning methods, 3+ datasets, 9+ evaluation metrics, and 6+ LLM architectures. Each of these can be easily extended to incorporate more variants.
+We provide efficient and streamlined implementations of the TOFU, MUSE, and WMDP unlearning benchmarks while supporting 6 unlearning methods, 3+ datasets, 9+ evaluation metrics, and 6+ LLM architectures. Each of these can be easily extended to incorporate more variants.
 
 We invite the LLM unlearning community to collaborate by adding new benchmarks, unlearning methods, datasets and evaluation metrics here to expand OpenUnlearning's features, gain feedback from wider usage and drive progress in the field.
 
@@ -29,6 +29,9 @@ We invite the LLM unlearning community to collaborate by adding new benchmarks,
 
 ### 📢 Updates
 
+#### [Apr 9, 2025]
+- **New Benchmark!** Added support for the [WMDP](https://arxiv.org/abs/2403.03218) (Weapons of Mass Destruction Proxy) benchmark with cyber-forget-corpus and cyber-retain-corpus for unlearning hazardous knowledge.
+
 #### [Apr 6, 2025]
 ⚠️⚠️ **IMPORTANT:** Be sure to run `python setup_data.py` immediately after merging the latest version. This is required to refresh the downloaded eval log files and ensure they're compatible with the latest evaluation metrics.
 - **More Metrics!** Added 6 Membership Inference Attacks (MIA) (LOSS, ZLib, Reference, GradNorm, MinK, and MinK++), along with Extraction Strength (ES) and Exact Memorization (EM) as additional evaluation metrics.
@@ -59,10 +62,10 @@ We provide several variants for each of the components in the unlearning pipelin
 
 | **Component** | **Available Options** |
 |------------------------|----------------------|
-| **Benchmarks** | [TOFU](https://arxiv.org/abs/2401.06121), [MUSE](https://muse-bench.github.io/) |
+| **Benchmarks** | [TOFU](https://arxiv.org/abs/2401.06121), [MUSE](https://muse-bench.github.io/), [WMDP](https://arxiv.org/abs/2403.03218) |
 | **Unlearning Methods** | GradAscent, GradDiff, NPO, SimNPO, DPO, RMU |
 | **Evaluation Metrics** | Verbatim Probability, Verbatim ROUGE, Knowledge QA-ROUGE, Model Utility, Forget Quality, TruthRatio, Extraction Strength, Exact Memorization, 6 MIA attacks |
-| **Datasets** | MUSE-News (BBC), MUSE-Books (Harry Potter), TOFU (different splits) |
+| **Datasets** | MUSE-News (BBC), MUSE-Books (Harry Potter), TOFU (different splits), WMDP (cyber corpus) |
 | **Model Families** | TOFU: LLaMA-3.2, LLaMA-3.1, LLaMA-2; MUSE: LLaMA-2; Additional: Phi-3.5, Phi-1.5, Gemma |
 
 ---
@@ -155,6 +158,7 @@ The scripts below execute standard baseline unlearning experiments on the TOFU a
 ```bash
 bash scripts/tofu_unlearn.sh
 bash scripts/muse_unlearn.sh
+bash scripts/wmdp_unlearn.sh
 ```
 
 The above scripts are not tuned and uses default hyper parameter settings. We encourage you to tune your methods and add your final results in [`community/leaderboard.md`](community/leaderboard.md).
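The `scripts/wmdp_unlearn.sh` baseline referenced above can also be launched by hand. The command below is a minimal sketch, assuming the repository's existing `src/train.py` Hydra entry point and `unlearn.yaml` top-level config (neither is touched by this diff); the overrides baked into the actual script may differ, and the `task_name` shown is a hypothetical label.

```bash
# Minimal sketch (assumed entry point, not copied from scripts/wmdp_unlearn.sh):
# run the RMU baseline on the WMDP cyber corpora via the experiment config
# added in this diff (configs/experiment/unlearn/wmdp/default.yaml).
python src/train.py --config-name=unlearn.yaml \
  experiment=unlearn/wmdp/default \
  trainer=RMU \
  task_name=wmdp_rmu_manual
```

`trainer=RMU` is already the default set by the experiment config; it is shown only to mark where another unlearning method could be swapped in.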
diff --git a/configs/data/datasets/WMDP_cyber_forget.yaml b/configs/data/datasets/WMDP_cyber_forget.yaml
new file mode 100644
index 0000000..64e5b0a
--- /dev/null
+++ b/configs/data/datasets/WMDP_cyber_forget.yaml
@@ -0,0 +1,11 @@
+WMDP_cyber_forget:
+  handler: QADataset
+  args:
+    hf_args:
+      path: "cais/wmdp"
+      name: "wmdp-cyber"
+      split: "test"
+    question_key: "question"
+    answer_key: "answer"
+    choices_key: "choices"
+    max_length: 512
diff --git a/configs/data/datasets/WMDP_cyber_forget_corpus.yaml b/configs/data/datasets/WMDP_cyber_forget_corpus.yaml
new file mode 100644
index 0000000..3bd464e
--- /dev/null
+++ b/configs/data/datasets/WMDP_cyber_forget_corpus.yaml
@@ -0,0 +1,9 @@
+WMDP_cyber_forget_corpus:
+  handler: PretrainingDataset
+  args:
+    hf_args:
+      path: "cais/wmdp-corpora"
+      name: "cyber-forget-corpus"
+      split: "train"
+    text_key: "text"
+    max_length: 2048
diff --git a/configs/data/datasets/WMDP_cyber_retain.yaml b/configs/data/datasets/WMDP_cyber_retain.yaml
new file mode 100644
index 0000000..a37442f
--- /dev/null
+++ b/configs/data/datasets/WMDP_cyber_retain.yaml
@@ -0,0 +1,11 @@
+WMDP_cyber_retain:
+  handler: QADataset
+  args:
+    hf_args:
+      path: "cais/wmdp"
+      name: "wmdp-cyber"
+      split: "test"
+    question_key: "question"
+    answer_key: "answer"
+    choices_key: "choices"
+    max_length: 512
diff --git a/configs/data/datasets/WMDP_cyber_retain_corpus.yaml b/configs/data/datasets/WMDP_cyber_retain_corpus.yaml
new file mode 100644
index 0000000..0e4ab21
--- /dev/null
+++ b/configs/data/datasets/WMDP_cyber_retain_corpus.yaml
@@ -0,0 +1,9 @@
+WMDP_cyber_retain_corpus:
+  handler: PretrainingDataset
+  args:
+    hf_args:
+      path: "cais/wmdp-corpora"
+      name: "cyber-retain-corpus"
+      split: "train"
+    text_key: "text"
+    max_length: 2048
diff --git a/configs/eval/wmdp.yaml b/configs/eval/wmdp.yaml
new file mode 100644
index 0000000..7efc21f
--- /dev/null
+++ b/configs/eval/wmdp.yaml
@@ -0,0 +1,20 @@
+# @package eval.wmdp
+# NOTE: the above line is not a comment, but sets the package for config. See https://hydra.cc/docs/upgrades/0.11_to_1.0/adding_a_package_directive/
+
+defaults: # include all defined metrics files
+  - tofu_metrics: # Reusing TOFU metrics for WMDP evaluation
+    - forget_quality
+    - forget_Q_A_Prob
+    - forget_Q_A_ROUGE
+    - model_utility
+    - privleak
+    - extraction_strength
+
+handler: WMDPEvaluator
+output_dir: ${paths.output_dir} # set to default eval directory
+metrics: {} # lists a mapping from each evaluation metric to its config
+# populated through the first (@package) line in each metric config
+overwrite: false
+forget_split: test
+holdout_split: test
+retain_logs_path: null
diff --git a/configs/experiment/unlearn/wmdp/default.yaml b/configs/experiment/unlearn/wmdp/default.yaml
new file mode 100644
index 0000000..cf1916a
--- /dev/null
+++ b/configs/experiment/unlearn/wmdp/default.yaml
@@ -0,0 +1,49 @@
+# WMDP Unlearning Configuration
+
+defaults:
+  - override /model: Llama-3.2-1B-Instruct
+  - override /trainer: RMU
+  - override /data: unlearn
+  - override /data/datasets@data.forget: WMDP_cyber_forget_corpus
+  - override /data/datasets@data.retain: WMDP_cyber_retain_corpus
+  - override /eval: wmdp
+
+model:
+  model_args:
+    pretrained_model_name_or_path: meta-llama/Llama-3.2-1B-Instruct
+
+forget_split: train
+retain_split: train
+holdout_split: test
+retain_logs_path: null
+
+eval:
+  wmdp:
+    forget_split: ${forget_split}
+    holdout_split: ${holdout_split}
+    retain_logs_path: ${retain_logs_path}
+    overwrite: true
+
+data:
+  anchor: forget
+  forget:
+    WMDP_cyber_forget_corpus:
+      args:
+        hf_args:
+          name: "cyber-forget-corpus"
+          split: ${forget_split}
+  retain:
+    WMDP_cyber_retain_corpus:
+      args:
+        hf_args:
+          name: "cyber-retain-corpus"
+          split: ${retain_split}
+
+trainer:
+  args:
+    warmup_epochs: 1.0
+    learning_rate: 1e-5
+    weight_decay: 0.01
+    num_train_epochs: 10
+
+task_name: wmdp_cyber_unlearning
diff --git a/docs/evaluation.md b/docs/evaluation.md
index 99af829..a9f8201 100644
--- a/docs/evaluation.md
+++ b/docs/evaluation.md
@@ -36,6 +36,17 @@ python src/eval.py --config-name=eval.yaml \
 - `---config-name=eval.yaml`- this is set by default so can be omitted
 - `data_split=Books`- overrides the default MUSE data split (News). See [`configs/experiment/eval/muse/default.yaml`](../configs/experiment/eval/muse/default.yaml)
 
+Run the WMDP benchmark evaluation on a checkpoint of a LLaMA 3.2 model:
+```bash
+python src/eval.py --config-name=eval.yaml \
+  experiment=eval/wmdp/default \
+  model=Llama-3.2-1B-Instruct \
+  model.model_args.pretrained_model_name_or_path= \
+  task_name=WMDP_EVAL
+```
+- `experiment=eval/wmdp/default`- set experiment to use [`configs/experiment/eval/wmdp/default.yaml`](../configs/experiment/eval/wmdp/default.yaml)
+- The WMDP evaluation uses the cyber corpus by default and evaluates the model's performance on hazardous-knowledge unlearning.
+
 ## Metrics
 
 A metric takes a model and a dataset and computes statistics of the model over the datapoints (or) takes other metrics and computes an aggregated score over the dataset.
@@ -240,3 +251,23 @@ metrics: {} # lists a mapping from each evaluation metric listed above to its co
 output_dir: ${paths.output_dir} # set to default eval directory
 forget_split: forget10
 ```
+
+Example: WMDP evaluator config file ([`configs/eval/wmdp.yaml`](../configs/eval/wmdp.yaml))
+
+```yaml
+# @package eval.wmdp
+defaults: # include all the metrics that come under the WMDP evaluator
+  - tofu_metrics: # WMDP reuses many of the same metrics as TOFU
+    - forget_quality
+    - forget_Q_A_Prob
+    - forget_Q_A_ROUGE
+    - model_utility
+    - privleak
+    - extraction_strength
+
+handler: WMDPEvaluator
+metrics: {} # lists a mapping from each evaluation metric listed above to its config
+output_dir: ${paths.output_dir} # set to default eval directory
+forget_split: test
+holdout_split: test
+```
diff --git a/docs/links.md b/docs/links.md
index 9a651a1..0b5c62e 100644
--- a/docs/links.md
+++ b/docs/links.md
@@ -32,6 +32,7 @@ Links to research papers and resources corresponding to implemented features in
 |-----------|----------|
 | TOFU | Paper [📄](https://arxiv.org/abs/2401.06121) |
 | MUSE | Paper [📄](https://arxiv.org/abs/2407.06460) |
+| WMDP | Paper [📄](https://arxiv.org/abs/2403.03218), Code [🐙](https://github.com/centerforaisafety/wmdp), Website [🌐](https://www.wmdp.ai/) |
 
 ---
 
@@ -57,6 +58,7 @@
 ### 🐙 Other GitHub Repositories
 - [TOFU Benchmark (original)](https://github.com/locuslab/tofu)
 - [MUSE Benchmark (original)](https://github.com/swj0419/muse_bench)
+- [WMDP Benchmark (original)](https://github.com/centerforaisafety/wmdp)
 - [Awesome LLM Unlearning](https://github.com/chrisliu298/awesome-llm-unlearning)
 - [Awesome Machine Unlearning](https://github.com/tamlhp/awesome-machine-unlearning)
-- [Awesome GenAI Unlearning](https://github.com/franciscoliu/Awesome-GenAI-Unlearning)
\ No newline at end of file
+- [Awesome GenAI Unlearning](https://github.com/franciscoliu/Awesome-GenAI-Unlearning)
diff --git a/src/evals/__init__.py b/src/evals/__init__.py
index f5b323c..4f55711 100644
--- a/src/evals/__init__.py
+++ b/src/evals/__init__.py
@@ -2,6 +2,7 @@
 from omegaconf import DictConfig
 
 from evals.tofu import TOFUEvaluator
 from evals.muse import MUSEEvaluator
+from evals.wmdp import WMDPEvaluator
 
 EVALUATOR_REGISTRY: Dict[str, Any] = {}
@@ -31,3 +32,4 @@ def get_evaluators(eval_cfgs: DictConfig, **kwargs):
 # Register Your benchmark evaluators
 _register_evaluator(TOFUEvaluator)
 _register_evaluator(MUSEEvaluator)
+_register_evaluator(WMDPEvaluator)
diff --git a/src/evals/wmdp.py b/src/evals/wmdp.py
new file mode 100644
index 0000000..2d65d6d
--- /dev/null
+++ b/src/evals/wmdp.py
@@ -0,0 +1,6 @@
+from evals.base import Evaluator
+
+
+class WMDPEvaluator(Evaluator):
+    def __init__(self, eval_cfg, **kwargs):
+        super().__init__("WMDP", eval_cfg, **kwargs)
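With the evaluator registered, the `eval.wmdp` config above can be exercised end to end. The run below is a sketch based on the `src/eval.py` command documented in this diff: the checkpoint path is a placeholder, and the `eval.wmdp.*` overrides simply target keys exposed in `configs/eval/wmdp.yaml`.

```bash
# Sketch: evaluate a locally saved checkpoint with the new WMDP evaluator.
# The pretrained_model_name_or_path value is a placeholder, not a real artifact.
python src/eval.py --config-name=eval.yaml \
  experiment=eval/wmdp/default \
  model=Llama-3.2-1B-Instruct \
  model.model_args.pretrained_model_name_or_path=saves/unlearn/wmdp_cyber_unlearning \
  eval.wmdp.forget_split=test \
  eval.wmdp.retain_logs_path=null \
  task_name=WMDP_EVAL
```

As with the TOFU and MUSE evaluators, results are written under the configured `output_dir` (here `${paths.output_dir}`) for the given `task_name`.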