
Conversation

@Dornavineeth
Collaborator

What does this PR do?

  • Add WMDP benchmark
  • Support LM Eval Harness evaluation suite
  • Add gibberish rate scores on forget data for model utility.
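The gibberish-rate metric above can be sketched as a simple aggregate: the fraction of a model's generations on the forget set that a text classifier flags as gibberish. A minimal illustration, where `is_gibberish` is a hypothetical stand-in predicate (the PR presumably uses a trained gibberish classifier, not this toy rule):

```python
from typing import Callable, Iterable

def gibberish_rate(texts: Iterable[str],
                   is_gibberish: Callable[[str], bool]) -> float:
    """Fraction of generated texts flagged as gibberish (0.0 for empty input)."""
    texts = list(texts)
    if not texts:
        return 0.0
    return sum(is_gibberish(t) for t in texts) / len(texts)

# Toy stand-in predicate for illustration only: flags vowel-free strings.
toy_flag = lambda t: not any(c in "aeiou" for c in t.lower())

print(gibberish_rate(["hello world", "xzqv bbb", "fine text"], toy_flag))
```

A lower rate on forget-set generations indicates the unlearned model still produces fluent text, which is why the score contributes to model utility rather than forget quality.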

Acknowledgements

We thank @ruidazeng for sharing insights on the WMDP benchmark and for initiating its dataset integration in #93.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Have you gone through the contributions guide?
  • Are your changes documented? Read documentation guidelines here.

molereddy and others added 20 commits March 1, 2025 09:13
* testing commit

* Fixes

* cleanup
* Fix tofu_unlearn.sh for IdkDPO method
* IdkDPO script fix in tofu_unlearn.sh (locuslab#65)

* Fix hyperlinks in README
* Download I don't know data in setup_data.py
* Fix tofu_unlearn.sh for IdkDPO

---------

Co-authored-by: Anmol Mekala <49127549+molereddy@users.noreply.github.com>

* overwrite=True

* RMU added

* Fix ref model device

* ruff fix

* RMU updated

* Update rmu.py

* Update README.md: add RMU

* Added references and renamed functions

---------

Co-authored-by: Anmol Mekala <49127549+molereddy@users.noreply.github.com>
…on (#8)

* docs: updates, small corrections, re-formats

* modified ruff commands

* modified ruff commands

* CI/CD minor updates

* added contributing + leaderboard

* fix minor spelling mistakes

* docs: bunch of minor updates

* docs fixes

---------

Co-authored-by: molereddy <m.anmolreddy@gmail.com>
* Re-formatting + more badges

* Update and fix docs

* Make error msg accurate

* handle lack of flash-attn flag better

* Document more hydra features

* update example exp configs to match latest supported metrics

* Change HF logo

* Simplify eval exp cfg dump

* testing push workflows

* Add workflow test branch

* update workflow path again

* Reformat badges to fix blue line issue

* Fix div

* revert change to tests build path
* documentation fix

* remove EOS only after removing pad tokens + avoid model train mode inside evaluation

* Fix date handling for Llama 3.1 repro issues caused by the tokenizer automatically adding the current date

* ruff fixes

* minor mistake

* warn about and handle weird tokenization cases for small targets

* Ruff fixes

* The assert must hold by definition

* Updating leaderboard.md numbers

* Allow for invalid evaluations which are excluded from averaging

* bug fix

* ruff fixes

---------

Co-authored-by: Dornavineeth <vineethdorna@gmail.com>
* Added WMDP and LM Eval support

* added gibberish metric

* gibberish fix

* fix gibberish

* lm_eval summary clean

* ruff fix

* lm-eval fixes

* fix config

* update docs

* Update docs

* update setup_data.py

* update readme

* ruff fix

---------

Co-authored-by: molereddy <m.anmolreddy@gmail.com>
@Dornavineeth Dornavineeth merged commit a730f58 into locuslab:main May 12, 2025
1 check passed