
Conversation

@Dornavineeth
Collaborator

What does this PR do?

  • Add WMDP benchmark
  • Support LM Eval Harness evaluation suite
  • Add gibberish rate scores on forget data for model utility.
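The gibberish-rate metric above can be sketched as a simple aggregate: the fraction of a model's generations on the forget set that a text classifier flags as gibberish. A minimal illustration, where `is_gibberish` is a hypothetical stand-in predicate (the PR presumably uses a trained gibberish classifier, not this toy rule):

```python
from typing import Callable, Iterable

def gibberish_rate(texts: Iterable[str],
                   is_gibberish: Callable[[str], bool]) -> float:
    """Fraction of generated texts flagged as gibberish (0.0 for empty input)."""
    texts = list(texts)
    if not texts:
        return 0.0
    return sum(is_gibberish(t) for t in texts) / len(texts)

# Toy stand-in predicate for illustration only: flags vowel-free strings.
toy_flag = lambda t: not any(c in "aeiou" for c in t.lower())

print(gibberish_rate(["hello world", "xzqv bbb", "fine text"], toy_flag))
```

A lower rate on forget-set generations indicates the unlearned model still produces fluent text, which is why the score contributes to model utility rather than forget quality.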

Acknowledgements

We thank @ruidazeng for sharing insights on the WMDP benchmark and for initiating its dataset integration in #93.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Have you gone through the contributions guide?
  • Are your changes documented? Read documentation guidelines here.

molereddy and others added 20 commits March 1, 2025 09:13
* testing commit

* Fixes

* cleanup
* Fix tofu_unlearn.sh for IdkDPO method
* IdkDPO script fix in tofu_unlearn.sh (locuslab#65)

* Fix hyperlinks in README
* Download I don't know data in setup_data.py
* Fix tofu_unlearn.sh for IdkDPO

---------

Co-authored-by: Anmol Mekala <49127549+molereddy@users.noreply.github.com>

* overwrite=True

* RMU added

* Fix ref model device

* ruff fix

* RMU updated

* Update rmu.py

* Update README.md: add RMU

* Added references and renamed functions

---------

Co-authored-by: Anmol Mekala <49127549+molereddy@users.noreply.github.com>
…on (#8)

* docs: updates, small corrections, re-formats

* modified ruff commands

* modified ruff commands

* CI/CD minor updates

* added contributing + leaderboard

* fix minor spelling mistakes

* docs: bunch of minor updates

* docs fixes

---------

Co-authored-by: molereddy <m.anmolreddy@gmail.com>
* Re-formatting + more badges

* Update and fix docs

* Make error msg accurate

* handle lack of flash-attn flag better

* Document more hydra features

* update example exp configs to match latest supported metrics

* Change HF logo

* Simplify eval exp cfg dump

* testing push workflows

* Add workflow test branch

* update workflow path again

* Reformat badges to fix blue line issue

* Fix div

* revert change to tests build path
* documentation fix

* remove EOS only after removing pad tokens + avoid model train mode inside evaluation

* Fix date handling for Llama 3.1 repro issues caused by the tokenizer automatically adding the current date

* ruff fixes

* minor mistake

* warn about and handle weird tokenization cases for small targets

* Ruff fixes

* The assert must hold by definition

* Updating leaderboard.md numbers

* Allow for invalid evaluations which are excluded from averaging

* bug fix

* ruff fixes

---------

Co-authored-by: Dornavineeth <vineethdorna@gmail.com>
* Added WMDP and LM Eval support

* added gibberish metric

* gibberish fix

* fix gibberish

* lm_eval summary clean

* ruff fix

* lm-eval fixes

* fix config

* update docs

* Update docs

* update setup_data.py

* update readme

* ruff fix

---------

Co-authored-by: molereddy <m.anmolreddy@gmail.com>
@Dornavineeth Dornavineeth merged commit a730f58 into locuslab:main May 12, 2025
1 check passed