This repository contains code and resources for the paper "On Effects of Steering Latent Representation for Large Language Model Unlearning" by Dang Huu-Tien, Trung-Tin Pham, Hoang Thanh-Tung, and Naoya Inoue.
Follow the steps below to set up the environment:
conda create -n unlearning python=3.10  # the env needs a Python for pip to work; 3.10 is an assumption
conda activate unlearning
pip install -r requirements.txt
Create a directory to store datasets:
mkdir data/
Download the required datasets from the WMDP repository and place them in the data/ directory.
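The WMDP corpora are hosted on the Hugging Face Hub under cais/wmdp-corpora (a gated dataset; you must accept its access terms first). As a sketch, assuming huggingface_hub is installed, the download can be scripted like this:

# Sketch: fetch the WMDP corpora into data/ (repo id taken from the WMDP project;
# the dataset is gated, so log in and accept its terms before running this).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="cais/wmdp-corpora",
    repo_type="dataset",
    local_dir="data/",
)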
Run Adaptive RMU unlearning with the following command:
python -m baselines.adap_rmu.unlearn \
--model_name_or_path HuggingFaceH4/zephyr-7b-beta \
--max_num_batches 500 \
--alpha 1200,1200 \
--batch_size 4 \
--seed 42 \
--scale 5.0 \
--layer_id 7 \
--layer_ids 5,6,7 \
--verbose
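For reference, the core idea of Adaptive RMU as described in the paper is to scale the random steering target by the norm of the frozen model's activation, rather than using a fixed steering coefficient as in vanilla RMU. A minimal sketch of the forget term (names and shapes are illustrative, not the repository's actual code):

# Minimal sketch of the adaptive forget term; h_updated and h_frozen are the
# activations of the updated and frozen models at --layer_id.
import torch
import torch.nn.functional as F

def adaptive_forget_loss(h_updated, h_frozen, u, scale):
    # u: random unit vector of size (hidden,); scale: the --scale argument.
    # The steering magnitude adapts to the frozen activation's norm.
    coeff = scale * h_frozen.norm(dim=-1, keepdim=True)  # (batch, seq, 1)
    return F.mse_loss(h_updated, coeff * u)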
Alternatively, perform a grid search for hyperparameter tuning by running:
bash experiments/adap_rmu.sh
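If the shell script does not fit your setup, the sweep can also be driven from Python. A sketch using the flags from the example command above (the swept values are illustrative, not the grid used in the paper):

# Hypothetical grid-search driver; hyperparameter values are illustrative only.
import itertools
import subprocess

alphas = ["1200,1200"]     # steering coefficients, one per forget dataset
scales = [3.0, 5.0, 10.0]  # adaptive scaling factors
layers = [5, 6, 7]         # layer whose representation is steered

for alpha, scale, layer in itertools.product(alphas, scales, layers):
    subprocess.run([
        "python", "-m", "baselines.adap_rmu.unlearn",
        "--model_name_or_path", "HuggingFaceH4/zephyr-7b-beta",
        "--max_num_batches", "500",
        "--alpha", alpha,
        "--batch_size", "4",
        "--seed", "42",
        "--scale", str(scale),
        "--layer_id", str(layer),
        # update the three layers up to and including layer_id,
        # mirroring the example command above
        "--layer_ids", ",".join(str(l) for l in range(layer - 2, layer + 1)),
        "--verbose",
    ], check=True)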
We use EleutherAI's lm-evaluation-harness (lm-eval) for evaluation:
lm-eval --model hf \
--model_args pretrained="checkpoints/rmu/adaptive_HuggingFaceH4/zephyr-7b-beta_alpha-1200-1200_coeffs-6.5-6.5_batches-500_layer-7_scale-5" \
--tasks mmlu,wmdp \
--batch_size=16
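The same evaluation can also be driven from Python through the harness's API (a sketch, assuming lm-eval >= 0.4, where simple_evaluate is the public entry point):

# Sketch: programmatic evaluation with lm-eval's Python API.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=checkpoints/rmu/adaptive_HuggingFaceH4/zephyr-7b-beta_alpha-1200-1200_coeffs-6.5-6.5_batches-500_layer-7_scale-5",
    tasks=["mmlu", "wmdp"],
    batch_size=16,
)
print(results["results"])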
If you find this work helpful, please cite our paper:
@article{huu2024effects,
  title={On Effects of Steering Latent Representation for Large Language Model Unlearning},
  author={Dang, Huu-Tien and Pham, Trung-Tin and Hoang, Thanh-Tung and Inoue, Naoya},
  journal={arXiv preprint arXiv:2408.06223},
  year={2024}
}
This repository builds upon and extends the code from the WMDP repository.