This repository contains the implementation and experimental results for the paper "Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization".
This work has been published at the ICML 2025 Workshop on Reliable and Responsible Foundation Models.
- arXiv: https://arxiv.org/abs/2505.24859
- OpenReview: https://openreview.net/forum?id=sbm53EmmGp
- PDF: `icml_2025_r2fm_workshop.pdf` (included in this repository)
The repository is organized as follows:
```
.
├── config/                 # Configuration files for experiments
├── data/                   # Results from the experiments
├── datasets/               # Datasets used in the experiments
│   ├── sentiment/          # Synthetic training data for sentiment vectors
│   ├── readability/        # Synthetic training data for readability vectors
│   ├── toxicity/           # Synthetic training data for toxicity vectors
│   └── topic/              # Training representations for topic vectors
├── notebooks/              # Jupyter notebooks for exploratory analysis and visualization
├── scripts/                # Supporting scripts for data preparation and preprocessing
├── src/                    # Core source code
│   ├── __init__.py
│   ├── experiments/        # Main steering experiments
│   ├── utils/
│   ├── plot_results.py     # Plot results for token reweighting
│   ├── run_experiments.py  # Run token reweighting experiments
│   └── score_results.py    # Score token reweighting experiments
├── tests/                  # Tests for some of the steering functionality
├── pyproject.toml          # Poetry configuration and dependencies
├── poetry.lock             # Locked dependencies
├── LICENSE                 # License for the repository
└── README.md               # Project documentation (this file)
```
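This project uses Python 3.12 and Poetry for dependency management. Follow these steps to set up the environment:

- Clone the repository:
  ```bash
  git clone [repository-url]
  cd [repository-name]
  ```
- Install dependencies using Poetry:
  ```bash
  poetry install
  ```
- Activate the virtual environment:
  ```bash
  poetry shell
  ```
  With Poetry 2.0 or later, `poetry shell` is only available through the `poetry-plugin-shell` plugin; alternatively, run `eval $(poetry env activate)` or prefix commands with `poetry run`.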
The project includes development dependencies for testing (`pytest`), type checking (`mypy`), code formatting (`black`), and linting (`ruff`), as specified in `pyproject.toml`.
The synthetic datasets used for training the steering vectors can be found in the following locations:
- Sentiment data: `datasets/sentiment/`
- Readability data: `datasets/readability/`
- Toxicity data: `datasets/toxicity/`
- Topic representations: `datasets/topic/`
This project is licensed under the MIT License. See the LICENSE file for details.