Diffing Toolkit: Model Comparison and Analysis Framework

[Work-In-Progress] A research framework for analyzing differences between language models using interpretability techniques. This project enables systematic comparison of base models and their variants (model organisms) through various diffing methodologies.

Note: The toolkit is based on a heavily modified version of the saprmarks/dictionary_learning repository. The modified fork is available at science-of-finetuning/dictionary_learning. The two repositories may eventually be merged, but this is not currently a priority because they have diverged significantly.

Overview

This framework consists of two main pipelines:

Preprocessing Pipeline: Extract and cache activations from pre-existing models
Diffing Pipeline: Analyze differences between models using interpretability techniques

The framework is designed to work with pre-existing model pairs (e.g., base models vs. model organisms) rather than training new models.

Installation

Clone the repository:

git clone https://github.com/science-of-finetuning/diffing-game
cd diffing-game

Install dependencies:

pip install -r requirements.txt

Quick Start

Basic Usage

Run the complete pipeline (preprocessing + diffing) with default settings:

python main.py

Pipeline Modes

Run preprocessing only (extract activations):

python main.py pipeline.mode=preprocessing

Run diffing analysis only (assumes activations already exist):

python main.py pipeline.mode=diffing
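The mode switch above can be pictured as a small dispatcher that selects which stages run. This is only a sketch under stated assumptions: the mode name `full` for the default run and the functions `run_preprocessing` and `run_diffing` are hypothetical placeholders, not the toolkit's actual API.

```python
from typing import Callable, Dict, List


# Hypothetical stage functions; the toolkit's real stages are not shown here.
def run_preprocessing(log: List[str]) -> None:
    log.append("preprocessing")  # stands in for extracting and caching activations


def run_diffing(log: List[str]) -> None:
    log.append("diffing")  # stands in for analyzing the cached activations


# Map each mode to the ordered stages it runs.
# "full" is an assumed name for the default mode that runs both stages.
STAGES: Dict[str, List[Callable[[List[str]], None]]] = {
    "full": [run_preprocessing, run_diffing],
    "preprocessing": [run_preprocessing],
    "diffing": [run_diffing],
}


def run_pipeline(mode: str = "full") -> List[str]:
    """Run the stages selected by `mode` and return the names of the stages run."""
    if mode not in STAGES:
        raise ValueError(f"unknown pipeline.mode: {mode!r}")
    log: List[str] = []
    for stage in STAGES[mode]:
        stage(log)
    return log
```

Keeping the stages in an ordered table makes it clear that the diffing-only mode skips activation extraction entirely, which is why it needs the cached activations to exist already.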

Configuration Examples

Analyze specific organism and model combinations:

python main.py organism=caps model=gemma3_1B

Use different diffing methods:

python main.py diffing/method=kl
python main.py diffing/method=normdiff
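This README does not define the two methods, so the following is an illustrative sketch rather than the toolkit's implementation. It assumes `kl` refers to a KL divergence between the two models' next-token distributions at one position, and `normdiff` to the L2 norm of the difference between their activation vectors. The function names and the direction of the KL term (finetuned relative to base) are assumptions.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_divergence(finetuned_logits: Sequence[float], base_logits: Sequence[float]) -> float:
    """KL(finetuned || base) for a single token position (assumed direction)."""
    log_p = log_softmax(finetuned_logits)
    log_q = log_softmax(base_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def norm_diff(finetuned_act: Sequence[float], base_act: Sequence[float]) -> float:
    """L2 norm of the difference between two activation vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(finetuned_act, base_act)))
```

Under these assumptions, the first measure compares what the models predict and the second compares their internal states, so the two can disagree about where a finetune has the largest effect.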

Multi-run Experiments

Run experiments across multiple configurations:

python main.py --multirun organism=caps,roman_concrete model=gemma3_1B

Run with different diffing methods:

python main.py --multirun diffing/method=kl,normdiff
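With `--multirun`, comma-separated values are swept as a Cartesian product, one run per combination. The sketch below mimics that expansion for simple `key=a,b` overrides only. The helper name `expand_sweep` is hypothetical, and the real sweeper supports richer syntax than this.

```python
from itertools import product
from typing import Dict, List


def expand_sweep(overrides: List[str]) -> List[Dict[str, str]]:
    """Expand comma-separated overrides into one config dict per run."""
    keys: List[str] = []
    choices: List[List[str]] = []
    for item in overrides:
        key, _, value = item.partition("=")
        keys.append(key)
        choices.append(value.split(","))
    # One run for every combination of the listed values.
    return [dict(zip(keys, combo)) for combo in product(*choices)]


if __name__ == "__main__":
    for run in expand_sweep(["organism=caps,roman_concrete", "diffing/method=kl,normdiff"]):
        print(run)
```

Sweeping two organisms and two methods therefore launches four runs, so the number of runs grows multiplicatively with each swept key.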

Interactive Dashboard

The framework includes a Streamlit-based interactive dashboard for visualizing and exploring model diffing results.

Features

Dynamic Discovery: Automatically detects available models, organisms, and diffing methods
Real-time Visualization: Interactive plots and visualizations of diffing results
Model Integration: Direct links to Hugging Face model pages
Multi-method Support: Compare results across different diffing methodologies
Interactive Model Testing: Test custom inputs and steering vectors on both base and finetuned models in real-time

Running the Dashboard

Launch the dashboard with:

streamlit run dashboard.py

The dashboard will be available at http://localhost:8501 by default.

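You can also pass configuration overrides to the dashboard. Arguments after the `--` separator are forwarded to the dashboard script rather than interpreted by Streamlit: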

streamlit run dashboard.py -- model.dtype=float32
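To illustrate what a dotted override such as `model.dtype=float32` does, here is a minimal sketch that applies `key=value` strings to a nested dictionary. The helper `apply_overrides` and the example config contents are hypothetical. The toolkit's own configuration handling is not shown in this README and will differ, for example by converting value types.

```python
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply dotted `key=value` overrides to a nested dict in place and return it."""
    for item in overrides:
        dotted_key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"override must look like key=value: {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            # Walk down the hierarchy, creating intermediate levels if missing.
            node = node.setdefault(part, {})
        node[leaf] = value  # values stay as strings in this sketch
    return config


if __name__ == "__main__":
    # Hypothetical default config, for illustration only.
    cfg = {"model": {"name": "gemma3_1B", "dtype": "bfloat16"}}
    print(apply_overrides(cfg, ["model.dtype=float32"]))
```

Only the addressed leaf changes, and sibling keys under `model` keep their values.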

Using the Dashboard

Select Base Model: Choose from available base models
Select Organism: Pick the model organism (finetuned variant)
Select Diffing Method: Choose the analysis method to visualize
Explore Results: Interact with the generated visualizations

The dashboard only visualizes existing results, so run the diffing experiments before launching it.
