This is the official repo for "Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models"

neelsjain/refusal-tokens

Codebase for Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models

This is the official repo for Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models, published at the Conference on Language Modeling (COLM) 2025.

MODELS: The models can be found in the Hugging Face collection. They were trained with the Zephyr recipe and codebase, which can be found here.

EVALUATION: The evaluation code used in the paper is from the CoCoNoT evaluation, described in this paper. The Temporal Setting materials can be found in their respective folders.

Temporal Setting: Instructions for generating training data and evaluating the model can be found here.

CoCoNoT Evaluation: Details for the CoCoNoT evaluation can be found here.

Please send any inquiries to njain17 at umd dot edu.
