This is the official repo for Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models, published at the Conference on Language Modeling (COLM) 2025.
MODELS: The models can be found in the Hugging Face collection. They were trained with the Zephyr recipe and codebase, which can be found here.
EVALUATION: The evaluation code used in the paper comes from the CoCoNoT benchmark, described in this paper. Materials for the Temporal Setting can be found in their respective folders.
Temporal Setting: Instructions for generating training data and evaluating the model can be found here.
CoCoNoT Evaluation: Details for the CoCoNoT evaluation can be found here.
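As a rough illustration of the idea named in the paper's title — calibrating refusals by thresholding the probability of a dedicated refusal token — here is a minimal, hypothetical sketch. The token names, logits, and threshold below are illustrative only and are not taken from this repo or its models:

```python
import math

def softmax(logits):
    """Convert a {token: logit} dict into a {token: probability} dict."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

def should_refuse(first_token_logits, refusal_token="[REFUSE]", threshold=0.5):
    """Refuse iff the refusal token's first-token probability clears the threshold.

    Hypothetical decision rule: raising the threshold makes refusals rarer,
    lowering it makes them more frequent, without retraining the model.
    """
    probs = softmax(first_token_logits)
    return probs.get(refusal_token, 0.0) > threshold

# Illustrative logits for the first generated token.
logits = {"[REFUSE]": 2.0, "[RESPOND]": 1.0}
print(should_refuse(logits, threshold=0.5))  # True  (p([REFUSE]) ≈ 0.73)
print(should_refuse(logits, threshold=0.8))  # False (stricter threshold)
```

Adjusting the threshold at inference time is what makes the refusal rate tunable: the same trained model can be made more or less conservative without any further training.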
Please send any inquiries to njain17 at umd dot edu.