GitHub - neelsjain/refusal-tokens: This is the official repo for "Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models"

Codebase for Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models

This is the official repo for "Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models", published at the Conference on Language Modeling (COLM) 2025.

MODELS: The models can be found in the Hugging Face collection. The Zephyr recipe and codebase were used to train these models and can be found here.

EVALUATION: The evaluation code used in the paper is from the CoCoNoT evaluation, described in this paper. The Temporal Setting materials can be found in their respective folders.

Temporal Setting: Instructions for generating training data and evaluating the model can be found here.

CoCoNoT Evaluation: Details for the CoCoNoT evaluation can be found here.

Please send any inquiries to njain17 at umd dot edu.

FILES: The repository contains the following folders and files.
coconot_eval/
temporal_setting/
README.md
requirements.txt
