One-Shot-CFT

This repo contains the code for the paper “Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem.” One-Shot Critique Fine-Tuning (CFT) is a simple, robust, and compute-efficient training paradigm for unleashing the reasoning capabilities of pretrained LLMs in both mathematical and logical domains. By leveraging critiques of solutions to just one problem, One-Shot CFT enables models like Qwen and LLaMA to match or even outperform reinforcement learning while using 20× less compute.


Highlights

  • Unleashes Reasoning with One Example: One-Shot CFT uses critiques of diverse model-generated solutions to a single problem to significantly boost performance across math and logic tasks. For example, with just 5 GPU hours of training on Qwen2.5-Math-7B, One-Shot CFT achieves an average improvement of +15% on six math benchmarks and +16% on three logic reasoning benchmarks.
  • Outperforms RLVR and Full SFT with 20× Less Compute: One-Shot CFT outperforms both one-shot Reinforcement Learning with Verifiable Rewards (RLVR) and full-dataset supervised fine-tuning, while requiring only 5 GPU hours on a 7B model—offering a much more efficient and stable training alternative.
  • Robust Across Seeds and Model Scales: One-Shot CFT remains effective across different seed problem choices and model sizes—from 1.5B to 14B parameters—demonstrating strong generalization and scalability.

Getting Started

Installation

cd tools/
bash setup_env.sh

Preparing Datasets

bash prepare_data.sh

Training

  1. Train on Mathematical Reasoning
cd ../train/
bash train_on_math_reasoning.sh

We randomly select 500 math problems (excluding MATH-500) for validation. To validate after training:

cd train/Validation
bash start_validate.sh

This generates validation_summary.txt containing MATH-Validation scores per checkpoint. Select the checkpoint with the highest score as your final model.
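To automate that selection, the following is a minimal sketch; it assumes (unverified) that each line of validation_summary.txt has the form "<checkpoint> <score>", so adjust the sort key if the actual layout differs:

# Minimal sketch: print the checkpoint with the highest MATH-Validation score.
# Assumes one "<checkpoint> <score>" pair per line of validation_summary.txt.
sort -k2,2 -nr validation_summary.txt | head -n 1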

  2. Train on Logic Reasoning
cd ../train/
bash train_on_logic_reasoning.sh

We do not use a separate validation set for logic tasks. Based on our experiments, checkpoints between ckpt-30 and ckpt-40 generally yield the best performance.
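If you need to locate those checkpoints on disk, here is a hypothetical sketch; the output directory path is an assumption, not taken from this repo:

# Hypothetical: list checkpoint directories in numeric order so you can pick one
# in the ckpt-30 to ckpt-40 range; replace the path with your training output dir.
ls -d /path/to/your/output_dir/ckpt-* | sort -V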

Evaluation

Edit the following scripts with your trained model path and output directory:

  • eval/eval_on_math_reasoning.sh
  • eval/eval_on_logic_reasoning.sh

Then run:

cd eval/
bash eval_on_math_reasoning.sh
bash eval_on_logic_reasoning.sh
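For illustration, the edits in the first step typically amount to setting two paths near the top of each script; the variable names below are assumptions and may differ from the scripts' actual contents:

# Illustrative only: the real variable names inside the eval scripts may differ.
MODEL_PATH=/path/to/your/best_checkpoint   # e.g. the checkpoint selected during validation
OUTPUT_DIR=/path/to/eval_results           # where evaluation results will be written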

Our evaluation code is based on Qwen2.5-Math and BBEH.

Create Your Own Critique Data

You can create new critique data using the prompt templates in "prompts/" (a minimal generation sketch follows this list) for:

  • Candidate solution generation
  • Teacher critique generation
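The sketch below shows the teacher-critique step against an OpenAI-compatible chat endpoint; the template filename, the model name, and the use of jq for JSON escaping are all assumptions, not part of this repo:

# Minimal sketch (not from this repo): fill a critique template with a problem and
# a candidate solution, then query an OpenAI-compatible chat endpoint.
# "prompts/critique_template.txt" and the model name are placeholders.
PROMPT_JSON=$(jq -Rs . < prompts/critique_template.txt)   # JSON-escape the prompt text
curl -s https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"gpt-4o\", \"messages\": [{\"role\": \"user\", \"content\": ${PROMPT_JSON}}]}"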

Citation

Cite our paper as:

@article{wang2025unleashing,
  title={Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem},
  author={Wang, Yubo and Nie, Ping and Zou, Kai and Wu, Lijun and Chen, Wenhu},
  journal={arXiv preprint arXiv:2506.03295},
  year={2025}
}
