This repo contains the code for *Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem*. One-Shot Critique Fine-Tuning (CFT) is a simple, robust, and compute-efficient training paradigm for unleashing the reasoning capabilities of pre-trained LLMs in both mathematical and logical domains. By leveraging critiques of solutions to just one problem, One-Shot CFT enables models like Qwen and LLaMA to match or even outperform reinforcement learning, while using 20× less compute.
- **Unleashes Reasoning with One Example:** One-Shot CFT uses critiques of diverse model-generated solutions to a single problem to significantly boost performance across math and logic tasks. For example, with just 5 GPU hours of training on Qwen2.5-Math-7B, One-Shot CFT achieves an average improvement of +15% on six math benchmarks and +16% on three logic reasoning benchmarks.
- **Outperforms RLVR and Full SFT with 20× Less Compute:** One-Shot CFT outperforms both one-shot Reinforcement Learning with Verifiable Rewards (RLVR) and full-dataset supervised fine-tuning, while requiring only 5 GPU hours on a 7B model, offering a much more efficient and stable training alternative.
- **Robust Across Seeds and Model Scales:** One-Shot CFT remains effective across different seed problem choices and model sizes, from 1.5B to 14B parameters, demonstrating strong generalization and scalability.
```bash
cd tools/
bash setup_env.sh
bash prepare_data.sh
```
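After setup, a quick sanity check can confirm the environment is usable. This assumes a PyTorch-based training stack with CUDA GPUs, which is an assumption on our part rather than something the scripts state:

```bash
# Optional sanity check; assumes a PyTorch-based environment with CUDA GPUs.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
nvidia-smi | head -n 3   # confirm the driver can see your GPUs
```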
- Train on Mathematical Reasoning
```bash
cd ../train/
bash train_on_math_reasoning.sh
```
We randomly select 500 math problems (excluding MATH-500) for validation. To validate after training:
```bash
cd train/Validation
bash start_validate.sh
```
This generates `validation_summary.txt`, which lists the MATH-Validation score for each checkpoint. Select the checkpoint with the highest score as your final model.
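To pick the winner from the command line, a one-liner like this works, assuming each line of `validation_summary.txt` pairs a checkpoint name with its score in the second column (adjust the sort key to the actual layout):

```bash
# Hypothetical: assumes lines like "checkpoint-30  0.715"; change -k2 if
# validation_summary.txt puts the score in a different column.
sort -k2 -nr train/Validation/validation_summary.txt | head -n 1
```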
- Train on Logic Reasoning
```bash
cd ../train/
bash train_on_logic_reasoning.sh
```
We do not use a separate validation set for logic tasks. In our experiments, checkpoints between `ckpt-30` and `ckpt-40` generally yield the best performance.
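Without a validation set, one practical option is to evaluate a few checkpoints in that range and keep the best. A minimal sketch, assuming `checkpoint-<step>` directory names (the naming and path are placeholders; adjust to what the training script actually writes):

```bash
# Hypothetical loop; the checkpoint-<step> naming is an assumption and may
# differ from your actual output directory layout.
for ckpt in /path/to/output/checkpoint-{30,35,40}; do
  echo "Point eval/eval_on_logic_reasoning.sh at ${ckpt} and compare scores."
done
```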
Edit the following scripts with your trained model path and output directory (a sketch of the typical edit follows the list):
- `eval/eval_on_math_reasoning.sh`
- `eval/eval_on_logic_reasoning.sh`
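The exact variable names differ by script; the edit is typically along these lines (both the names and paths below are illustrative placeholders, not the scripts' actual contents):

```bash
# Illustrative only; open the scripts and match whatever variables they define.
MODEL_PATH=/path/to/your/trained/checkpoint
OUTPUT_DIR=/path/to/eval/outputs
```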
Then run:
```bash
cd eval/
bash eval_on_math_reasoning.sh
bash eval_on_logic_reasoning.sh
```
Our evaluation code is based on Qwen2.5-Math and BBEH.
You can create new critique data using the prompt templates in `prompts/`; a rough sketch of the workflow follows the list. Templates are provided for:
- Candidate solution generation
- Teacher critique generation
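As a rough sketch of the two-stage workflow (everything below is illustrative; inspect `prompts/` for the real template files and plug in your own model-serving setup):

```bash
# Hypothetical outline; the stages are descriptive, not scripts in this repo.
ls prompts/   # locate the candidate-solution and teacher-critique templates

# Stage 1: fill the candidate-solution template with your seed problem, then
#          sample diverse solutions from a student model.
# Stage 2: fill the critique template with each (problem, solution) pair and
#          query a stronger teacher model; the critiques become the CFT data.
```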
Cite our paper as:

```bibtex
@article{wang2025unleashing,
  title={Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem},
  author={Wang, Yubo and Nie, Ping and Zou, Kai and Wu, Lijun and Chen, Wenhu},
  journal={arXiv preprint arXiv:2506.03295},
  year={2025}
}
```