fscdc/Awesome-Efficient-Reasoning-Models

Efficient Reasoning Models: A Survey

An overview of research in efficient reasoning models

This repository is for our paper:

Efficient Reasoning Models: A Survey
Sicheng Feng¹,², Gongfan Fang¹, Xinyin Ma¹, Xinchao Wang¹,*
¹National University of Singapore, Singapore
²Nankai University, Tianjin, China
*Corresponding author: xinchao@nus.edu.sg


🙋 Please let us know if you find a mistake or have any suggestions!

🌟 If you find this resource helpful, please consider starring this repository and citing our research!

Contributions

If you want to add your paper or update details like conference info or code URLs, please submit a pull request. You can generate the necessary markdown for each paper by filling out generate_item.py and running python generate_item.py. We greatly appreciate your contributions. Alternatively, you can email me (Gmail) the links to your paper and code, and I will add your paper to the list as soon as possible.
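
As a rough illustration of what a generated entry might look like, the sketch below builds one markdown table row with the three columns used in this list (Title & Authors, Introduction, Links). It is not the repository's generate_item.py, whose fields and output may differ; the function name make_row, its parameters, the row layout, and the example.com URLs are all assumptions made for illustration.

```python
def make_row(title, authors, paper_url, code_url=None, intro_image=None):
    """Build one markdown table row: Title & Authors | Introduction | Links.

    Hypothetical helper for illustration; the real generate_item.py may differ.
    """
    # Title links to the paper; authors follow on a new line inside the same cell.
    title_cell = f"[{title}]({paper_url}) <br> {', '.join(authors)}"
    # The Introduction column holds an overview image when one is available.
    intro_cell = f"![image]({intro_image})" if intro_image else ""
    # The Links column always has the paper link; the code link is optional.
    links = []
    if code_url:
        links.append(f"[Github]({code_url})")
    links.append(f"[Paper]({paper_url})")
    links_cell = " <br> ".join(links)
    return f"| {title_cell} | {intro_cell} | {links_cell} |"


if __name__ == "__main__":
    # Placeholder values only; replace them with the details of your paper.
    print(make_row(
        title="An Example Paper",
        authors=["First Author", "Second Author"],
        paper_url="https://example.com/paper",
        code_url="https://example.com/code",
    ))
```

Whatever the script actually emits, the resulting row goes into the table of the matching category in your pull request.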


Make Long CoT Short

SFT-based Methods

Title & Authors Introduction Links
Star
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
Xinyin Ma, Guangnian Wan, Runpeng Yu, Gongfan Fang, Xinchao Wang
image Github
Paper
Star
QFFT, Question-Free Fine-Tuning for Adaptive Reasoning
Wanlong Liu, Junxiao Xu, Fei Yu, Yukang Lin, Ke Ji, Wenyu Chen, Yan Xu, Yasheng Wang, Lifeng Shang, Benyou Wang
image Github
Paper
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
Shengjia Zhang, Junjie Wu, Jiawei Chen, Changwang Zhang, Xingyu Lou, Wangchunshu Zhou, Sheng Zhou, Can Wang, Jun Wang
image Paper
Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting
Yifan Wu, Jingze Shi, Bingheng Wu, Jiayi Zhang, Xiaotian Lin, Nan Tang, Yuyu Luo
image Paper
Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition
Zihao Zeng, Xuyao Huang, Boxiu Li, Hao Zhang, Zhijie Deng
image Paper
Amplify Adjacent Token Differences: Enhancing Long Chain-of-Thought Reasoning with Shift-FFN
Yao Xu, Mingyu Xu, Fangyu Lei, Wangtao Sun, Xiangrong Zeng, Bingning Wang, Guang Liu, Shizhu He, Jun Zhao, Kang Liu
image Paper
Star
DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models
Yuxuan Jiang, Dawei Li, Frank Ferraro
image Github
Paper
Star
VeriThinker: Learning to Verify Makes Reasoning Model Efficient
Zigeng Chen, Xinyin Ma, Gongfan Fang, Ruonan Yu, Xinchao Wang
image Github
Paper
Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning
Shangziqi Zhao, Jiahao Yuan, Guisong Yang, Usman Naseem
image Paper
Star
Let LLMs Break Free from Overthinking via Self-Braking Tuning
Haoran Zhao, Yuchen Yan, Yongliang Shen, Haolei Xu, Wenqi Zhang, Kaitao Song, Jian Shao, Weiming Lu, Jun Xiao, Yueting Zhuang
image Github
Paper
Star
R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search
Yibo Wang, Li Shen, Huanjin Yao, Tiansheng Huang, Rui Liu, Naiqiang Tan, Jiaxing Huang, Kai Zhang, Dacheng Tao
image Github
Paper
Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought
Hunyuan team
image Paper
Star
Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models
Bin Yu, Hang Yuan, Yuliang Wei, Bailing Wang, Weizhen Qi, Kai Chen
image Github
Paper
Llama-Nemotron: Efficient Reasoning Models
NVIDIA
image Paper
Publish
C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness
Yu Kang, Xianghui Sun, Liangyu Chen, Wei Zou
image Paper
Star Publish
Can Language Models Learn to Skip Steps?
Tengxiao Liu, Qipeng Guo, Xiangkun Hu, Cheng Jiayang, Yue Zhang, Xipeng Qiu, Zheng Zhang
image Github
Paper
Distilling System 2 into System 1
Ping Yu, Jing Xu, Jason Weston, Ilia Kulikov
image Paper
Star
TokenSkip: Controllable Chain-of-Thought Compression in LLMs
Heming Xia, Yongqi Li, Chak Tou Leong, Wenjie Wang, Wenjie Li
image Github
Paper
Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models
Yingqian Cui, Pengfei He, Jingying Zeng, Hui Liu, Xianfeng Tang, Zhenwei Dai, Yan Han, Chen Luo, Jing Huang, Zhen Li, Suhang Wang, Yue Xing, Jiliang Tang, Qi He
image Paper
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
Wenkai Yang, Shuming Ma, Yankai Lin, Furu Wei
image Paper
Star
Self-Training Elicits Concise Reasoning in Large Language Models
Tergel Munkhbat, Namgyu Ho, Seo Hyun Kim, Yongjin Yang, Yujin Kim, Se-Young Yun
image Github
Paper
Star
Token-Budget-Aware LLM Reasoning
Tingxu Han, Zhenting Wang, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen
image Github
Paper

RL-based Methods

Title & Authors Introduction Links
Star
Thinkless: LLM Learns When to Think
Gongfan Fang, Xinyin Ma, Xinchao Wang
image Github
Paper
Compressing Chain-of-Thought in LLMs via Step Entropy
Zeju Li, Jianyuan Zhong, Ziyang Zheng, Xiangyu Wen, Zhijian Xu, Yingying Cheng, Fan Zhang, Qiang Xu
image Paper
Star Publish
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning
Hiroshi Yoshihara, Taiki Yamaguchi, Yuichi Inoue
image Github
Paper
Star
ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models
Razvan-Gabriel Dumitru, Darius Peteleaza, Vikas Yadav, Liangming Pan
image Github
Paper
Star
Optimizing Length Compression in Large Reasoning Models
Zhengxiang Cheng, Dongping Chen, Mingyang Fu, Tianyi Zhou
image Github
Paper
PATS: Process-Level Adaptive Thinking Mode Switching
Yi Wang, Junxiao Liu, Shimao Zhang, Jiajun Chen, Shujian Huang
image Paper
Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition
Zihao Zeng, Xuyao Huang, Boxiu Li, Hao Zhang, Zhijie Deng
image Paper
AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting
Shijue Huang, Hongru Wang, Wanjun Zhong, Zhaochen Su, Jiazhan Feng, Bowen Cao, Yi R. Fung
image Paper
Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning
Hanbing Liu, Lang Cao, Yuanyi Ren, Mengyu Zhou, Haoyu Dong, Xiaojun Ma, Shi Han, Dongmei Zhang
image Paper
ARM: Adaptive Reasoning Model
Siye Wu, Jian Xie, Yikai Zhang, Aili Chen, Kai Zhang, Yu Su, Yanghua Xiao
image Paper
When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning
Xiaoyun Zhang, Jingqing Ruan, Xing Ma, Yawen Zhu, Haodong Zhao, Hao Li, Jiansong Chen, Ke Zeng, Xunliang Cai
image Paper
Star
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
Wei Liu, Ruochen Zhou, Yiyun Deng, Yuzhen Huang, Junteng Liu, Yuntian Deng, Yizhe Zhang, Junxian He
image Github
Paper
Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought
Hunyuan team
image Paper
Think Only When You Need with Large Hybrid-Reasoning Models
Lingjie Jiang, Xun Wu, Shaohan Huang, Qingxiu Dong, Zewen Chi, Li Dong, Xingxing Zhang, Tengchao Lv, Lei Cui, Furu Wei
image Paper
Reward Reasoning Model
Jiaxin Guo, Zewen Chi, Li Dong, Qingxiu Dong, Xun Wu, Shaohan Huang, Furu Wei
image Paper
Star
AdaptThink: Reasoning Models Can Learn When to Think
Jiajie Zhang, Nianyi Lin, Lei Hou, Ling Feng, Juanzi Li
image Github
Paper
Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning
Yansong Ning, Wei Li, Jun Fang, Naiqiang Tan, Hao Liu
image Paper
ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving
Haoyuan Wu, Xueyi Chen, Rui Ming, Jilong Gao, Shoubo Hu, Zhuolun He, Bei Yu
image Paper
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning
Chenwei Lou, Zewei Sun, Xinnian Liang, Meng Qu, Wei Shen, Wenqi Wang, Yuntao Li, Qingping Yang, Shuangzhi Wu
image Paper
Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs
Jingyao Wang, Wenwen Qiang, Zeen Song, Changwen Zheng, Hui Xiong
image Paper
MilChat: Introducing Chain of Thought Reasoning and GRPO to a Multimodal Small Language Model for Remote Sensing
Aybora Koksal, A. Aydin Alatan
image Paper
Scalable Chain of Thoughts via Elastic Reasoning
Yuhui Xu, Hanze Dong, Lei Wang, Doyen Sahoo, Junnan Li, Caiming Xiong
image Paper
Star
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
Yibin Wang, Zhimin Li, Yuhang Zang, Chunyu Wang, Qinglin Lu, Cheng Jin, Jiaqi Wang
image Github
Paper
Llama-Nemotron: Efficient Reasoning Models
NVIDIA
image Paper
Star
AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
Haotian Luo, Haiying He, Yibo Wang, Jinluan Yang, Rui Liu, Naiqiang Tan, Xiaochun Cao, Dacheng Tao, Li Shen
image Github
Paper
Star
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
Haotian Luo, Li Shen, Haiying He, Yibo Wang, Shiwei Liu, Wei Li, Naiqiang Tan, Xiaochun Cao, Dacheng Tao
image Github
Paper
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
image Paper
Star
Demystifying Long Chain-of-Thought Reasoning in LLMs
Edward Yeo, Yuxuan Tong, Morry Niu, Graham Neubig, Xiang Yue
image Github
Paper
Star
Training Language Models to Reason Efficiently
Daman Arora, Andrea Zanette
image Github
Paper
Star
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
Pranjal Aggarwal, Sean Welleck
image Github
Paper
DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models
Yi Shen, Jian Zhang, Jieyun Huang, Shuming Shi, Wenjing Zhang, Jiangze Yan, Ning Wang, Kai Wang, Shiguo Lian
image Paper
Adaptive Group Policy Optimization: Towards Stable Training and Token-Efficient Reasoning
Chen Li, Nazhou Liu, Kai Yang
image Paper
Star
ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning
Bairu Hou, Yang Zhang, Jiabao Ji, Yujian Liu, Kaizhi Qian, Jacob Andreas, Shiyu Chang
image Github
Paper
Think When You Need: Self-Adaptive Chain-of-Thought Learning
Junjie Yang, Ke Lin, Xing Yu
image Paper

Prompt-driven Methods

Prompt-guided Efficient Reasoning
Title & Authors Introduction Links
Revisiting Overthinking in Long Chain-of-Thought from the Perspective of Self-Doubt
Keqin Peng, Liang Ding, Yuanxin Ouyang, Meng Fang, Dacheng Tao
image Paper
Star
Let LLMs Break Free from Overthinking via Self-Braking Tuning
Haoran Zhao, Yuchen Yan, Yongliang Shen, Haolei Xu, Wenqi Zhang, Kaitao Song, Jian Shao, Weiming Lu, Jun Xiao, Yueting Zhuang
image Github
Paper
Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation
Junyu Ma, Tianqing Fang, Zhisong Zhang, Hongming Zhang, Haitao Mi, Dong Yu
image Paper
Time's Up! An Empirical Study of LLM Reasoning Ability Under Output Length Constraint
Yi Sun, Han Wang, Jiaqiang Li, Jiacheng Liu, Xiangyu Li, Hao Wen, Huiwen Zheng, Yan Liang, Yuanchun Li, Yunxin Liu
image Paper
CoT-RAG: Integrating Chain of Thought and Retrieval-Augmented Generation to Enhance Reasoning in Large Language Models
Feiyang Li, Peng Fang, Zhan Shi, Arijit Khan, Fang Wang, Dan Feng, Weihao Wang, Xin Zhang, Yongjian Cui
image Paper
Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models
Yule Liu, Jingyi Zheng, Zhen Sun, Zifan Peng, Wenhan Dong, Zeyang Sha, Shiwen Cui, Weiqiang Wang, Xinlei He
image Paper
Star
Token-Budget-Aware LLM Reasoning
Tingxu Han, Zhenting Wang, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen
image Github
Paper
Star Publish
The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models
Matthew Renze, Erhan Guven
image Github
Paper
Break the Chain: Large Language Models Can be Shortcut Reasoners
Mengru Ding, Hanmeng Liu, Zhizhang Fu, Jian Song, Wenbo Xie, Yue Zhang
image Paper
Star
Chain of Draft: Thinking Faster by Writing Less
Silei Xu, Wenhao Xie, Lingxiao Zhao, Pengcheng He
image Github
Paper
Star Publish
Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought
Qiguang Chen, Libo Qin, Jiaqi Wang, Jinxuan Zhou, Wanxiang Che
image Github
Paper
How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach
Ayeong Lee, Ethan Che, Tianyi Peng
Paper
Prompt Attribute-Aware Reasoning Routing
Title & Authors Introduction Links
Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning
Jinghui Lu, Haiyang Yu, Siliang Xu, Shiwei Ran, Guozhi Tang, Siqi Wang, Bin Shan, Teng Fu, Hao Feng, Jingqun Tang, Han Wang, Can Huang
image Paper
Rethinking Predictive Modeling for LLM Routing: When Simple kNN Beats Complex Learned Routers
Yang Li
image Paper
How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach
Ayeong Lee, Ethan Che, Tianyi Peng
Paper
Publish
RouteLLM: Learning to Route LLMs with Preference Data
Isaac Ong, Amjad Almahairi, Vincent Wu, Wei-Lin Chiang, Tianhao Wu, Joseph E. Gonzalez, M Waleed Kadous, Ion Stoica
image Paper
Star
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Simon A. Aytes, Jinheon Baek, Sung Ju Hwang
image Github
Paper
Learning to Route LLMs with Confidence Tokens
Yu-Neng Chuang, Helen Zhou, Prathusha Kameswara Sarma, Parikshit Gopalan, John Boccio, Sara Bolouki, Xia Hu
image Paper
Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization
Yu-Neng Chuang, Leisheng Yu, Guanchu Wang, Lizhe Zhang, Zirui Liu, Xuanting Cai, Yang Sui, Vladimir Braverman, Xia Hu
image Paper
Blog

Latent Reasoning

Title & Authors Introduction Links
LLMs are Single-threaded Reasoners: Demystifying the Working Mechanism of Soft Thinking
Chünhung Wu, Jinliang Lu, Zixuan Ren, Gangqiang Hu, Zhi Wu, Dai Dai, Hua Wu
image Paper
CTRLS: Chain-of-Thought Reasoning via Latent State-Transition
Junda Wu, Yuxin Xiong, Xintong Li, Zhengmian Hu, Tong Yu, Rui Wang, Xiang Chen, Jingbo Shang, Julian McAuley
image Paper
Star
Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer
Wenquan Lu, Yuechuan Yang, Kyle Lee, Yanshu Li, Enqi Liu
image Github
Paper
Star
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens
Zeyuan Yang, Xueyang Yu, Delin Chen, Maohao Shen, Chuang Gan
image Github
Paper
Star
Parallel Continuous Chain-of-Thought with Jacobi Iteration
Haoyi Wu, Zhihao Teng, Kewei Tu
image Github
Paper
DART: Distilling Autoregressive Reasoning to Silent Thought
Nan Jiang, Ziming Wu, De-Chuan Zhan, Fuming Lai, Shaobing Lian
image Paper
System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts
Xiaoqiang Wang, Suyuchen Wang, Yun Zhu, Bang Liu
image Paper
Hybrid Latent Reasoning via Reinforcement Learning
Zhenrui Yue, Bowen Jin, Huimin Zeng, Honglei Zhuang, Zhen Qin, Jinsung Yoon, Lanyu Shang, Jiawei Han, Dong Wang
image Paper
SCOUT: Teaching Pre-trained Language Models to Enhance Reasoning via Flow Chain-of-Thought
Guanghao Li, Wenhao Jiang, Mingfeng Chen, Yan Li, Hao Yu, Shuting Dong, Tao Ren, Ming Tang, Chun Yuan
image Paper
Continuous Chain of Thought Enables Parallel Exploration and Reasoning
Halil Alperen Gozeten, M. Emrullah Ildiz, Xuechen Zhang, Hrayr Harutyunyan, Ankit Singh Rawat, Samet Oymak
image Paper
Star
Efficient Reasoning via Chain of Unconscious Thought
Ruihan Gong, Yue Liu, Wenjie Qu, Mingzhe Du, Yufei He, Yingwei Ma, Yulin Chen, Xiang Liu, Yi Wen, Xinfeng Li, Ruidong Wang, Xinzhong Zhu, Bryan Hooi, Jiaheng Zhang
image Github
Paper
Star
Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning
Xinghao Chen, Anhao Zhao, Heming Xia, Xuan Lu, Hanlin Wang, Yanjun Chen, Wei Zhang, Jian Wang, Wenjie Li, Xiaoyu Shen
image Github
Paper
Star
Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
Wenhui Tan, Jiaze Li, Jianzhong Ju, Zhenbo Luo, Jian Luan, Ruihua Song
image Github
Paper
Star
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space
Zhen Zhang, Xuehai He, Weixiang Yan, Ao Shen, Chenyang Zhao, Shuohang Wang, Yelong Shen, Xin Eric Wang
image Github
Paper
Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models
Zihao Li, Xu Wang, Yuzhe Yang, Ziyu Yao, Haoyi Xiong, Mengnan Du
image Paper
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space
Hengli Li, Chenxi Li, Tong Wu, Xuekai Zhu, Yuxuan Wang, Zhaoxin Yu, Eric Hanchen Jiang, Song-Chun Zhu, Zixia Jia, Ying Nian Wu, Zilong Zheng
image Paper
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
Hanlin Zhu, Shibo Hao, Zhiting Hu, Jiantao Jiao, Stuart Russell, Yuandong Tian
image Paper
Star
SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning
Yige Xu, Xu Guo, Zhiwei Zeng, Chunyan Miao
image Github
Paper
Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models
Thilo Hagendorff, Sarah Fabi
image Paper
Distilling System 2 into System 1
Ping Yu, Jing Xu, Jason Weston, Ilia Kulikov
image Paper
Star
Implicit Chain of Thought Reasoning via Knowledge Distillation
Yuntian Deng, Kiran Prasad, Roland Fernandez, Paul Smolensky, Vishrav Chaudhary, Stuart Shieber
image Github
Paper
Star Publish
Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models
Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong
image Github
Paper
Star
From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step
Yuntian Deng, Yejin Choi, Stuart Shieber
image Github
Paper
Compressed Chain of Thought: Efficient Reasoning Through Dense Representations
Jeffrey Cheng, Benjamin Van Durme
image Paper
SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs
Yige Xu, Xu Guo, Zhiwei Zeng, Chunyan Miao
image Paper
Publish
Reasoning with Latent Thoughts: On the Power of Looped Transformers
Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li, Sanjiv Kumar, Sashank J. Reddi
image Paper
Star
Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning
Qifan Yu, Zhenyu He, Sijie Li, Xun Zhou, Jun Zhang, Jingjing Xu, Di He
image Github
Paper
CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation
Zhenyi Shen, Hanqi Yan, Linhai Zhang, Zhanghao Hu, Yali Du, Yulan He
image Paper
Star
LightThinker: Thinking Step-by-Step Compression
Jintian Zhang, Yuqi Zhu, Mengshu Sun, Yujie Luo, Shuofei Qiao, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang
image Github
Paper
Star Publish
Guiding Language Model Reasoning with Planning Tokens
Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni
image Github
Paper
Star Publish
Let's Think Dot by Dot: Hidden Computation in Transformer Language Models
Jacob Pfau, William Merrill, Samuel R. Bowman
image Github
Paper
Star
Disentangling Memory and Reasoning Ability in Large Language Models
Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, Yongfeng Zhang
image Github
Paper
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
DiJia Su, Hanlin Zhu, Yingchen Xu, Jiantao Jiao, Yuandong Tian, Qinqing Zheng
image Paper
Training Large Language Models to Reason in a Continuous Latent Space
Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, Yuandong Tian
image Paper
Star
Efficient Reasoning with Hidden Thinking
Xuan Shen, Yizhou Wang, Xiangxi Shi, Yanzhi Wang, Pu Zhao, Jiuxiang Gu
image Github
Paper
Publish
Think before you speak: Training Language Models With Pause Tokens
Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan
image Paper
Star
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Tom Goldstein
image Github
Paper
Weight-of-Thought Reasoning: Exploring Neural Network Weights for Enhanced LLM Reasoning
Saif Punjwani, Larry Heck
image Paper

Build SLM with Strong Reasoning Ability

Distillation

Title & Authors Introduction Links
Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning
Shuyao Xu, Cheng Peng, Jiangxuan Long, Weidi Xu, Wei Chu, Yuan Qi
image Paper
Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster
Xiao Chen, Sihang Zhou, Ke Liang, Xiaoyu Sun, Xinwang Liu
image Paper
Llama-Nemotron: Efficient Reasoning Models
NVIDIA
image Paper
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math
Haoran Xu, Baolin Peng, Hany Awadalla, Dongdong Chen, Yen-Chun Chen, Mei Gao, Young Jin Kim, Yunsheng Li, Liliang Ren, Yelong Shen, Shuohang Wang, Weijian Xu, Jianfeng Gao, Weizhu Chen
image Paper
Phi-4-reasoning Technical Report
Marah Abdin, Sahaj Agarwal, Ahmed Awadallah, Vidhisha Balachandran, Harkirat Behl, Lingjiao Chen, Gustavo de Rosa, Suriya Gunasekar, Mojan Javaheripi, Neel Joshi, Piero Kauffmann, Yash Lara, Caio César Teodoro Mendes, Arindam Mitra, Besmira Nushi, Dimitris Papailiopoulos, Olli Saarikivi, Shital Shah, Vaishnavi Shrivastava, Vibhav Vineet, Yue Wu, Safoora Yousefi, Guoqing Zheng
image Paper
Publish
Teaching Small Language Models to Reason
Lucie Charlotte Magister, Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn
image Paper
Publish
Mixed Distillation Helps Smaller Language Model Better Reasoning
Chenglin Li, Qianglong Chen, Liangyue Li, Caiyu Wang, Yicheng Li, Zulong Chen, Yin Zhang
image Paper
Star
Small Models Struggle to Learn from Strong Reasoners
Yuetai Li, Xiang Yue, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Bhaskar Ramasubramanian, Radha Poovendran
image Github
Paper
Star Publish
Turning Dust into Gold: Distilling Complex Reasoning Capabilities from LLMs by Leveraging Negative Data
Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Bin Sun, Xinglin Wang, Heda Wang, Kan Li
image Github
Paper
Publish
Teaching Small Language Models Reasoning through Counterfactual Distillation
Tao Feng, Yicheng Li, Li Chenglin, Hao Chen, Fei Yu, Yin Zhang
image Paper
Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation
Yijia Luo, Yulin Song, Xingyao Zhang, Jiaheng Liu, Weixun Wang, GengRu Chen, Wenbo Su, Bo Zheng
image Paper
Star Publish
Small Language Models Need Strong Verifiers to Self-Correct Reasoning
Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, Lu Wang
image Github
Paper
Improving Mathematical Reasoning Capabilities of Small Language Models via Feedback-Driven Distillation
Xunyu Zhu, Jian Li, Can Ma, Weiping Wang
image Paper
Star Publish
SKIntern: Internalizing Symbolic Knowledge for Distilling Better CoT Capabilities into Small Language Models
Huanxuan Liao, Shizhu He, Yupu Hao, Xiang Li, Yuanzhe Zhang, Jun Zhao, Kang Liu
image Github
Paper
Publish
Probe then Retrieve and Reason: Distilling Probing and Reasoning Capabilities into Smaller Language Models
Yichun Zhao, Shuheng Zhou, Huijia Zhu
image Paper
Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
Daniele Paliotta, Junxiong Wang, Matteo Pagliardini, Kevin Y. Li, Aviv Bick, J. Zico Kolter, Albert Gu, François Fleuret, Tri Dao
image Paper
Distilling Reasoning Ability from Large Language Models with Adaptive Thinking
Xiaoshu Chen, Sihang Zhou, Ke Liang, Xinwang Liu
image Paper
Star
Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning
Xinghao Chen, Zhijing Sun, Wenjin Guo, Miaoran Zhang, Yanjun Chen, Yirong Sun, Hui Su, Yijie Pan, Dietrich Klakow, Wenjie Li, Xiaoyu Shen
image Github
Paper

Quantization and Pruning

Title & Authors Introduction Links
Towards Reasoning Ability of Small Language Models
Gaurav Srivastava, Shuxiang Cao, Xuan Wang
image Paper
Star
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
Ruikang Liu, Yuxuan Sun, Manyi Zhang, Haoli Bai, Xianzhi Yu, Tiezheng Yu, Chun Yuan, Lu Hou
image Github
Paper
When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks
Nan Zhang, Yusen Zhang, Prasenjit Mitra, Rui Zhang
image Paper

RL+SLM Methods

Title & Authors Introduction Links
Replacing thinking with tool usage enables reasoning in small language models
Corrado Rainone, Tim Bakker, Roland Memisevic
image Paper
Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning
Shuyao Xu, Cheng Peng, Jiangxuan Long, Weidi Xu, Wei Chu, Yuan Qi
image Paper
Llama-Nemotron: Efficient Reasoning Models
NVIDIA
image Paper
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math
Haoran Xu, Baolin Peng, Hany Awadalla, Dongdong Chen, Yen-Chun Chen, Mei Gao, Young Jin Kim, Yunsheng Li, Liliang Ren, Yelong Shen, Shuohang Wang, Weijian Xu, Jianfeng Gao, Weizhu Chen
image Paper
Phi-4-reasoning Technical Report
Marah Abdin, Sahaj Agarwal, Ahmed Awadallah, Vidhisha Balachandran, Harkirat Behl, Lingjiao Chen, Gustavo de Rosa, Suriya Gunasekar, Mojan Javaheripi, Neel Joshi, Piero Kauffmann, Yash Lara, Caio César Teodoro Mendes, Arindam Mitra, Besmira Nushi, Dimitris Papailiopoulos, Olli Saarikivi, Shital Shah, Vaishnavi Shrivastava, Vibhav Vineet, Yue Wu, Safoora Yousefi, Guoqing Zheng
image Paper
Star
Tina: Tiny Reasoning Models via LoRA
Shangshang Wang, Julian Asilis, Ömer Faruk Akgül, Enes Burak Bilgin, Ollie Liu, Willie Neiswanger
image Github
Paper
Star
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
Quy-Anh Dang, Chris Ngo
Github
Paper
Star
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng, Yuzhen Huang, Qian Liu, Wei Liu, Keqing He, Zejun Ma, Junxian He
image Github
Paper
Repo

Make Decoding More Efficient

Efficient TTS (Test-Time Scaling)

Title & Authors Introduction Links
Inference-Time Hyper-Scaling with KV Cache Compression
Adrian Łańcucki, Konrad Staniszewski, Piotr Nawrot, Edoardo M. Ponti
image Paper
Control-R: Towards controllable test-time scaling
Di Zhang, Weida Wang, Junxian Li, Xunzhi Wang, Jiatong Li, Jianbo Wu, Jingdi Lei, Haonan He, Peng Ye, Shufei Zhang, Wanli Ouyang, Yuqiang Li, Dongzhan Zhou
image Paper
Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning
Junhong Lin, Xinyue Zeng, Jie Zhu, Song Wang, Julian Shun, Jun Wu, Dawei Zhou
image Paper
First Finish Search: Efficient Test-Time Scaling in Large Language Models
Aradhye Agarwal, Ayan Sengupta, Tanmoy Chakraborty
image Paper
LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling
Yang Xiao, Jiashuo Wang, Ruifeng Yuan, Chunpu Xu, Kaishuai Xu, Wenjie Li, Pengfei Liu
image Paper
Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence
Amirhosein Ghasemabadi, Keith G. Mills, Baochun Li, Di Niu
image Paper
Let Me Think! A Long Chain-of-Thought Can Be Worth Exponentially Many Short Ones
Parsa Mirtaheri, Ezra Edelman, Samy Jelassi, Eran Malach, Enric Boix-Adsera
image Paper
Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
Michael Hassid, Gabriel Synnaeve, Yossi Adi, Roy Schwartz
image Paper
Star
Value-Guided Search for Efficient Chain-of-Thought Reasoning
Kaiwen Wang, Jin Peng Zhou, Jonathan Chang, Zhaolin Gao, Nathan Kallus, Kianté Brantley, Wen Sun
image Github
Paper
Accelerated Test-Time Scaling with Model-Free Speculative Sampling
Woomin Song, Saket Dingliwal, Sai Muralidhar Jayanthi, Bhavana Ganesh, Jinwoo Shin, Aram Galstyan, Sravan Babu Bodapati
image Paper
Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision
Eric Hanchen Jiang, Haozheng Luo, Shengyuan Pang, Xiaomin Li, Zhenting Qi, Hengli Li, Cheng-Fu Yang, Zongyu Lin, Xinfeng Li, Hao Xu, Kai-Wei Chang, Ying Nian Wu
image Paper
Rethinking Optimal Verification Granularity for Compute-Efficient Test-Time Scaling
Hao Mark Chen, Guanxi Lu, Yasuyuki Okoshi, Zhiwen Mo, Masato Motomura, Hongxiang Fan
image Paper
Reward Reasoning Model
Jiaxin Guo, Zewen Chi, Li Dong, Qingxiu Dong, Xun Wu, Shaohan Huang, Furu Wei
image Paper
Fractured Chain-of-Thought Reasoning
Baohao Liao, Hanze Dong, Yuhui Xu, Doyen Sahoo, Christof Monz, Junnan Li, Caiming Xiong
image Paper
Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately
Yuhang Wang, Youhe Jiang, Bin Cui, Fangcheng Fu
image Paper
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers
Kusha Sareen, Morgane M Moss, Alessandro Sordoni, Rishabh Agarwal, Arian Hosseini
image Paper
Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods
Junlin Wang, Shang Zhu, Jon Saad-Falcon, Ben Athiwaratkun, Qingyang Wu, Jue Wang, Shuaiwen Leon Song, Ce Zhang, Bhuwan Dhingra, James Zou
image Paper
Star
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Ding Chen, Qingchen Yu, Pengyuan Wang, Wentao Zhang, Bo Tang, Feiyu Xiong, Xinchi Li, Minchuan Yang, Zhiyu Li
image Github
Paper
Star
Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs
Pranjal Aggarwal, Aman Madaan, Yiming Yang, Mausam
image Github
Paper
Star Publish
Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning
Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Xinglin Wang, Bin Sun, Heda Wang, Kan Li
image Github
Paper
Star Publish
Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning
Xinglin Wang, Shaoxiong Feng, Yiwei Li, Peiwen Yuan, Yueqi Zhang, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li
image Github
Paper
Path-Consistency: Prefix Enhancement for Efficient Inference in LLM
Jiace Zhu, Yingtao Shen, Jie Zhao, An Zou
image Paper
Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning
Zhi Zhou, Tan Yuhao, Zenan Li, Yuan Yao, Lan-Zhe Guo, Xiaoxing Ma, Yu-Feng Li
image Paper
Confidence Improves Self-Consistency in LLMs
Amir Taubenfeld, Tom Sheffer, Eran Ofek, Amir Feder, Ariel Goldstein, Zorik Gekhman, Gal Yona
image Paper
Star
Efficient Test-Time Scaling via Self-Calibration
Chengsong Huang, Langlin Huang, Jixuan Leng, Jiacheng Liu, Jiaxin Huang
image Github
Paper
Star Publish
Fast Best-of-N Decoding via Speculative Rejection
Hanshi Sun, Momin Haider, Ruiqi Zhang, Huitao Yang, Jiahao Qiu, Ming Yin, Mengdi Wang, Peter Bartlett, Andrea Zanette
image Github
Paper
Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
Yiming Wang, Pei Zhang, Siyuan Huang, Baosong Yang, Zhuosheng Zhang, Fei Huang, Rui Wang
image Paper
FastMCTS: A Simple Sampling Strategy for Data Synthesis
Peiji Li, Kai Lv, Yunfan Shao, Yichuan Ma, Linyang Li, Xiaoqing Zheng, Xipeng Qiu, Qipeng Guo
image Paper
Star Publish
Non-myopic Generation of Language Models for Reasoning and Planning
Chang Ma, Haiteng Zhao, Junlei Zhang, Junxian He, Lingpeng Kong
image Github
Paper
Star
Language Models can Self-Improve at State-Value Estimation for Better Search
Ethan Mendes, Alan Ritter
image Github
Paper
Star
ϕ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation
Fangzhi Xu, Hang Yan, Chang Ma, Haiteng Zhao, Jun Liu, Qika Lin, Zhiyong Wu
image Github
Paper
Dynamic Parallel Tree Search for Efficient LLM Reasoning
Yifu Ding, Wentao Jiang, Shunyu Liu, Yongcheng Jing, Jinyang Guo, Yingjie Wang, Jing Zhang, Zengmao Wang, Ziwei Liu, Bo Du, Xianglong Liu, Dacheng Tao
image Paper
Star
Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls
Ante Wang, Linfeng Song, Ye Tian, Dian Yu, Haitao Mi, Xiangyu Duan, Zhaopeng Tu, Jinsong Su, Dong Yu
image Github
Paper

Other Optimal Methods

Title & Authors Introduction Links
Star
Activation Steering for Chain-of-Thought Compression
Seyedarmin Azizi, Erfan Baghaei Potraghloo, Massoud Pedram
image Github
Paper
Scaling Speculative Decoding with Lookahead Reasoning
Yichao Fu, Rui Ge, Zelei Shao, Zhijie Deng, Hao Zhang
image Paper
Wait, We Don't Need to 'Wait'! Removing Thinking Tokens Improves Reasoning Efficiency
Chenlong Wang, Yuanning Feng, Dongping Chen, Zhaoyang Chu, Ranjay Krishna, Tianyi Zhou
image Paper
Steering LLM Thinking with Budget Guidance
Junyan Li, Wenshuo Zhao, Yang Zhang, Chuang Gan
image Paper
Star
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
Yizhao Gao, Shuming Guo, Shijie Cao, Yuqing Xia, Yu Cheng, Lei Wang, Lingxiao Ma, Yutao Sun, Tianzhu Ye, Li Dong, Hayden Kwok-Hay So, Yu Hua, Ting Cao, Fan Yang, Mao Yang
image Github
Paper
Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs
Roy Eisenstadt, Itamar Zimerman, Lior Wolf
image Paper
Star
Token Signature: Predicting Chain-of-Thought Gains with Token Decoding Feature in Large Language Models
Peijie Liu, Fengli Xu, Yong Li
image Github
Paper
Star
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Junyu Zhang, Runpei Dong, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta, Huan Zhang
image Github
Paper
ProxyThinker: Test-Time Guidance through Small Visual Reasoners
Zilin Xiao, Jaywon Koo, Siru Ouyang, Jefferson Hernandez, Yu Meng, Vicente Ordonez
image Paper
A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings
Xiaoang Xu, Shuo Wang, Xu Han, Zhenghao Liu, Huijia Wu, Peipei Li, Zhiyuan Liu, Maosong Sun, Zhaofeng He
image Paper
Activation Control for Efficiently Eliciting Long Chain-of-thought Ability of Language Models
Zekai Zhao, Qi Liu, Kun Zhou, Zihan Liu, Yifei Shao, Zhiting Hu, Biwei Huang
image Paper
Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training
Mengru Wang, Xingyu Chen, Yue Wang, Zhiwei He, Jiahao Xu, Tian Liang, Qiuzhi Liu, Yunzhi Yao, Wenxuan Wang, Ruotian Ma, Haitao Mi, Ningyu Zhang, Zhaopeng Tu, Xiaolong Li, Dong Yu
image Paper
Star
Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning
Jiwon Song, Dongwon Jo, Yulhwa Kim, Jae-Joon Kim
image Github
Paper
RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning
Qianyue Hao, Sibo Li, Jian Yuan, Yong Li
image Paper
Group Think: Multiple Concurrent Reasoning Agents Collaborating at Token Level Granularity
Chan-Jan Hsu, Davide Buffelli, Jamie McGowan, Feng-Ting Liao, Yi-Chang Chen, Sattar Vakili, Da-shan Shiu
image Paper
Star Publish
Rethinking Repetition Problems of LLMs in Code Generation
Yihong Dong, Yuchen Liu, Xue Jiang, Zhi Jin, Ge Li
image Github
Paper
Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping
Ren Zhuang, Ben Wang, Shuifa Sun
image Paper
Star Publish
Learn to Think: Bootstrapping LLM Reasoning Capability Through Graph Learning
Hang Gao, Chenhao Zhang, Tie Wang, Junsuo Zhao, Fengge Wu, Changwen Zheng, Huaping Liu
image Github
Paper
Star
Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think
Hasan Abed Al Kader Hammoud, Hani Itani, Bernard Ghanem
image Github
Paper
Trace-of-Thought: Enhanced Arithmetic Problem Solving via Reasoning Distillation From Large to Small Language Models
Tyler McDonald, Ali Emami
image Paper
Star
Efficient Reasoning for LLMs through Speculative Chain-of-Thought
Jikai Wang, Juntao Li, Lijun Wu, Min Zhang
image Github
Paper
Dynamic Early Exit in Reasoning Models
Chenxu Yang, Qingyi Si, Yongjie Duan, Zheliang Zhu, Chenyu Zhu, Zheng Lin, Li Cao, Weiping Wang
image Paper
Star
Learning Adaptive Parallel Reasoning with Language Models
Jiayi Pan, Xiuyu Li, Long Lian, Charlie Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr
image Github
Paper
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models
Xiao Pu, Michael Saxon, Wenyue Hua, William Yang Wang
image Paper
Star Publish
Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation
Xuefei Ning, Zinan Lin, Zixuan Zhou, Zifu Wang, Huazhong Yang, Yu Wang
image Github
Paper
Adaptive Skeleton Graph Decoding
Shuowei Jin, Yongji Wu, Haizhong Zheng, Qingzhao Zhang, Matthew Lentz, Z. Morley Mao, Atul Prakash, Feng Qian, Danyang Zhuo
image Paper
Star
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong
image Github
Paper
Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
Yuan Sui, Yufei He, Tri Cao, Simeng Han, Bryan Hooi
image Paper
Star
Atom of Thoughts for Markov LLM Test-Time Scaling
Fengwei Teng, Zhaoyang Yu, Quan Shi, Jiayi Zhang, Chenglin Wu, Yuyu Luo
image Github
Paper
DISC: Dynamic Decomposition Improves LLM Inference Scaling
Jonathan Light, Wei Cheng, Wu Yue, Masafumi Oyamada, Mengdi Wang, Santiago Paternain, Haifeng Chen
image Paper
From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models
Jinyi Liu, Yan Zheng, Rong Cheng, Qiyu Wu, Wei Guo, Fei Ni, Hebin Liang, Yifu Yuan, Hangyu Mao, Fuzheng Zhang, Jianye Hao
image Paper
Star
Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?
Kun Xiang, Zhili Liu, Zihao Jiang, Yunshuang Nie, Kaixin Cai, Yiyang Yin, Runhui Huang, Haoxiang Fan, Hanhui Li, Weiran Huang, Yihan Zeng, Yu-Jie Yuan, Jianhua Han, Lanqing Hong, Hang Xu, Xiaodan Liang
image Github
Paper
Publish
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar
image Paper
Star Publish
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang
image Github
Paper
Star
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Yuxiao Qu, Matthew Y. R. Yang, Amrith Setlur, Lewis Tunstall, Edward Emanuel Beeching, Ruslan Salakhutdinov, Aviral Kumar
image Github
Paper
Star
SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning
Rui Pan, Yinwei Dai, Zhihao Zhang, Gabriele Oliaro, Zhihao Jia, Ravi Netravali
image Github
Paper

Efficient Multimodal Reasoning

Title & Authors Introduction Links
Star
Truth in the Few: High-Value Data Selection for Efficient Multi-Modal Reasoning
Shenshen Li, Kaiyuan Deng, Lei Wang, Hao Yang, Chong Peng, Peng Yan, Fumin Shen, Heng Tao Shen, Xing Xu
image Github
Paper
Star
PixelThink: Towards Efficient Chain-of-Pixel Reasoning
Song Wang, Gongfan Fang, Lingdong Kong, Xiangtai Li, Jianyun Xu, Sheng Yang, Qiang Li, Jianke Zhu, Xinchao Wang
image Github
Paper
Star
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
Jiaqi Wang, Kevin Qinghong Lin, James Cheng, Mike Zheng Shou
image Github
Paper
MilChat: Introducing Chain of Thought Reasoning and GRPO to a Multimodal Small Language Model for Remote Sensing
Aybora Koksal, A. Aydin Alatan
image Paper
Star
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
Yibin Wang, Zhimin Li, Yuhang Zang, Chunyu Wang, Qinglin Lu, Cheng Jin, Jiaqi Wang
image Github
Paper
Star
Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?
Kun Xiang, Zhili Liu, Zihao Jiang, Yunshuang Nie, Kaixin Cai, Yiyang Yin, Runhui Huang, Haoxiang Fan, Hanhui Li, Weiran Huang, Yihan Zeng, Yu-Jie Yuan, Jianhua Han, Lanqing Hong, Hang Xu, Xiaodan Liang
image Github
Paper

Background Papers (multimodal reasoning)

Title & Authors Introduction Links
Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
Luozheng Qin, Jia Gong, Yuqing Sun, Tianjiao Li, Mengping Yang, Xiaomeng Yang, Chao Qu, Zhiyu Tan, Hao Li
image Paper
Star
Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback
Yang Chen, Yufan Shen, Wenxuan Huang, Sheng Zhou, Qunshu Lin, Xinyu Cai, Zhi Yu, Jiajun Bu, Botian Shi, Yu Qiao
image Github
Paper
VGR: Visual Grounded Reasoning
Jiacong Wang, Zijian Kang, Haochen Wang, Haiyong Jiang, Jiawen Li, Bohong Wu, Ya Wang, Jiao Ran, Xiao Liang, Chao Feng, Jun Xiao
image Paper
Star
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
Junfei Wu, Jian Guan, Kaituo Feng, Qiang Liu, Shu Wu, Liang Wang, Wei Wu, Tieniu Tan
image Github
Paper
Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification
Tianyi Bai, Zengjie Hu, Fupeng Sun, Jiantao Qiu, Yizhen Jiang, Guangxin He, Bohan Zeng, Conghui He, Binhang Yuan, Wentao Zhang
image Paper
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
Muzhi Zhu, Hao Zhong, Canyu Zhao, Zongze Du, Zheng Huang, Mingyu Liu, Hao Chen, Cheng Zou, Jingdong Chen, Ming Yang, Chunhua Shen
image Paper
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Yan Ma, Linge Du, Xuyang Shen, Shaoxiang Chen, Pengfei Li, Qibing Ren, Lizhuang Ma, Yuchao Dai, Pengfei Liu, Junjie Yan
image Paper
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
Xinyan Chen, Renrui Zhang, Dongzhi Jiang, Aojun Zhou, Shilin Yan, Weifeng Lin, Hongsheng Li
image Paper
GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking
Yufei Zhan, Ziheng Wu, Yousong Zhu, Rongkun Xue, Ruipu Luo, Zhenghao Chen, Can Zhang, Yifan Li, Zhentao He, Zheming Yang, Ming Tang, Minghui Qiu, Jinqiao Wang
image Paper
Fast or Slow? Integrating Fast Intuition and Deliberate Thinking for Enhancing Visual Question Answering
Songtao Jiang, Chenyi Zhou, Yan Zhang, Yeying Jin, Zuozhu Liu
image Paper
Grounded Reinforcement Learning for Visual Reasoning
Gabriel Sarch, Snigdha Saha, Naitik Khandelwal, Ayush Jain, Michael J. Tarr, Aviral Kumar, Katerina Fragkiadaki
image Paper
Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models
Zeyu Liu, Yuhang Liu, Guanghao Zhu, Congkai Xie, Zhen Li, Jianbo Yuan, Xinyao Wang, Qing Li, Shing-Chi Cheung, Shengyu Zhang, Fei Wu, Hongxia Yang
image Paper
Star
Qwen Look Again: Guiding Vision-Language Reasoning Models to Re-attention Visual Information
Xu Chu, Xinrong Chen, Guanyu Wang, Zhijie Tan, Kui Huang, Wenyu Lv, Tong Mo, Weiping Li
image Github
Paper
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
Yunze Man, De-An Huang, Guilin Liu, Shiwei Sheng, Shilong Liu, Liang-Yan Gui, Jan Kautz, Yu-Xiong Wang, Zhiding Yu
image Paper
Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models
Yufei Zhan, Hongyin Zhao, Yousong Zhu, Shurong Zheng, Fan Yang, Ming Tang, Jinqiao Wang
image Paper
Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning
Minheng Ni, Zhengyuan Yang, Linjie Li, Chung-Ching Lin, Kevin Lin, Wangmeng Zuo, Lijuan Wang
image Paper
Ground-R1: Incentivizing Grounded Visual Reasoning via Reinforcement Learning
Meng Cao, Haoze Zhao, Can Zhang, Xiaojun Chang, Ian Reid, Xiaodan Liang
image Paper
SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards
Chuming Shen, Wei Wei, Xiaoye Qu, Yu Cheng
image Paper
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation
Jiwan Chung, Junhyeok Kim, Siyeol Kim, Jaeyoung Lee, Min Soo Kim, Youngjae Yu
image Paper
Visual Abstract Thinking Empowers Multimodal Reasoning
Dairu Liu, Ziyue Wang, Minyuan Ruan, Fuwen Luo, Chi Chen, Peng Li, Yang Liu
image Paper
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
Mingyuan Wu, Jingcheng Yang, Jize Jiang, Meitang Li, Kaizhuo Yan, Hanchao Yu, Minjia Zhang, Chengxiang Zhai, Klara Nahrstedt
image Paper
DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning
Qi Cao, Ruiyi Wang, Ruiyi Zhang, Sai Ashish Somayajula, Pengtao Xie
image Paper
FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving
Shuang Zeng, Xinyuan Chang, Mengwei Xie, Xinran Liu, Yifan Bai, Zheng Pan, Mu Xu, Xing Wei
image Paper
Star
Decoupled Visual Interpretation and Linguistic Reasoning for Math Problem Solving
Zixian Guo, Ming Liu, Zhilong Ji, Jinfeng Bai, Lei Zhang, Wangmeng Zuo
image Github
Paper
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework
Chenhao Zhang, Yazhe Niu
image Paper
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
Alex Su, Haozhe Wang, Weimin Ren, Fangzhen Lin, Wenhu Chen
image Paper
Star
Training-Free Reasoning and Reflection in MLLMs
Hongchen Wei, Zhenzhong Chen
image Github
Paper
Star
R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO
Huanjin Yao, Qixiang Yin, Jingyi Zhang, Min Yang, Yibo Wang, Wenhao Wu, Fei Su, Li Shen, Minghui Qiu, Dacheng Tao, Jiaxing Huang
image Github
Paper
Star
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
Kaixuan Fan, Kaituo Feng, Haoming Lyu, Dongzhan Zhou, Xiangyu Yue
image Github
Paper
VLM-R3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought
Chaoya Jiang, Yongrui Heng, Wei Ye, Han Yang, Haiyang Xu, Ming Yan, Ji Zhang, Fei Huang, Shikun Zhang
image Paper
Star
Bridging the Dynamic Perception Gap: Training-Free Draft Chain-of-Thought for Dynamic Multimodal Spatial Reasoning
Siqu Ou, Hongcheng Liu, Pingjie Wang, Yusheng Liao, Chuan Xuan, Yanfeng Wang, Yu Wang
image Github
Paper
GRIT: Teaching MLLMs to Think with Images
Yue Fan, Xuehai He, Diji Yang, Kaizhi Zheng, Ching-Chen Kuo, Yuting Zheng, Sravana Jyothi Narayanaraju, Xinze Guan, Xin Eric Wang
image Paper
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
Sule Bai, Mingxing Li, Yong Liu, Jing Tang, Haoji Zhang, Lei Sun, Xiangxiang Chu, Yansong Tang
image Paper
Star
MMaDA: Multimodal Large Diffusion Language Models
Ling Yang, Ye Tian, Bowen Li, Xinchen Zhang, Ke Shen, Yunhai Tong, Mengdi Wang
image Github
Paper
Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought
Zihui Cheng, Qiguang Chen, Xiao Xu, Jiaqi Wang, Weiyun Wang, Hao Fei, Yidong Wang, Alex Jinpeng Wang, Zhi Chen, Wanxiang Che, Libo Qin
image Paper
Star
Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
Jiaer Xia, Yuhang Zang, Peng Gao, Yixuan Li, Kaiyang Zhou
image Github
Paper
Star
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning
Yuqi Liu, Tianyuan Qu, Zhisheng Zhong, Bohao Peng, Shu Liu, Bei Yu, Jiaya Jia
image Github
Paper
Star
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision
Lingxiao Du, Fanqing Meng, Zongkai Liu, Zhixiang Zhou, Ping Luo, Qiaosheng Zhang, Wenqi Shao
image Github
Paper
CoT-Vid: Dynamic Chain-of-Thought Routing with Self Verification for Training-Free Video Reasoning
Hongbo Jin, Ruyang Liu, Wenhao Zhang, Guibo Luo, Ge Li
image Paper
Visual Planning: Let's Think Only with Images
Yi Xu, Chengzu Li, Han Zhou, Xingchen Wan, Caiqi Zhang, Anna Korhonen, Ivan Vulić
image Paper
Star
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
Qianchu Liu, Sheng Zhang, Guanghui Qin, Timothy Ossowski, Yu Gu, Ying Jin, Sid Kiblawi, Sam Preston, Mu Wei, Paul Vozila, Tristan Naumann, Hoifung Poon
image Github
Paper
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning
Xiaokun Wang, Chris, Jiangbo Pei, Wei Shen, Yi Peng, Yunzhuo Hao, Weijie Qiu, Ai Jian, Tianyidan Xie, Xuchen Song, Yang Liu, Yahui Zhou
image Paper
Publish
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
Ke Wang, Junting Pan, Linda Wei, Aojun Zhou, Weikang Shi, Zimu Lu, Han Xiao, Yunqiao Yang, Houxing Ren, Mingjie Zhan, Hongsheng Li
image Paper
Star
Visually Interpretable Subtask Reasoning for Visual Question Answering
Yu Cheng, Arushi Goel, Hakan Bilen
image Github
Paper
Star Publish
Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging
Shiqi Chen, Jinghan Zhang, Tongyao Zhu, Wei Liu, Siyang Gao, Miao Xiong, Manling Li, Junxian He
image Github
Paper
Star
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
Chris, Yichen Wei, Yi Peng, Xiaokun Wang, Weijie Qiu, Wei Shen, Tianyidan Xie, Jiangbo Pei, Jianhao Zhang, Yunzhuo Hao, Xuchen Song, Yang Liu, Yahui Zhou
image Github
Paper
Technical Report

Evaluation and Benchmarks

Metric

Title & Authors Introduction Links
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu
image Paper
Star
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
Xinyin Ma, Guangnian Wan, Runpeng Yu, Gongfan Fang, Xinchao Wang
image Github
Paper
Star
Non-Determinism of "Deterministic" LLM Settings
Berk Atil, Sarp Aykent, Alexa Chittams, Lisheng Fu, Rebecca J. Passonneau, Evan Radcliffe, Guru Rajan Rajagopal, Adam Sloan, Tomasz Tudrej, Ferhan Ture, Zhe Wu, Lixinyu Xu, Breck Baldwin
image Github
Paper
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
Alejandro Cuadron, Dacheng Li, Wenjie Ma, Xingyao Wang, Yichuan Wang, Siyuan Zhuang, Shu Liu, Luis Gaspar Schroeder, Tian Xia, Huanzhi Mao, Nicholas Thumiger, Aditya Desai, Ion Stoica, Ana Klimovic, Graham Neubig, Joseph E. Gonzalez
image Paper
Evaluating Large Language Models Trained on Code
Mark Chen, Jerry Tworek, et al.
image Paper
τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
Shunyu Yao, Noah Shinn, Pedram Razavi, Karthik Narasimhan
image Paper
Star
Are Your LLMs Capable of Stable Reasoning?
Junnan Liu, Hongwei Liu, Linchen Xiao, Ziyi Wang, Kuikun Liu, Songyang Gao, Wenwei Zhang, Songyang Zhang, Kai Chen
image Github
Paper

Benchmarks and Datasets

Title & Authors Introduction Links
Star
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps
Sicheng Feng, Song Wang, Shuyi Ouyang, Lingdong Kong, Zikai Song, Jianke Zhu, Huan Wang, Xinchao Wang
image Github
Paper
ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations
Xuecheng Wu, Jiaxing Liu, Danlei Huang, Xiaoyu Li, Yifan Wang, Chen Chen, Liya Ma, Xuezhi Cao, Junxiao Xue
image Paper
Star
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models
Liyan Tang, Grace Kim, Xinyu Zhao, Thom Lake, Wenxuan Ding, Fangcong Yin, Prasann Singhal, Manya Wadhwa, Zeyu Leo Liu, Zayne Sprague, Ramya Namuduri, Bodun Hu, Juan Diego Rodriguez, Puyuan Peng, Greg Durrett
image Github
Paper
Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations
Wenrui Cai, Chengyu Wang, Junbing Yan, Jun Huang, Xiangzhong Fang
image Paper
StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation
Daniel A. P. Oliveira, David Martins de Matos
image Paper
Star
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency
Zhikai Wang, Jiashuo Sun, Wenqi Zhang, Zhiqiang Hu, Xin Li, Fan Wang, Deli Zhao
image Github
Paper
Star
CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges
Yu Li, Qizhi Pei, Mengyuan Sun, Honglin Lin, Chenlin Ming, Xin Gao, Jiang Wu, Conghui He, Lijun Wu
image Github
Paper
Star
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
Weiye Xu, Jiahao Wang, Weiyun Wang, Zhe Chen, Wengang Zhou, Aijun Yang, Lewei Lu, Houqiang Li, Xiaohua Wang, Xizhou Zhu, Wenhai Wang, Jifeng Dai, Jinguo Zhu
image Github
Paper
LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception
Yuan-Hong Liao, Sven Elflein, Liu He, Laura Leal-Taixé, Yejin Choi, Sanja Fidler, David Acuna
image Paper
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models
Xiao Pu, Michael Saxon, Wenyue Hua, William Yang Wang
image Paper
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu
image Paper
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
Alejandro Cuadron, Dacheng Li, Wenjie Ma, Xingyao Wang, Yichuan Wang, Siyuan Zhuang, Shu Liu, Luis Gaspar Schroeder, Tian Xia, Huanzhi Mao, Nicholas Thumiger, Aditya Desai, Ion Stoica, Ana Klimovic, Graham Neubig, Joseph E. Gonzalez
image Paper
Star
Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights
Shubham Parashar, Blake Olson, Sambhav Khurana, Eric Li, Hongyi Ling, James Caverlee, Shuiwang Ji
image Github
Paper
Star
Bag of Tricks for Inference-time Computation of LLM Reasoning
Fan Liu, Wenshuo Chao, Naiqiang Tan, Hao Liu
image Github
Paper
Star
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Runze Liu, Junqi Gao, Jian Zhao, Kaiyan Zhang, Xiu Li, Biqing Qi, Wanli Ouyang, Bowen Zhou
image Github
Paper
DNA Bench: When Silence is Smarter -- Benchmarking Over-Reasoning in Reasoning LLMs
Masoud Hashemi, Oluwanifemi Bamgbose, Sathwik Tejaswi Madhusudhan, Jishnu Sethumadhavan Nair, Aman Tiwari, Vikas Yadav
image Paper
S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
Wenyuan Zhang, Shuaiyi Nie, Xinghua Zhang, Zefeng Zhang, Tingwen Liu
image Paper
Star
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
Yukun Qi, Yiming Zhao, Yu Zeng, Xikun Bao, Wenxuan Huang, Lin Chen, Zehui Chen, Jie Zhao, Zhongang Qi, Feng Zhao
image Github
Paper

Background Papers

Title & Authors Introduction Links
Star
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
MiniMax Team
image Github
Paper
Resa: Transparent Reasoning Models via SAEs
Shangshang Wang, Julian Asilis, Ömer Faruk Akgül, Enes Burak Bilgin, Ollie Liu, Deqing Fu, Willie Neiswanger
image Paper
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
Yang Chen, Zhuolin Yang, Zihan Liu, Chankyu Lee, Peng Xu, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping
image Paper
Star
BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
Junxiao Yang, Jinzhe Tu, Haoran Liu, Xiaoce Wang, Chujie Zheng, Zhexin Zhang, Shiyao Cui, Caishun Chen, Tiantian He, Hongning Wang, Yew-Soon Ong, Minlie Huang
image Github
Paper
Star
RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning
Kaiwen Zha, Zhengqi Gao, Maohao Shen, Zhang-Wei Hong, Duane S. Boning, Dina Katabi
image Github
Paper
Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision
Eric Hanchen Jiang, Haozheng Luo, Shengyuan Pang, Xiaomin Li, Zhenting Qi, Hengli Li, Cheng-Fu Yang, Zongyu Lin, Xinfeng Li, Hao Xu, Kai-Wei Chang, Ying Nian Wu
image Paper
Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings
Safal Shrestha, Minwu Kim, Aadim Nepal, Anubhav Shrestha, Keith Ross
image Paper
Reasoning Models Better Express Their Confidence
Dongkeun Yoon, Seungone Kim, Sohee Yang, Sunkyoung Kim, Soyeon Kim, Yongil Kim, Eunbi Choi, Yireun Kim, Minjoon Seo
image Paper
Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning
Haolei Xu, Yuchen Yan, Yongliang Shen, Wenqi Zhang, Guiyang Hou, Shengpei Jiang, Kaitao Song, Weiming Lu, Jun Xiao, Yueting Zhuang
image Paper
Star
ExTrans: Multilingual Deep Reasoning Translation via Exemplar-Enhanced Reinforcement Learning
Jiaan Wang, Fandong Meng, Jie Zhou
image Github
Paper
Star
RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning
Qiguang Chen, Libo Qin, Jinhao Liu, Yue Liao, Jiaqi Wang, Jingxuan Zhou, Wanxiang Che
image Github
Paper
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Andrew Zhao, Yiran Wu, Yang Yue, Tong Wu, Quentin Xu, Yang Yue, Matthieu Lin, Shenzhi Wang, Qingyun Wu, Zilong Zheng, Gao Huang
image Paper
Star
Learning from Peers in Reasoning Models
Tongxu Luo, Wenyu Du, Jiaxi Bi, Stephen Chung, Zhengyang Tang, Hao Yang, Min Zhang, Benyou Wang
image Github
Paper
Star
Beyond Aha!: Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Zhiyuan Hu, Yibo Wang, Hanze Dong, Yuhui Xu, Amrita Saha, Caiming Xiong, Bryan Hooi, Junnan Li
image Github
Paper
Star
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Zhaochen Su, Linjie Li, Mingyang Song, Yunzhuo Hao, Zhengyuan Yang, Jun Zhang, Guanjie Chen, Jiawei Gu, Juntao Li, Xiaoye Qu, Yu Cheng
image Github
Paper
The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think
Seongyun Lee, Seungone Kim, Minju Seo, Yongrae Jo, Dongyoung Go, Hyeonbin Hwang, Jinho Park, Xiang Yue, Sean Welleck, Graham Neubig, Moontae Lee, Minjoon Seo
image Paper
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models
Zemin Huang, Zhiyang Chen, Zijun Wang, Tiancheng Li, Guo-Jun Qi
image Paper
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Chenxi Whitehouse, Tianlu Wang, Ping Yu, Xian Li, Jason Weston, Ilia Kulikov, Swarnadeep Saha
image Paper
INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning
Prime Intellect Team, Sami Jaghouar, Justus Mattern, Jack Min Ong, Jannik Straube, Manveer Basra, Aaron Pazdera, Kushal Thaman, Matthew Di Ferrante, Felix Gabriel, Fares Obeid, Kemal Erdem, Michael Keiblinger, Johannes Hagemann
image Paper
AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale
Yunjie Ji, Xiaoyu Tian, Sitong Zhao, Haotian Wang, Shuaiting Chen, Yiping Peng, Han Zhao, Xiangang Li
image Paper
Star
Chain-of-Thought Tokens are Computer Program Variables
Fangwei Zhu, Peiyi Wang, Zhifang Sui
image Github
Paper
Star
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
Xiaomi LLM-Core Team
image Github
Paper
Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines
Ramesh Manuvinakurike, Emanuel Moss, Elizabeth Anne Watkins, Saurav Sahay, Giuseppe Raffa, Lama Nachman
image Paper
Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs
Jinyan Su, Jennifer Healey, Preslav Nakov, Claire Cardie
image Paper
Star
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
Xiaoxi Li, Jiajie Jin, Guanting Dong, Hongjin Qian, Yutao Zhu, Yongkang Wu, Ji-Rong Wen, Zhicheng Dou
image Github
Paper
Star
Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think
Hasan Abed Al Kader Hammoud, Hani Itani, Bernard Ghanem
image Github
Paper
Star
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang, Qing Yang, Zhiyuan Zeng, Liliang Ren, Lucas Liu, Baolin Peng, Hao Cheng, Xuehai He, Kuan Wang, Jianfeng Gao, Weizhu Chen, Shuohang Wang, Simon Shaolei Du, Yelong Shen
image Github
Paper
Star
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
Chris, Yichen Wei, Yi Peng, Xiaokun Wang, Weijie Qiu, Wei Shen, Tianyidan Xie, Jiangbo Pei, Jianhao Zhang, Yunzhuo Hao, Xuchen Song, Yang Liu, Yahui Zhou
image Github
Paper
Star
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang
image Github
Paper
Publish
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou
image Paper
Star Publish
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan
image Github
Paper
Star Publish
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler
image Github
Paper
Publish
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou
image Paper
Star Publish
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
Wenhu Chen, Xueguang Ma, Xinyi Wang, William W. Cohen
image Github
Paper
Star Publish
Chain-of-Symbol Prompting Elicits Planning in Large Langauge Models
Hanxu Hu, Hongyuan Lu, Huajian Zhang, Yun-Ze Song, Wai Lam, Yue Zhang
image Github
Paper

Survey

Title & Authors Introduction Links
Thinking Machines: A Survey of LLM based Reasoning Strategies
Dibyanayan Bandyopadhyay, Soham Bhattacharjee, Asif Ekbal
image Paper
Star
From System 1 to System 2: A Survey of Reasoning Large Language Models
Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, Yingying Zhang, Fei Yin, Jiahua Dong, Zhijiang Guo, Le Song, Cheng-Lin Liu
image Github
Paper

Competition

Acknowledgement

This repository is inspired by Awesome-Efficient-LLM.

Citation

@article{feng2025efficient,
    title={Efficient Reasoning Models: A Survey},
    author={Feng, Sicheng and Fang, Gongfan and Ma, Xinyin and Wang, Xinchao},
    journal={arXiv preprint arXiv:2504.10903},
    year={2025},
}
