This repository collects recent advances in reinforcement learning that leverage uncertainty to improve the reasoning capabilities of large language models, intended for researchers exploring this direction.
- 2025-07-23: Added "Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR"
- 2025-07-01: Added "SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning"
- 2025-06-20: Added "Reasoning with Exploration: An Entropy Perspective"
- 2025-06-20: Added "Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning"
- 2025-06-18: Added "Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models"
- 2025-06-14: Repository made public with comprehensive collection of uncertainty-based RL papers
| Title | Method | Date | Metric | Task Domain | Code | Venue |
|---|---|---|---|---|---|---|
| Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization | EMPO | 04/2025 | Semantic Entropy | Math & General | code | arXiv |
| SLOT: Sample-specific Language Model Optimization at Test-time | SLOT | 05/2025 | Token Entropy | Math | code | arXiv |
| The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning | EM-FT, EM-INF | 05/2025 | Trajectory/Token Entropy | Math & Code | code | arXiv |
| One-shot Entropy Minimization | EM | 05/2025 | Token Entropy | Math, Code, Logic | code | arXiv |
| Maximizing Confidence Alone Improves Reasoning | RENT | 05/2025 | Token Entropy | Math | project page | arXiv |
| Verbalized Confidence Triggers Self-Verification: Emergent Behavior Without Explicit Reasoning Supervision | CSFT | 06/2025 | Self-Confidence | Math | - | arXiv |
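Many of the fine-tuning entries above share the same core objective: minimize the entropy of the model's next-token distribution over its own generated answer, pushing it toward confident predictions. Below is a minimal PyTorch sketch of that shared idea; the function name `mean_token_entropy`, the masking scheme, and the training-loop comments are illustrative assumptions, not any listed paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def mean_token_entropy(logits: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Average next-token entropy over the masked (generated) positions.

    logits: (batch, seq_len, vocab) raw model outputs.
    mask:   (batch, seq_len), 1 on generated answer tokens, 0 elsewhere.
    """
    log_probs = F.log_softmax(logits, dim=-1)           # (batch, seq_len, vocab)
    entropy = -(log_probs.exp() * log_probs).sum(-1)    # (batch, seq_len)
    return (entropy * mask).sum() / mask.sum().clamp(min=1)

# Hypothetical fine-tuning step: treat the entropy itself as the loss,
# so gradient descent sharpens the model's own output distribution.
# loss = mean_token_entropy(model(input_ids).logits, answer_mask)
# loss.backward(); optimizer.step()
```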
| Title | Method | Date | Metric | Task Domain | Code | Venue |
|---|---|---|---|---|---|---|
| The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning | EM-RL | 05/2025 | Trajectory/Token Entropy | Math & Code | code | arXiv |
| Learning to Reason without External Rewards | INTUITOR | 05/2025 | Self-Certainty | Math & Code | code | arXiv |
| Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models | RLSC | 06/2025 | Self-Confidence | Math | - | arXiv |
| Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning | CoVo | 06/2025 | Trajectory Features | Math | code | arXiv |
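The RL entries above replace an external reward with a model-intrinsic confidence signal. The sketch below shows one simple instantiation under stated assumptions: negative mean token entropy as a sequence-level reward inside a REINFORCE-style update with a mean baseline. The function name and loop structure are hypothetical; the listed methods (EM-RL, INTUITOR, RLSC, CoVo) each define their own reward signal and optimization details.

```python
import torch
import torch.nn.functional as F

def confidence_reward_and_logprob(logits, completion_ids, mask):
    """Sequence-level intrinsic reward and log-prob for sampled completions.

    logits:         (batch, seq_len, vocab), aligned with completion_ids.
    completion_ids: (batch, seq_len) sampled token ids.
    mask:           (batch, seq_len), 1 on completion tokens, 0 on padding.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    token_logp = log_probs.gather(-1, completion_ids.unsqueeze(-1)).squeeze(-1)
    entropy = -(log_probs.exp() * log_probs).sum(-1)
    lengths = mask.sum(-1).clamp(min=1)
    reward = -(entropy * mask).sum(-1) / lengths        # confident => high reward
    seq_logp = (token_logp * mask).sum(-1)
    return reward, seq_logp

# Hypothetical policy-gradient step with a mean baseline to reduce variance:
# reward, seq_logp = confidence_reward_and_logprob(logits, ids, mask)
# loss = -((reward - reward.mean()).detach() * seq_logp).mean()
# loss.backward(); optimizer.step()
```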
Contributions are welcome! You can submit pull requests to add new papers, projects, and other helpful materials, or to correct any errors you find. Please make sure your pull requests follow the format of the tables above. Thank you for your valuable contributions!
🌟 Star this repo if you find it helpful! 🌟