This repository collects recent advances in reinforcement learning that leverage uncertainty to improve the reasoning capabilities of large language models, intended for researchers exploring this direction.
- 2025-07-23: Added "Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR"
- 2025-07-01: Added "SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning"
- 2025-06-20: Added "Reasoning with Exploration: An Entropy Perspective"
- 2025-06-20: Added "Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning"
- 2025-06-18: Added "Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models"
- 2025-06-14: Repository made public with comprehensive collection of uncertainty-based RL papers
| Title | Method | Date | Metric | Task Domain | Code | Venue |
|---|---|---|---|---|---|---|
| Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization | EMPO | 04/2025 | Semantic Entropy | Math & General | code | arXiv |
| SLOT: Sample-specific Language Model Optimization at Test-time | SLOT | 05/2025 | Token Entropy | Math | code | arXiv |
| The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning | EM-FT, EM-INF | 05/2025 | Trajectory/Token Entropy | Math & Code | code | arXiv |
| One-shot Entropy Minimization | EM | 05/2025 | Token Entropy | Math, Code, Logic | code | arXiv |
| Maximizing Confidence Alone Improves Reasoning | RENT | 05/2025 | Token Entropy | Math | project page | arXiv |
| Verbalized Confidence Triggers Self-Verification: Emergent Behavior Without Explicit Reasoning Supervision | CSFT | 06/2025 | Self-Confidence | Math | - | arXiv |
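Many of the fine-tuning entries above share the same core objective: minimize the entropy of the model's next-token distribution over its own generated answer, pushing it toward confident predictions. Below is a minimal PyTorch sketch of that shared idea; the function name `mean_token_entropy`, the masking scheme, and the training-loop comments are illustrative assumptions, not any listed paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def mean_token_entropy(logits: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Average next-token entropy over the masked (generated) positions.

    logits: (batch, seq_len, vocab) raw model outputs.
    mask:   (batch, seq_len), 1 on generated answer tokens, 0 elsewhere.
    """
    log_probs = F.log_softmax(logits, dim=-1)           # (batch, seq_len, vocab)
    entropy = -(log_probs.exp() * log_probs).sum(-1)    # (batch, seq_len)
    return (entropy * mask).sum() / mask.sum().clamp(min=1)

# Hypothetical fine-tuning step: treat the entropy itself as the loss,
# so gradient descent sharpens the model's own output distribution.
# loss = mean_token_entropy(model(input_ids).logits, answer_mask)
# loss.backward(); optimizer.step()
```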
| Title | Method | Date | Metric | Task Domain | Code | Venue |
|---|---|---|---|---|---|---|
| The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning | EM-RL | 05/2025 | Trajectory/Token Entropy | Math & Code | code | arXiv |
| Learning to Reason without External Rewards | INTUITOR | 05/2025 | Self-Certainty | Math & Code | code | arXiv |
| Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models | RLSC | 06/2025 | Self-Confidence | Math | - | arXiv |
| Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning | CoVo | 06/2025 | Trajectory Features | Math | code | arXiv |
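The RL entries above replace an external reward with a model-intrinsic confidence signal. The sketch below shows one simple instantiation under stated assumptions: negative mean token entropy as a sequence-level reward inside a REINFORCE-style update with a mean baseline. The function name and loop structure are hypothetical; the listed methods (EM-RL, INTUITOR, RLSC, CoVo) each define their own reward signal and optimization details.

```python
import torch
import torch.nn.functional as F

def confidence_reward_and_logprob(logits, completion_ids, mask):
    """Sequence-level intrinsic reward and log-prob for sampled completions.

    logits:         (batch, seq_len, vocab), aligned with completion_ids.
    completion_ids: (batch, seq_len) sampled token ids.
    mask:           (batch, seq_len), 1 on completion tokens, 0 on padding.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    token_logp = log_probs.gather(-1, completion_ids.unsqueeze(-1)).squeeze(-1)
    entropy = -(log_probs.exp() * log_probs).sum(-1)
    lengths = mask.sum(-1).clamp(min=1)
    reward = -(entropy * mask).sum(-1) / lengths        # confident => high reward
    seq_logp = (token_logp * mask).sum(-1)
    return reward, seq_logp

# Hypothetical policy-gradient step with a mean baseline to reduce variance:
# reward, seq_logp = confidence_reward_and_logprob(logits, ids, mask)
# loss = -((reward - reward.mean()).detach() * seq_logp).mean()
# loss.backward(); optimizer.step()
```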
Contributions are welcome! You can submit pull requests to add new papers, projects, and other helpful materials, or to correct any errors you find. Please make sure your pull requests follow the format of the tables above. Thank you for your valuable contributions!
🌟 Star this repo if you find it helpful! 🌟