falonss703/Awesome-Uncertainty-based-Reinforcement-Learning

Awesome Uncertainty-based RL Papers

This repository collects recent papers on reinforcement learning methods that use uncertainty to improve the reasoning capabilities of large language models. It is intended for researchers working in this area.

📜 History

Recent Updates

  • 2025-07-23: Added "Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR"
  • 2025-07-01: Added "SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning"
  • 2025-06-20: Added "Reasoning with Exploration: An Entropy Perspective"
  • 2025-06-20: Added "Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning"
  • 2025-06-18: Added "Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models"
  • 2025-06-14: Repository made public with a comprehensive collection of uncertainty-based RL papers

How is Uncertainty used in RL?

🎯 Uncertainty as an Optimization Objective

| Title | Method | Date | Metric | Task Domain | Code | Venue |
|---|---|---|---|---|---|---|
| Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization | EMPO | 04/2025 | Semantic Entropy | Math & General | code | arXiv |
| SLOT: Sample-specific Language Model Optimization at Test-time | SLOT | 05/2025 | Token Entropy | Math | code | arXiv |
| The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning | EM-FT, EM-INF | 05/2025 | Trajectory/Token Entropy | Math & Code | code | arXiv |
| One-shot Entropy Minimization | EM | 05/2025 | Token Entropy | Math, Code, Logic | code | arXiv |
| Maximizing Confidence Alone Improves Reasoning | RENT | 05/2025 | Token Entropy | Math | project page | arXiv |
| Verbalized Confidence Triggers Self-Verification: Emergent Behavior Without Explicit Reasoning Supervision | CSFT | 06/2025 | Self-Confidence | Math | - | arXiv |
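
Several entries above list "Token Entropy" as their metric. As a rough illustration of that quantity only (not the implementation of any listed paper), the sketch below computes the Shannon entropy of a next-token distribution from raw logits and averages it over a sequence. The function names and the plain-list input format are assumptions made for this example.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_token_entropy(logits_per_token):
    """Average token entropy over one generated sequence.

    logits_per_token: one list of vocabulary logits per generated token
    (hypothetical input format chosen for this sketch).
    """
    entropies = [token_entropy(logits) for logits in logits_per_token]
    return sum(entropies) / len(entropies)
```

An entropy-minimization objective would push this value down; a uniform distribution over V tokens gives the maximum value, log V.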

💰 Uncertainty as a Reward Signal

Title Method Date Metric Task Domain Code Venue
| Title | Method | Date | Metric | Task Domain | Code | Venue |
|---|---|---|---|---|---|---|
| The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning | EM-RL | 05/2025 | Trajectory/Token Entropy | Math & Code | code | arXiv |
| Learning to Reason without External Rewards | INTUITOR | 05/2025 | Self-Certainty | Math & Code | code | arXiv |
| Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models | RLSC | 06/2025 | Self-Confidence | Math | - | arXiv |
| Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning | CoVo | 06/2025 | Trajectory Features | Math | code | arXiv |
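
To make the idea of an uncertainty-derived reward concrete, here is a minimal sketch that turns a per-response uncertainty score into a reward and then into group-normalized advantages. It is a generic illustration under stated assumptions, not the reward used by any paper in the table: the choice of negative mean entropy as the reward, the GRPO-style normalization within a group of sampled responses, and both function names are assumptions.

```python
import statistics

def confidence_rewards(mean_entropies):
    """Assumed reward: negative mean token entropy per response,
    so more confident (lower-entropy) responses score higher."""
    return [-h for h in mean_entropies]

def group_normalized_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of responses to the same prompt.

    eps (an assumed constant) avoids division by zero when all
    rewards in the group are identical.
    """
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For example, `group_normalized_advantages(confidence_rewards([0.5, 1.0, 1.5]))` assigns the largest advantage to the first response, which has the lowest entropy.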

🧭 Uncertainty as an Optimization Guide

Title Method Date Metric Task Domain Code Venue
| Title | Method | Date | Metric | Task Domain | Code | Venue |
|---|---|---|---|---|---|---|
| SEED-GRPO: Semantic entropy enhanced GRPO for uncertainty-aware policy optimization | SEED-GRPO | 05/2025 | Semantic Entropy | Math | - | arXiv |
| The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models | Clip-Cov, KL-Cov | 05/2025 | Policy Entropy | Math | code | arXiv |
| Reinforcing Video Reasoning with Focused Thinking | TW-GRPO | 05/2025 | Intra-Group Information Entropy | Video | code | arXiv |
| Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning | Forking Tokens | 06/2025 | Token Entropy | Math | project page | arXiv |
| Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning | MI peaks | 06/2025 | Mutual Information | Math | code | arXiv |
| Reasoning with Exploration: An Entropy Perspective | Entropy-Based Advantage Shaping | 06/2025 | Token Entropy | Math | - | arXiv |
| SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning | SRFT | 06/2025 | Token Entropy | Math | project page | arXiv |
| Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR | Archer | 07/2025 | Token Entropy | Math & Code | code | arXiv |
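
One way uncertainty can guide optimization is by deciding which tokens receive special treatment during an update. The sketch below selects the highest-entropy tokens in a sequence and returns a boolean mask. It is only an illustration of the general idea: the function name, the `top_fraction` parameter, and its default of 0.2 are assumptions for this example and do not reproduce the procedure of any paper listed above.

```python
import math

def high_entropy_mask(entropies, top_fraction=0.2):
    """Mark the top `top_fraction` of tokens by entropy.

    entropies: per-token entropy values for one sequence.
    Returns a list of booleans, True for selected tokens.
    At least one token is selected for a non-empty sequence.
    """
    if not entropies:
        return []
    k = max(1, math.ceil(len(entropies) * top_fraction))
    # Rank token positions from highest to lowest entropy.
    ranked = sorted(range(len(entropies)),
                    key=lambda i: entropies[i], reverse=True)
    selected = set(ranked[:k])
    return [i in selected for i in range(len(entropies))]
```

A training loop could, for instance, multiply per-token losses by this mask so that only the selected positions contribute to the gradient.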

Contributing

We welcome everyone to contribute to this repository and help improve it. You can submit pull requests to add new papers, projects, and helpful materials, or to correct any errors that you may find. Please make sure that your pull requests follow the format in the tables above. Thank you for your valuable contributions!

🌟 Star this repo if you find it helpful! 🌟
