A curated list of resources on reinforcement learning (RL) for agents.
This list collects papers, tools, and demos that show how reinforcement learning can be applied to train or tune LLM/MLLM agents, with a focus on research-driven, computer-using, and tool-integrated agent behaviors.
- 📚 Papers & Research
- 🕹️ Benchmarks
- 🧪 Demos & Projects
- 🧰 Toolkits & Frameworks
- 📄 Tutorials & Blog Posts
- 🔗 Related Awesome Lists
- 🤝 Contributing
- Deep Research Agents: A Systematic Examination And Roadmap [Preprint'25] [AwesomeList]
- ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay [Preprint'25] [Code]
- InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners [Preprint'25] [Code]
- Cracking the Code of Action: a Generative Approach to Affordances for Reinforcement Learning [Preprint'25]
- UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning [Preprint'25] [Code]
- Digi-Q: Learning Q-Value Functions for Training Device-Control Agents [Preprint'25] [Code]
- AutoWebGLM: A Large Language Model-based Web Navigating Agent [KDD'24] [Preprint'24] [Code]
- WebShaper: Towards Autonomous Information Seeking Agency [Preprint'25] [Code]
- WebSailor: Navigating Super-human Reasoning for Web Agent [Preprint'25] [Code]
- MMSearch-R1: Incentivizing LMMs to Search [Preprint'25] [Code]
- Kimi-Researcher: End-to-End RL Training for Emerging Agentic Capabilities [Blog]
- R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning [Preprint'25] [Code]
- R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning [Preprint'25] [Code]
- ZeroSearch: Incentivize the Search Capability of LLMs without Searching [Preprint'25] [Code]
- DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments [Preprint'25] [Code]
- ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning [Preprint'25] [Code]
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning [Preprint'25] [Code]
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [Preprint'25] [Code]
- Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research [Preprint'25] [Code]
- Self-Challenging Language Model Agents [Preprint'25]
- VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection [Preprint'25] [Code]
- OTC: Optimal Tool Calls via Reinforcement Learning [Preprint'25]
- ToolRL: Reward is All Tool Learning Needs [Preprint'25] [Code]
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs [Preprint'25]
- Agent models: Internalizing Chain-of-Action Generation into Reasoning models [Preprint'25] [Code]
- TORL: Scaling Tool-Integrated RL [Preprint'25] [Code]
- MemAgent: Reshaping Long-Context LLM with Multi-Conv RL based Memory Agent [Preprint'25] [Code]
- MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents [Preprint'25]
- Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning [Preprint'25] [Model]
- A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce [Preprint'25]
- o3 & o4-mini: Introducing OpenAI o3 and o4-mini [Blog]
- Skywork-OR1 (Open Reasoner 1) [Blog] [Code]
- VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks [Preprint'25]
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale [Preprint'25] [Code]
- LIMR: Less is More for RL Scaling [Preprint'25] [Code]
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning [Preprint'25]
- Kimi k1.5: Scaling Reinforcement Learning with LLMs [Preprint'25]
- UFO: A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning [Preprint'25] [Code]
- MPO: Boosting LLM Agents with Meta Plan Optimization [Preprint'25] [Code]
- xbench: Tracking Agents Productivity Scaling With Profession-Aligned Real-World Evaluations [Preprint'25] [Website]
- BrowseComp-ZH: Benchmarking the Web Browsing Ability of Large Language Models in Chinese [Preprint'25] [Code]
- BrowseComp: a benchmark for browsing agents [Blog] [Paper] [Code]
- Computer Agent Arena: Compare & Test AI Agents on Crowdsourced Real-World Computer Use Tasks [Platform] [Code]
- ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use [Paper] [Code]
- OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments [NeurIPS'24] [Code]
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents [ACL'24] [Code]
- SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning [Blog] [Code]
- Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning [Code]
- VAGEN: Training VLM Agents with Multi-Turn Reinforcement Learning [Code]
- OpenManus-RL [Code] & OpenManus [Code]
- RAGEN: Training Agents by Reinforcing Reasoning [Code]
- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model [Preprint'25] [Code]
- simple_GRPO [Code]
- ROLL: Reinforcement Learning Optimization for Large-Scale Learning [Code]
- verl: Volcano Engine Reinforcement Learning for LLM [Code]
- Introducing ChatGPT agent: bridging research and action [Blog]
- Context Engineering [GitHub]
- The Second Half [Blog]
- Awesome Deep Research Agent [List] - covering deep research agents and benchmark results
- Awesome-Agent-RL [List] - covering RL for research agents
- awesome-ml-agents [List] - covering RL and agents before 2023
Contributions are warmly welcome!
If you know of a paper, tool, environment, or demo relevant to RL for agents, feel free to open a pull request.
- Make sure the resource is publicly accessible and active.
- Use the same format as existing entries:
- **Name**: Title [Paper](link) [Code](link) – short description (optional).
- Add entries under the most appropriate section.
- Avoid duplicates or resources that are already well-covered elsewhere.
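
To make the entry format above concrete, here is a minimal Python sketch that renders one entry in that shape. The helper `format_entry` is hypothetical and not part of this repository, the `example.com` URLs are placeholders, and requiring at least one link is an assumption based on the "publicly accessible" guideline.

```python
from typing import Mapping, Optional


def format_entry(name: str, title: str, links: Mapping[str, str],
                 description: Optional[str] = None) -> str:
    """Render one entry as '- **Name**: Title [Label](url) ... – description'.

    Hypothetical helper for illustration only; the list has no such tool.
    """
    if not name.strip() or not title.strip():
        raise ValueError("name and title are required")
    if not links:
        # Assumption: every entry should carry at least one public link.
        raise ValueError("at least one link is required")
    # Links are emitted in the order given, e.g. Paper before Code.
    link_text = " ".join(f"[{label}]({url})" for label, url in links.items())
    entry = f"- **{name.strip()}**: {title.strip()} {link_text}"
    if description:
        # The short description is optional and follows an en dash.
        entry += f" \u2013 {description.strip()}"
    return entry


if __name__ == "__main__":
    print(format_entry(
        "ExampleAgent",
        "A Placeholder Title",
        {"Paper": "https://example.com/paper", "Code": "https://example.com/code"},
        description="placeholder description",
    ))
```

Pasting the printed line under the most appropriate section should keep a new entry consistent with the existing ones.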
We aim to keep this list high-quality, practical, and focused. Thank you for helping improve it! ✨