Awesome RL for Agents

A curated list of resources on reinforcement learning (RL) for agents.

This list collects papers, tools, and demos showing how reinforcement learning can be applied to train or tune LLM/MLLM agents, with a focus on research-driven, computer-using, and tool-integrated agent behaviors.

📚 Papers & Research

Survey & Review

Deep Research Agents: A Systematic Examination And Roadmap [Preprint'25] [AwesomeList]

RL for Computer-using Agents

ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay [Preprint'25] [Code]
InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners [Preprint'25] [Code]
Cracking the Code of Action: a Generative Approach to Affordances for Reinforcement Learning [Preprint'25]
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning [Preprint'25] [Code]
Digi-Q: Learning Q-Value Functions for Training Device-Control Agents [Preprint'25] [Code]
AutoWebGLM: A Large Language Model-based Web Navigating Agent [KDD'24] [Preprint'24] [Code]

RL for Research Agents

WebShaper: Towards Autonomous Information Seeking Agency [Preprint'25] [Code]
WebSailor: Navigating Super-human Reasoning for Web Agent [Preprint'25] [Code]
MMSearch-R1: Incentivizing LMMs to Search [Preprint'25] [Code]
Kimi-Researcher: End-to-End RL Training for Emerging Agentic Capabilities [Blog]
R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning [Preprint'25] [Code]
R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning [Preprint'25] [Code]
ZeroSearch: Incentivize the Search Capability of LLMs without Searching [Preprint'25] [Code]
DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments [Preprint'25] [Code]
ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning [Preprint'25] [Code]
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning [Preprint'25] [Code]
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [Preprint'25] [Code]
Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research [Preprint'25] [Code]

RL for Tool-using Problem Solver

Self-Challenging Language Model Agents [Preprint'25]
VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection [Preprint'25] [Code]
OTC: Optimal Tool Calls via Reinforcement Learning [Preprint'25]
ToolRL: Reward is All Tool Learning Needs [Preprint'25] [Code]
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs [Preprint'25]
Agent models: Internalizing Chain-of-Action Generation into Reasoning models [Preprint'25] [Code]
ToRL: Scaling Tool-Integrated RL [Preprint'25] [Code]

RL for Agent Memory

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL based Memory Agent [Preprint'25] [Code]
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents [Preprint'25]

Reinforcement Learning Scaling

Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning [Preprint'25] [Model]
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce [Preprint'25]
o3 & o4-mini: Introducing OpenAI o3 and o4-mini [Blog]
Skywork-OR1 (Open Reasoner 1) [Blog] [Code]
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks [Preprint'25]
DAPO: An Open-Source LLM Reinforcement Learning System at Scale [Preprint'25] [Code]
LIMR: Less is More for RL Scaling [Preprint'25] [Code]
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning [Preprint'25]
Kimi k1.5: Scaling Reinforcement Learning with LLMs [Preprint'25]

Others

UFO: A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning [Preprint'25] [Code]
MPO: Boosting LLM Agents with Meta Plan Optimization [Preprint'25] [Code]

🕹 Benchmarks

xbench: Tracking Agents Productivity Scaling With Profession-Aligned Real-World Evaluations [Preprint'25] [Website]
BrowseComp-ZH: Benchmarking the Web Browsing Ability of Large Language Models in Chinese [Preprint'25] [Code]
BrowseComp: a benchmark for browsing agents [Blog] [Paper] [Code]
Computer Agent Arena: Compare & Test AI Agents on Crowdsourced Real-World Computer Use Tasks [Platform] [Code]
ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use [Paper] [Code]
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments [NeurIPS'24] [Code]
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents [ACL'24] [Code]

🧪 Demos & Projects

RL-based LLM agent tuning

SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning [Blog] [Code]
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning [Code]
VAGEN: Training VLM Agents with Multi-Turn Reinforcement Learning [Code]
OpenManus-RL [Code] & OpenManus [Code]
RAGEN: Training Agents by Reinforcing Reasoning [Code]

RL-based LLM tuning

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model [Preprint'25] [Code]
simple_GRPO [Code]

MCP Agents

mcp-agent [Code]
Agent2Agent (A2A) protocol [Code]

🧰 Toolkits & Frameworks

ROLL: Reinforcement Learning Optimization for Large-Scale Learning [Code]
verl: Volcano Engine Reinforcement Learning for LLM [Code]

📄 Tutorials & Blog Posts

Introducing ChatGPT agent: bridging research and action [Blog]
Context Engineering [GitHub]
The Second Half [Blog]

🔗 Related Awesome Lists

Awesome Deep Research Agent [List] - covering deep research agents and benchmark results
Awesome-Agent-RL [List] - covering RL for research agents
awesome-ml-agents [List] - covering RL and agents before 2023

🤝 Contributing

Contributions are warmly welcome!

If you know of a paper, tool, environment, or demo relevant to RL for Agents, feel free to open a pull request.

Guidelines:

Make sure the resource is publicly accessible and active.
Use the same format as existing entries: - **Name**: Title [Paper](link) [Code](link) – short description (optional).
Add entries under the most appropriate section.
Avoid duplicates or resources that are already well-covered elsewhere.

We aim to keep this list high-quality, practical, and focused. Thank you for helping improve it! ✨
