Necolizer/awesome-rl-for-agents

Awesome RL for Agents

A curated list of reinforcement learning (RL) resources for agents.

This list collects papers, tools, and demos that show how reinforcement learning can be applied to train or tune LLM/MLLM agents, with a focus on research-driven, computer-using, and tool-integrated agent behaviors.


📚 Papers & Research

Survey & Review

RL for Computer-using Agents

  • ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay [Preprint'25] [Code]
  • InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners [Preprint'25] [Code]
  • Cracking the Code of Action: a Generative Approach to Affordances for Reinforcement Learning [Preprint'25]
  • UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning [Preprint'25] [Code]
  • Digi-Q: Learning Q-Value Functions for Training Device-Control Agents [Preprint'25] [Code]
  • AutoWebGLM: A Large Language Model-based Web Navigating Agent [KDD'24] [Preprint'24] [Code]

RL for Research Agents

  • WebShaper: Towards Autonomous Information Seeking Agency [Preprint'25] [Code]
  • WebSailor: Navigating Super-human Reasoning for Web Agent [Preprint'25] [Code]
  • MMSearch-R1: Incentivizing LMMs to Search [Preprint'25] [Code]
  • Kimi-Researcher: End-to-End RL Training for Emerging Agentic Capabilities [Blog]
  • R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning [Preprint'25] [Code]
  • R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning [Preprint'25] [Code]
  • ZeroSearch: Incentivize the Search Capability of LLMs without Searching [Preprint'25] [Code]
  • DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments [Preprint'25] [Code]
  • ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning [Preprint'25] [Code]
  • Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning [Preprint'25] [Code]
  • R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [Preprint'25] [Code]
  • Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research [Preprint'25] [Code]

RL for Tool-using Problem Solver

RL for Agent Memory

  • MemAgent: Reshaping Long-Context LLM with Multi-Conv RL based Memory Agent [Preprint'25] [Code]
  • MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents [Preprint'25]

Reinforcement Learning Scaling

  • Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning [Preprint'25] [Model]
  • A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce [Preprint'25]
  • o3 & o4-mini: Introducing OpenAI o3 and o4-mini [Blog]
  • Skywork-OR1 (Open Reasoner 1) [Blog] [Code]
  • VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks [Preprint'25]
  • DAPO: An Open-Source LLM Reinforcement Learning System at Scale [Preprint'25] [Code]
  • LIMR: Less is More for RL Scaling [Preprint'25] [Code]
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning [Preprint'25]
  • Kimi k1.5: Scaling Reinforcement Learning with LLMs [Preprint'25]

Others

🕹 Benchmarks

  • xbench: Tracking Agents Productivity Scaling With Profession-Aligned Real-World Evaluations [Preprint'25] [Website]
  • BrowseComp-ZH: Benchmarking the Web Browsing Ability of Large Language Models in Chinese [Preprint'25] [Code]
  • BrowseComp: a benchmark for browsing agents [Blog] [Paper] [Code]
  • Computer Agent Arena: Compare & Test AI Agents on Crowdsourced Real-World Computer Use Tasks [Platform] [Code]
  • ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use [Paper] [Code]
  • OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments [NeurIPS'24] [Code]
  • SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents [ACL'24] [Code]

🧪 Demos & Projects

RL-based LLM agent tuning

  • SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning [Blog] [Code]
  • Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning [Code]
  • VAGEN: Training VLM Agents with Multi-Turn Reinforcement Learning [Code]
  • OpenManus-RL [Code] & OpenManus [Code]
  • RAGEN: Training Agents by Reinforcing Reasoning [Code]

RL-based LLM tuning

  • Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model [Preprint'25] [Code]
  • simple_GRPO [Code]

MCP Agents

🧰 Toolkits & Frameworks

  • ROLL: Reinforcement Learning Optimization for Large-Scale Learning [Code]
  • verl: Volcano Engine Reinforcement Learning for LLM [Code]

📄 Tutorials & Blog Posts

  • Introducing ChatGPT agent: bridging research and action [Blog]
  • Context Engineering [GitHub]
  • The Second Half [Blog]

🔗 Related Awesome Lists

  • Awesome Deep Research Agent [List] - covering deep research agents and benchmark results
  • Awesome-Agent-RL [List] - covering RL for research agents
  • awesome-ml-agents [List] - covering RL and agents before 2023

🤝 Contributing

Contributions are warmly welcome!

If you know of a paper, tool, environment, or demo relevant to RL for agents, feel free to open a pull request.

Guidelines:

  • Make sure the resource is publicly accessible and active.
  • Use the same format as existing entries: - **Name**: Title [Paper](link) [Code](link) – short description (optional).
  • Add entries under the most appropriate section.
  • Avoid duplicates or resources that are already well-covered elsewhere.
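
As a rough illustration of the entry format above, here is a minimal Python sketch that assembles one entry line. The helper name `format_entry`, the entry name, and the example.com URLs are hypothetical placeholders, not part of this repository or its tooling.

```python
def format_entry(name, title, links, description=None):
    """Build one list entry: - **Name**: Title [Label](url) ... – description.

    `links` is a sequence of (label, url) pairs, e.g. ("Paper", "https://...").
    This helper is a hypothetical illustration, not a script shipped with the list.
    """
    # Render each (label, url) pair as a Markdown link, separated by spaces.
    link_text = " ".join(f"[{label}]({url})" for label, url in links)
    entry = f"- **{name}**: {title} {link_text}"
    # The short description is optional, as stated in the guidelines.
    if description:
        entry += f" – {description}"
    return entry


# Placeholder values only; replace them with the real resource details.
example = format_entry(
    "Example-RL",
    "An Example Title",
    [("Paper", "https://example.com/paper"), ("Code", "https://example.com/code")],
    description="short description",
)
print(example)
```

Generating the line this way is optional; writing the entry by hand in the same shape works just as well.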

We aim to keep this list high-quality, practical, and focused. Thank you for helping improve it! ✨
