An awesome list of reinforcement learning-based large language model agent works, collected directly from the official codebases.

thinkwee/AgentsMeetRL

Base Framework Web GUI Tool
Game Code QA Environment

When LLM Agents Meet Reinforcement Learning

AgentsMeetRL is an awesome list of open-source repositories that train LLM agents with reinforcement learning:

  • 🤖 A project counts as an agent project if it has at least one of the following: multi-turn interaction or tool use.
  • ⚠️ This list is built from code analysis of open-source repositories using GitHub Copilot Agent, so it may contain inaccuracies. Entries were reviewed manually, but omissions are still possible. If you find an error, please report it through an issue or a PR.
  • 🤗 We focus on the reinforcement learning frameworks, RL algorithms, rewards, and environments that each project depends on, as a reference for how these open-source projects make their technical choices. Contributions are welcome, so feel free to submit your own project.
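The inclusion criterion above can be sketched as a small predicate over two of the table columns. This is only an illustration: the `Project` class, the `is_agent_project` function, and the treatment of "Planned" as "no tool use yet" are assumptions made here, not part of this repository.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """One table row, reduced to the two columns the criterion looks at."""
    name: str
    turn: str        # "Single/Multi Turn" column: "Single", "Multi", or "Both"
    tool_usage: str  # "Tool usage" column: "Yes", "No", or a tool name such as "Search"


def is_agent_project(project: Project) -> bool:
    """Return True if the row shows multi-turn interaction or tool use."""
    multi_turn = project.turn in {"Multi", "Both"}
    # Assumption: "No" and "Planned" both mean no tool use is available today.
    uses_tools = project.tool_usage.strip().lower() not in {"", "no", "planned"}
    return multi_turn or uses_tools


print(is_agent_project(Project("ToRL", "Single", "Yes")))         # True: tool use
print(is_agent_project(Project("example-repo", "Single", "No")))  # False: neither
```

Either condition alone is enough, so a single-turn project with tool use still qualifies.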

Enumerations used in the tables:

  • Reward Type:
    • External Verifier: e.g., a compiler or math solver
    • Simple Rule: e.g., a LaTeX parser with exact match scoring
    • Model Based: e.g., a trained verifier LLM or reward LLM
    • Custom
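As a rough sketch of how these labels might be handled in code, the snippet below models the Reward Type values as an enum, parses a table cell such as "Rule/Model/External", and shows a toy Simple Rule reward based on exact match. All names here (`RewardType`, `parse_reward_types`, `exact_match_reward`) are hypothetical, and reading "All" as "every reward type" is an assumption about the tables, not something the list states.

```python
from enum import Enum


class RewardType(Enum):
    """Reward Type labels; values mirror the short forms used in the tables."""
    EXTERNAL_VERIFIER = "External"
    SIMPLE_RULE = "Rule"
    MODEL_BASED = "Model"
    CUSTOM = "Custom"


def parse_reward_types(cell: str) -> set[RewardType]:
    """Parse a table cell such as "Rule/Model/External" into a set of labels.

    Assumption: "All" expands to every reward type. Unknown parts are skipped.
    """
    if cell.strip() == "All":
        return set(RewardType)
    lookup = {reward_type.value: reward_type for reward_type in RewardType}
    parts = (part.strip() for part in cell.split("/"))
    return {lookup[part] for part in parts if part in lookup}


def exact_match_reward(prediction: str, reference: str) -> float:
    """Toy Simple Rule reward: 1.0 on a whitespace/case-normalized exact match."""
    def normalize(text: str) -> str:
        return " ".join(text.strip().lower().split())

    return 1.0 if normalize(prediction) == normalize(reference) else 0.0
```

A real Simple Rule reward would typically parse the answer first (for example, extracting a LaTeX expression) before comparing; the string comparison here only stands in for that step.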

Base Framework

| Github Repo | Date | Org | Paper Link | RL Framework | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
|---|---|---|---|---|---|---|---|---|---|---|---|
| siiRL | 2025.7 | Shanghai Innovation Institute | paper | Custom | PPO/GRPO/CPGD/MARFT | Multi | Both | Multi | LLM/VLM/LLM-MAS PostTraining | Model/Rule | Planned |
| agent-lightning | 2025.6 | Microsoft Research | paper | veRL | PPO/Custom/Automatic Prompt Optimization | Multi | Outcome | Multi | Calculator/SQL | Model/External/Rule | Yes |
| AReaL | 2025.6 | AntGroup/Tsinghua | paper | Custom | PPO | Both | Outcome | Both | Math/Code | External | Yes |
| ROLL | 2025.6 | Alibaba | paper | Custom | PPO/GRPO/Reinforce++/TOPR/RAFT++ | Multi | Both | Multi | Math/QA/Code/Alignment | All | Yes |
| MARTI | 2025.5 | Tsinghua | -- | Custom | PPO/GRPO/REINFORCE++/TTRL | Multi | Both | Multi | Math | All | Yes |
| RL2 | 2025.4 | Accio | -- | Custom | Dr. GRPO/PPO/DPO | Single | Both | Both | QA/Dialogue | Rule/Model/External | Yes |
| verifiers | 2025.3 | Individual | -- | HuggingFace | GRPO | Multi | Outcome | Both | Reasoning/Math/Code | All | Code |
| oat | 2024.11 | NUS/Sea AI | paper | Custom | PPO/GRPO | Single | Outcome | Multi | Math/Alignment | External | No |
| veRL | 2024.10 | ByteDance | paper | veRL | PPO/GRPO | Single | Outcome | Both | Math/QA/Reasoning/Search | All | Yes |
| OpenRLHF | 2023.7 | OpenRLHF | paper | OpenRLHF | PPO/REINFORCE++/GRPO/DPO/IPO/KTO/RLOO | Multi | Both | Both | Dialogue/Chat/Completion | Rule/Model/External | Yes |
| trl | 2019.11 | HuggingFace | -- | trl | PPO/GRPO/DPO | Single | Both | Single | QA | Custom | No |

Search/Research/Web

| Github Repo | Date | Org | Paper Link | RL Framework | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ASearcher | 2025.8 | Ant Research RL Lab & Tsinghua University & UW | paper | RealHF/AReaL | PPO/GRPO + Decoupled PPO | Single | Outcome | Multi | Math/Code/SearchQA | External/Rule | Yes |
| Kimi-Researcher | 2025.6 | Moonshot AI | blog | Custom | REINFORCE | Single | Outcome | Multi | Research | Outcome | Search, Browse, Coding |
| TTI | 2025.6 | CMU | paper | Custom | REINFORCE/BC | Single | Outcome | Multi | Web | External | Web Browsing |
| R-Search | 2025.6 | Individual | -- | veRL | PPO/GRPO | Single | Both | Multi | QA/Search | All | Yes |
| R1-Searcher-plus | 2025.5 | RUC | paper | Custom | Custom | Single | Outcome | Multi | Search | Model | Search |
| StepSearch | 2025.5 | SenseTime | paper | veRL | PPO | Single | Process | Multi | QA | Model | Search |
| AutoRefine | 2025.5 | USTC | paper | veRL | PPO/GRPO | Multi | Both | Multi | RAG QA | Rule | Search |
| ZeroSearch | 2025.5 | Alibaba | paper | veRL | PPO/GRPO/REINFORCE | Single | Outcome | Multi | QA/Search | Rule | Yes |
| WebThinker | 2025.4 | RUC | paper | Custom | DPO | Single | Outcome | Multi | Reasoning/QA/Research | Model/External | Web Browsing |
| DeepResearcher | 2025.4 | SJTU | paper | veRL | PPO/GRPO | Multi | Outcome | Multi | Research | All | Yes |
| Search-R1 | 2025.3 | UIUC/Google | paper1, paper2 | veRL | PPO/GRPO | Single | Outcome | Multi | Search | All | Search |
| R1-Searcher | 2025.3 | RUC | paper | OpenRLHF | PPO/DPO | Single | Both | Multi | Search | All | Yes |
| C-3PO | 2025.2 | Alibaba | paper | OpenRLHF | PPO | Multi | Outcome | Multi | Search | Model | Yes |
| WebAgent | 2025.1 | Alibaba | paper1, paper2 | LLaMA-Factory | DAPO | Multi | Process | Multi | Web | Model | Yes |

GUI

| Github Repo | Date | Org | Paper Link | RL Framework | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Grounding-R1 | 2025.6 | Salesforce | blog | trl | GRPO | Single | Outcome | Multi | GUI Grounding | Model | Yes |
| AgentCPM-GUI | 2025.6 | OpenBMB/Tsinghua/RUC | paper | HuggingFace | GRPO | Single | Outcome | Multi | Mobile GUI | Model | Yes |
| ARPO | 2025.5 | CUHK/HKUST | paper | veRL | GRPO | Single | Outcome | Multi | GUI | External | Computer Use |
| GUI-G1 | 2025.5 | RUC | paper | TRL | GRPO | Single | Outcome | Single | GUI | Rule/External | No |
| GUI-R1 | 2025.4 | CAS/NUS | paper | veRL | GRPO | Single | Outcome | Multi | GUI | Rule | No |
| UI-R1 | 2025.3 | vivo/CUHK | paper | TRL | GRPO | Single | Process | Both | GUI | Rule | Computer/Phone Use |

Tool

| Github Repo | Date | Org | Paper Link | RL Framework | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
|---|---|---|---|---|---|---|---|---|---|---|---|
| verl-tool | 2025.6 | TIGER-Lab | X | veRL | PPO/GRPO | Single | Both | Both | Math/Code | Rule/External | Yes |
| Multi-Turn-RL-Agent | 2025.5 | University of Minnesota | paper | Custom | GRPO | Single | Both | Multi | Tool-use/Math | Rule/External | Yes |
| Tool-N1 | 2025.5 | NVIDIA | paper | veRL | PPO | Single | Outcome | Multi | Math/Dialogue | All | Yes |
| Tool-Star | 2025.5 | RUC | paper | LLaMA-Factory | PPO/DPO/ORPO/SimPO/KTO | Single | Outcome | Multi | Multi-modal/Tool Use/Dialogue | Model/External | Yes |
| RL-Factory | 2025.5 | Simple-Efficient | model | veRL | GRPO | Multi | Both | Multi | Tool-use/NL2SQL | All | MCP |
| ReTool | 2025.4 | ByteDance | paper | veRL | PPO | Single | Outcome | Multi | Math | External | Code |
| Agent-R1 | 2025.3 | USTC | -- | veRL | PPO/GRPO | Single | Both | Multi | Tool-use/QA | Model | Yes |
| ReCall | 2025.3 | BaiChuan | paper | veRL | PPO/GRPO/RLOO/REINFORCE++/ReMax | Single | Outcome | Multi | Tool-use/Math/QA | All | Yes |

TextGame

| Github Repo | Date | Org | Paper Link | RL Framework | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ARIA | 2025.6 | Fudan University | paper | Custom | REINFORCE | Both | Process | Multi | Negotiation/Bargaining | Other | No |
| AMPO | 2025.5 | Tongyi Lab, Alibaba | paper | veRL | BC/AMPO (GRPO improvement) | Multi | Outcome | Multi | Social Interaction | Model-based | No |
| SPA-RL-Agent | 2025.5 | PolyU | paper | TRL | PPO | Single | Process | Multi | Navigation/Web/TextGame | Model | No |
| Trinity-RFT | 2025.5 | Alibaba | paper | veRL | PPO/GRPO | Single | Outcome | Both | Math/TextGame/Web | All | Yes |
| VAGEN | 2025.3 | RAGEN-AI | paper | veRL | PPO/GRPO | Single | Both | Multi | TextGame/Navigation | All | Yes |
| ART | 2025.3 | OpenPipe | paper | TRL | GRPO | Multi | Both | Multi | TextGame | All | Yes |
| OpenManus-RL | 2025.3 | UIUC/MetaGPT | -- | Custom | PPO/DPO/GRPO | Multi | Outcome | Multi | TextGame | All | Yes |
| RAGEN | 2025.1 | RAGEN-AI | paper | veRL | PPO/GRPO | Single | Both | Multi | TextGame | All | Yes |

Code

| Github Repo | Date | Org | Paper Link | RL Framework | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MedAgentGym | 2025.6 | Emory/Georgia Tech | paper | HuggingFace | SFT/DPO/PPO/GRPO | Single | Outcome | Multi | Medical/Code | External | Yes |
| CURE | 2025.6 | University of Chicago/Princeton/ByteDance | paper | HuggingFace | PPO | Single | Outcome | Single | Code | External | No |
| verl-agent | 2025.5 | NTU/Skywork | paper | veRL | PPO/GRPO/GiGPO/DAPO/RLOO/REINFORCE++ | Multi | Both | Multi | Phone Use/Math/Code/Web/TextGame | All | Yes |
| MASLab | 2025.5 | MASWorks | paper | Custom | No RL | Multi | Outcome | Multi | Code/Math/Reasoning | External | Yes |
| Time-R1 | 2025.5 | UIUC | paper | veRL | PPO/GRPO/DPO | Multi | Outcome | Multi | Temporal | All | Code |
| ML-Agent | 2025.5 | MASWorks | paper | Custom | Custom | Single | Process | Multi | Code | All | Yes |
| SkyRL | 2025.4 | NovaSky | -- | veRL | PPO/GRPO | Single | Outcome | Multi | Math/Code | All | Code |
| digitalhuman | 2025.4 | Tencent | paper | veRL | PPO/GRPO/ReMax/RLOO | Multi | Outcome | Multi | Empathy/Math/Code/MultimodalQA | Rule/Model/External | Yes |
| sweet_rl | 2025.3 | Meta/UCB | paper | OpenRLHF | DPO | Multi | Process | Multi | Design/Code | Model | Web Browsing |
| rllm | 2025.1 | Berkeley Sky Computing Lab / BAIR / Together AI | Notion Blog | veRL | PPO/GRPO | Single | Outcome | Multi | Code Edit | External | Yes |
| open-r1 | 2025.1 | HuggingFace | -- | TRL | GRPO | Single | Outcome | Single | Math/Code | All | Yes |

QA(Reasoning/Math)

| Github Repo | Date | Org | Paper Link | RL Framework | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ARPO | 2025.7 | RUC, Kuaishou | paper | veRL | GRPO | Single | Outcome | Multi | Math/Coding | Model/Rule | Yes |
| terminal-bench-rl | 2025.7 | Individual (Danau5tin) | N/A | rLLM | GRPO | Single | Outcome | Multi | Coding/Terminal | Model+External Verifier | Yes |
| MOTIF | 2025.6 | University of Maryland | paper | trl | GRPO | Single | Outcome | Multi | QA | Rule | No |
| MemAgent | 2025.6 | ByteDance, Tsinghua-SIA | paper | veRL | PPO, GRPO, DPO | Multi | Outcome | Multi | Long-context QA | Rule/Model/External | Yes |
| cmriat/l0 | 2025.6 | China Merchants Research Institute of Advanced Technology | paper | veRL | PPO | Multi | Process | Multi | QA | All | Yes |
| agent-distillation | 2025.5 | KAIST | paper | Custom | PPO | Single | Process | Multi | QA/Math | External | Yes |
| VDeepEyes | 2025.5 | Xiaohongshu/XJTU | paper | veRL | PPO/GRPO | Multi | Process | Multi | VQA | All | Yes |
| EasyR1 | 2025.4 | Individual | repo1/paper2 | veRL | GRPO | Single | Process | Multi | Vision-Language | Model | Yes |
| AutoCoA | 2025.3 | BJTU | paper | veRL | GRPO | Multi | Outcome | Multi | Reasoning/Math/QA | All | Yes |
| ToRL | 2025.3 | SJTU | paper | veRL | GRPO | Single | Outcome | Single | Math | Rule/External | Yes |
| ReMA | 2025.3 | SJTU, UCL | paper | veRL | PPO | Multi | Outcome | Multi | Math | Rule | No |
| Agentic-Reasoning | 2025.2 | Oxford | paper | Custom | Custom | Single | Process | Multi | QA/Math | External | Web Browsing |
| SimpleTIR | 2025.2 | NTU, ByteDance | Notion Blog | veRL | PPO/GRPO (with extensions) | Single | Outcome | Multi | Math, Coding | All | Yes |
| openrlhf_async_pipline | 2024.5 | OpenRLHF | paper | OpenRLHF | PPO/REINFORCE++/DPO/RLOO | Single | Outcome | Multi | Dialogue/Reasoning/QA | All | No |

Environment

| Github Repo | Date | Org | Task |
|---|---|---|---|
| Mind2Web-2 | 2025.6 | Ohio State University | Web |
| gem | 2025.5 | Sea AI Lab | Math/Code/Game/QA |
| MLE-Dojo | 2025.5 | GIT, Stanford | MLE |
| atropos | 2025.4 | Nous Research | Game/Code/Tool |
| InternBootcamp | 2025.4 | InternBootcamp | Coding/QA/Game |
| reasoning-gym | 2025.1 | open-thought | Algebra/Arithmetic/Computation/Cognition/Geometry/Graph/Logic/Game |
| llmgym | 2025.1 | tensorzero | TextGame/Tool |
| debug-gym | 2024.11 | Microsoft Research | Debugging/Game/Code |
| gym-llm | 2024.8 | Rodrigo Sánchez Molina | Control/Game |
| AgentGym | 2024.6 | Fudan | Web/Game |
| tau-bench | 2024.6 | Sierra | Tool |
| appworld | 2024.6 | Stony Brook University | Phone Use |
| android_world | 2024.5 | Google Research | Phone Use |
| TheAgentCompany | 2024.3 | CMU, Duke | Coding |
| LlamaGym | 2024.3 | Rohan Pandey | Game |
| visualwebarena | 2024.1 | CMU | Web |
| LMRL-Gym | 2023.12 | UC Berkeley | Game |
| OSWorld | 2023.10 | HKU, CMU, Salesforce, Waterloo | Computer Use |
| webarena | 2023.7 | CMU | Web |
| AgentBench | 2023.7 | Tsinghua University | Game/Web/QA/Tool |
| WebShop | 2022.7 | Princeton-NLP | Web |
| ScienceWorld | 2022.3 | AllenAI | TextGame/ScienceQA |
| alfworld | 2020.10 | Microsoft, CMU, UW | Embodied |
| factorio-learning-environment | 2021.6 | JackHopkins | Game |
| jericho | 2018.10 | Microsoft, GIT | TextGame |
| TextWorld | 2018.6 | Microsoft Research | TextGame |

Under Review/Waiting for Open Source
