When LLM Agents Meet Reinforcement Learning

Agents meet RL is an awesome list of open-source repositories that train LLM agents with reinforcement learning.

🤖 To count as an agent project, a repository must involve at least one of the following: multi-turn interaction or tool use.
⚠️ The entries are based on code analysis of the open-source repositories performed with GitHub Copilot Agent, so some of them may be inaccurate. Everything was manually reviewed, but errors and omissions may remain. If you find any, please let us know through issues or PRs; they are warmly welcome!
🤗 We focus on the reinforcement learning frameworks, RL algorithms, rewards, and environments that each project depends on, as a reference for how these open-source projects make their technical choices. Feel free to submit your own project at any time; contributions are welcome!

Enumerations used in the tables:

Reward Type:
- External Verifier: e.g., a compiler or math solver
- Simple Rule: e.g., a LaTeX parser with exact match scoring
- Model Based: e.g., a trained verifier LLM or reward LLM
- Custom
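As a rough illustration of how the first three reward types differ, here is a minimal Python sketch. It is not taken from any listed project. The function names and the `scorer` callable are hypothetical, the "compiler" is simply Python's built-in `compile`, and the short enum values (`External`, `Rule`, `Model`) mirror the abbreviations that appear in the Reward Type column of the tables.

```python
import re
from enum import Enum
from typing import Callable


class RewardType(Enum):
    EXTERNAL_VERIFIER = "External"
    SIMPLE_RULE = "Rule"
    MODEL_BASED = "Model"
    CUSTOM = "Custom"


def rule_reward(prediction: str, reference: str) -> float:
    """Simple Rule: exact match after removing all whitespace."""
    def normalize(text: str) -> str:
        return re.sub(r"\s+", "", text)
    return 1.0 if normalize(prediction) == normalize(reference) else 0.0


def external_verifier_reward(source: str) -> float:
    """External Verifier: here, Python's own compiler acts as the checker.

    Only syntax is verified; the candidate code is never executed.
    """
    try:
        compile(source, "<candidate>", "exec")
    except (SyntaxError, ValueError):
        return 0.0
    return 1.0


def model_based_reward(response: str, scorer: Callable[[str], float]) -> float:
    """Model Based: delegate to a scoring model (hypothetical callable),
    then clamp the score to the range [0, 1]."""
    return max(0.0, min(1.0, scorer(response)))
```

In practice an external verifier would usually run unit tests or a math solver rather than a syntax check, and `scorer` would wrap a trained verifier or reward LLM; both are stand-ins here.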

Base Framework

Github Repo	Date	Org	Paper Link	RL Framework	RL Algorithm	Single/Multi Agent	Outcome/Process Reward	Single/Multi Turn	Task	Reward Type	Tool usage
siiRL	2025.7	Shanghai Innovation Institute	Paper	Custom	PPO/GRPO/CPGD/MARFT	Multi	Both	Multi	LLM/VLM/LLM-MAS PostTraining	Model/Rule	Planned
agent-lightning	2025.6	Microsoft Research	paper	veRL	PPO/Custom/Automatic Prompt Optimization	Multi	Outcome	Multi	Calculator/SQL	Model/External/Rule	Yes
AReaL	2025.6	AntGroup/Tsinghua	paper	Custom	PPO	Both	Outcome	Both	Math/Code	External	Yes
ROLL	2025.6	Alibaba	paper	Custom	PPO/GRPO/Reinforce++/TOPR/RAFT++	Multi	Both	Multi	Math/QA/Code/Alignment	All	Yes
MARTI	2025.5	Tsinghua	--	Custom	PPO/GRPO/REINFORCE++/TTRL	Multi	Both	Multi	Math	All	Yes
RL2	2025.4	Accio	--	Custom	Dr. GRPO/PPO/DPO	Single	Both	Both	QA/Dialogue	Rule/Model/External	Yes
verifiers	2025.3	Individual	--	HuggingFace	GRPO	Multi	Outcome	Both	Reasoning/Math/Code	All	Code
oat	2024.11	NUS/Sea AI	paper	Custom	PPO/GRPO	Single	Outcome	Multi	Math/Alignment	External	No
veRL	2024.10	ByteDance	paper	veRL	PPO/GRPO	Single	Outcome	Both	Math/QA/Reasoning/Search	All	Yes
OpenRLHF	2023.7	OpenRLHF	paper	OpenRLHF	PPO/REINFORCE++/GRPO/DPO/IPO/KTO/RLOO	Multi	Both	Both	Dialogue/Chat/Completion	Rule/Model/External	Yes
trl	2019.11	HuggingFace	--	trl	PPO/GRPO/DPO	Single	Both	Single	QA	Custom	No
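For readers who want to filter the tables programmatically, the following Python sketch parses one row and applies the inclusion criterion stated in the introduction (multi-turn interaction or tool use). It assumes the rows are available as tab-separated text with the twelve columns shown in the header. The `Project` class, `parse_row`, and `meets_agent_criterion` are hypothetical names, and treating `Both` as multi-turn and `Planned` as no tool use is an assumption rather than a rule stated by the list.

```python
import re
from dataclasses import dataclass

COLUMN_COUNT = 12  # number of columns in the project tables


@dataclass
class Project:
    repo: str
    date: str
    org: str
    paper: str
    framework: str
    algorithms: list[str]
    agents: str              # Single/Multi Agent column
    reward_granularity: str  # Outcome/Process Reward column
    turns: str               # Single/Multi Turn column
    tasks: list[str]
    reward_types: list[str]
    tool_usage: str


def split_cell(cell: str) -> list[str]:
    """Split a multi-valued cell such as 'PPO/GRPO' or 'PPO, GRPO'."""
    return [part.strip() for part in re.split(r"[/,]", cell) if part.strip()]


def parse_row(line: str) -> Project:
    """Parse one tab-separated table row into a Project."""
    cells = [cell.strip() for cell in line.rstrip("\n").split("\t")]
    if len(cells) != COLUMN_COUNT:
        raise ValueError(f"expected {COLUMN_COUNT} cells, got {len(cells)}")
    (repo, date, org, paper, framework, algos,
     agents, granularity, turns, tasks, rewards, tools) = cells
    return Project(repo, date, org, paper, framework, split_cell(algos),
                   agents, granularity, turns, split_cell(tasks),
                   split_cell(rewards), tools)


def meets_agent_criterion(project: Project) -> bool:
    # Assumption: "Both" counts as multi-turn;
    # "No" and "Planned" count as no tool use.
    multi_turn = project.turns in {"Multi", "Both"}
    uses_tools = project.tool_usage not in {"No", "Planned"}
    return multi_turn or uses_tools


# The veRL row from the Base Framework table, rebuilt with explicit tabs.
row = "\t".join(["veRL", "2024.10", "ByteDance", "paper", "veRL", "PPO/GRPO",
                 "Single", "Outcome", "Both", "Math/QA/Reasoning/Search",
                 "All", "Yes"])
project = parse_row(row)
print(project.algorithms, meets_agent_criterion(project))
```

Splitting on both `/` and `,` is a convenience because the tables use both separators. Cells with free-form text, such as a tool list, are kept as plain strings.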

Search/Research/Web

Github Repo	Date	Org	Paper Link	RL Framework	RL Algorithm	Single/Multi Agent	Outcome/Process Reward	Single/Multi Turn	Task	Reward Type	Tool usage
ASearcher	2025.8	Ant Research RL Lab & Tsinghua University & UW	paper	RealHF/AReaL	PPO/GRPO + Decoupled PPO	Single	Outcome	Multi	Math/Code/SearchQA	External/Rule	Yes
Kimi-Researcher	2025.6	Moonshot AI	blog	Custom	REINFORCE	Single	Outcome	Multi	Research	Outcome	Search, Browse, Coding
TTI	2025.6	CMU	paper	Custom	REINFORCE/BC	Single	Outcome	Multi	Web	External	Web Browsing
R-Search	2025.6	Individual	--	veRL	PPO/GRPO	Single	Both	Multi	QA/Search	All	Yes
R1-Searcher-plus	2025.5	RUC	paper	Custom	Custom	Single	Outcome	Multi	Search	Model	Search
StepSearch	2025.5	SenseTime	paper	veRL	PPO	Single	Process	Multi	QA	Model	Search
AutoRefine	2025.5	USTC	paper	veRL	PPO/GRPO	Multi	Both	Multi	RAG QA	Rule	Search
ZeroSearch	2025.5	Alibaba	paper	veRL	PPO/GRPO/REINFORCE	Single	Outcome	Multi	QA/Search	Rule	Yes
WebThinker	2025.4	RUC	paper	Custom	DPO	Single	Outcome	Multi	Reasoning/QA/Research	Model/External	Web Browsing
DeepResearcher	2025.4	SJTU	paper	veRL	PPO/GRPO	Multi	Outcome	Multi	Research	All	Yes
Search-R1	2025.3	UIUC/Google	paper1, paper2	veRL	PPO/GRPO	Single	Outcome	Multi	Search	All	Search
R1-Searcher	2025.3	RUC	paper	OpenRLHF	PPO/DPO	Single	Both	Multi	Search	All	Yes
C-3PO	2025.2	Alibaba	paper	OpenRLHF	PPO	Multi	Outcome	Multi	Search	Model	Yes
WebAgent	2025.1	Alibaba	paper1, paper2	LLaMA-Factory	DAPO	Multi	Process	Multi	Web	Model	Yes

GUI

Github Repo	Date	Org	Paper Link	RL Framework	RL Algorithm	Single/Multi Agent	Outcome/Process Reward	Single/Multi Turn	Task	Reward Type	Tool usage
Grounding-R1	2025.6	Salesforce	blog	trl	GRPO	Single	Outcome	Multi	GUI Grounding	Model	Yes
AgentCPM-GUI	2025.6	OpenBMB/Tsinghua/RUC	paper	HuggingFace	GRPO	Single	Outcome	Multi	Mobile GUI	Model	Yes
ARPO	2025.5	CUHK/HKUST	paper	veRL	GRPO	Single	Outcome	Multi	GUI	External	Computer Use
GUI-G1	2025.5	RUC	paper	TRL	GRPO	Single	Outcome	Single	GUI	Rule/External	No
GUI-R1	2025.4	CAS/NUS	paper	veRL	GRPO	Single	Outcome	Multi	GUI	Rule	No
UI-R1	2025.3	vivo/CUHK	paper	TRL	GRPO	Single	Process	Both	GUI	Rule	Computer/Phone Use

Tool

Github Repo	Date	Org	Paper Link	RL Framework	RL Algorithm	Single/Multi Agent	Outcome/Process Reward	Single/Multi Turn	Task	Reward Type	Tool usage
verl-tool	2025.6	TIGER-Lab	X	veRL	PPO/GRPO	Single	Both	Both	Math/Code	Rule/External	Yes
Multi-Turn-RL-Agent	2025.5	University of Minnesota	paper	Custom	GRPO	Single	Both	Multi	Tool-use/Math	Rule/External	Yes
Tool-N1	2025.5	NVIDIA	paper	veRL	PPO	Single	Outcome	Multi	Math/Dialogue	All	Yes
Tool-Star	2025.5	RUC	paper	LLaMA-Factory	PPO/DPO/ORPO/SimPO/KTO	Single	Outcome	Multi	Multi-modal/Tool Use/Dialogue	Model/External	Yes
RL-Factory	2025.5	Simple-Efficient	model	veRL	GRPO	Multi	Both	Multi	Tool-use/NL2SQL	All	MCP
ReTool	2025.4	ByteDance	paper	veRL	PPO	Single	Outcome	Multi	Math	External	Code
Agent-R1	2025.3	USTC	--	veRL	PPO/GRPO	Single	Both	Multi	Tool-use/QA	Model	Yes
ReCall	2025.3	BaiChuan	paper	veRL	PPO/GRPO/RLOO/REINFORCE++/ReMax	Single	Outcome	Multi	Tool-use/Math/QA	All	Yes

TextGame

Github Repo	Date	Org	Paper Link	RL Framework	RL Algorithm	Single/Multi Agent	Outcome/Process Reward	Single/Multi Turn	Task	Reward Type	Tool usage
ARIA	2025.6	Fudan University	paper	Custom	REINFORCE	Both	Process	Multi	Negotiation/Bargaining	Other	No
AMPO	2025.5	Tongyi Lab, Alibaba	paper	veRL	BC/AMPO (GRPO improvement)	Multi	Outcome	Multi	Social Interaction	Model	No
SPA-RL-Agent	2025.5	PolyU	paper	TRL	PPO	Single	Process	Multi	Navigation/Web/TextGame	Model	No
Trinity-RFT	2025.5	Alibaba	paper	veRL	PPO/GRPO	Single	Outcome	Both	Math/TextGame/Web	All	Yes
VAGEN	2025.3	RAGEN-AI	paper	veRL	PPO/GRPO	Single	Both	Multi	TextGame/Navigation	All	Yes
ART	2025.3	OpenPipe	paper	TRL	GRPO	Multi	Both	Multi	TextGame	All	Yes
OpenManus-RL	2025.3	UIUC/MetaGPT	--	Custom	PPO/DPO/GRPO	Multi	Outcome	Multi	TextGame	All	Yes
RAGEN	2025.1	RAGEN-AI	paper	veRL	PPO/GRPO	Single	Both	Multi	TextGame	All	Yes

Code

Github Repo	Date	Org	Paper Link	RL Framework	RL Algorithm	Single/Multi Agent	Outcome/Process Reward	Single/Multi Turn	Task	Reward Type	Tool usage
MedAgentGym	2025.6	Emory/Georgia Tech	paper	HuggingFace	SFT/DPO/PPO/GRPO	Single	Outcome	Multi	Medical/Code	External	Yes
CURE	2025.6	University of Chicago/Princeton/ByteDance	paper	HuggingFace	PPO	Single	Outcome	Single	Code	External	No
verl-agent	2025.5	NTU/Skywork	paper	veRL	PPO/GRPO/GiGPO/DAPO/RLOO/REINFORCE++	Multi	Both	Multi	Phone Use/Math/Code/Web/TextGame	All	Yes
MASLab	2025.5	MASWorks	paper	Custom	NO RL	Multi	Outcome	Multi	Code/Math/Reasoning	External	Yes
Time-R1	2025.5	UIUC	paper	veRL	PPO/GRPO/DPO	Multi	Outcome	Multi	Temporal	All	Code
ML-Agent	2025.5	MASWorks	paper	Custom	Custom	Single	Process	Multi	Code	All	Yes
SkyRL	2025.4	NovaSky	--	veRL	PPO/GRPO	Single	Outcome	Multi	Math/Code	All	Code
digitalhuman	2025.4	Tencent	paper	veRL	PPO/GRPO/ReMax/RLOO	Multi	Outcome	Multi	Empathy/Math/Code/MultimodalQA	Rule/Model/External	Yes
sweet_rl	2025.3	Meta/UCB	paper	OpenRLHF	DPO	Multi	Process	Multi	Design/Code	Model	Web Browsing
rllm	2025.1	Berkeley Sky Computing Lab / BAIR / Together AI	Notion Blog	veRL	PPO/GRPO	Single	Outcome	Multi	Code Edit	External	Yes
open-r1	2025.1	HuggingFace	--	TRL	GRPO	Single	Outcome	Single	Math/Code	All	Yes

QA(Reasoning/Math)

Github Repo	Date	Org	Paper Link	RL Framework	RL Algorithm	Single/Multi Agent	Outcome/Process Reward	Single/Multi Turn	Task	Reward Type	Tool usage
ARPO	2025.7	RUC, Kuaishou	paper	veRL	GRPO	Single	Outcome	Multi	Math/Coding	Model/Rule	Yes
terminal-bench-rl	2025.7	Individual (Danau5tin)	--	rLLM	GRPO	Single	Outcome	Multi	Coding/Terminal	Model/External	Yes
MOTIF	2025.6	University of Maryland	paper	trl	GRPO	Single	Outcome	Multi	QA	Rule	No
MemAgent	2025.6	ByteDance, Tsinghua-SIA	paper	veRL	PPO/GRPO/DPO	Multi	Outcome	Multi	Long-context QA	Rule/Model/External	Yes
cmriat/l0	2025.6	China Merchants Research Institute of Advanced Technology	paper	veRL	PPO	Multi	Process	Multi	QA	All	Yes
agent-distillation	2025.5	KAIST	paper	Custom	PPO	Single	Process	Multi	QA/Math	External	Yes
VDeepEyes	2025.5	Xiaohongshu/XJTU	paper	veRL	PPO/GRPO	Multi	Process	Multi	VQA	All	Yes
EasyR1	2025.4	Individual	repo1/paper2	veRL	GRPO	Single	Process	Multi	Vision-Language	Model	Yes
AutoCoA	2025.3	BJTU	paper	veRL	GRPO	Multi	Outcome	Multi	Reasoning/Math/QA	All	Yes
ToRL	2025.3	SJTU	paper	veRL	GRPO	Single	Outcome	Single	Math	Rule/External	Yes
ReMA	2025.3	SJTU, UCL	paper	veRL	PPO	Multi	Outcome	Multi	Math	Rule	No
Agentic-Reasoning	2025.2	Oxford	paper	Custom	Custom	Single	Process	Multi	QA/Math	External	Web Browsing
SimpleTIR	2025.2	NTU, ByteDance	Notion Blog	veRL	PPO/GRPO (with extensions)	Single	Outcome	Multi	Math/Coding	All	Yes
openrlhf_async_pipline	2024.5	OpenRLHF	paper	OpenRLHF	PPO/REINFORCE++/DPO/RLOO	Single	Outcome	Multi	Dialogue/Reasoning/QA	All	No

Environment

Github Repo	Date	Org	Task
Mind2Web-2	2025.6	Ohio State University	Web
gem	2025.5	Sea AI Lab	Math/Code/Game/QA
MLE-Dojo	2025.5	GIT, Stanford	MLE
atropos	2025.4	Nous Research	Game/Code/Tool
InternBootcamp	2025.4	InternBootcamp	Coding/QA/Game
reasoning-gym	2025.1	open-thought	Algebra/Arithmetic/Computation/Cognition/Geometry/Graph/Logic/Game
llmgym	2025.1	tensorzero	TextGame/Tool
debug-gym	2024.11	Microsoft Research	Debugging/Game/Code
gym-llm	2024.8	Rodrigo Sánchez Molina	Control/Game
AgentGym	2024.6	Fudan	Web/Game
tau-bench	2024.6	Sierra	Tool
appworld	2024.6	Stony Brook University	Phone Use
android_world	2024.5	Google Research	Phone Use
TheAgentCompany	2024.3	CMU, Duke	Coding
LlamaGym	2024.3	Rohan Pandey	Game
visualwebarena	2024.1	CMU	Web
LMRL-Gym	2023.12	UC Berkeley	Game
OSWorld	2023.10	HKU, CMU, Salesforce, Waterloo	Computer Use
webarena	2023.7	CMU	Web
AgentBench	2023.7	Tsinghua University	Game/Web/QA/Tool
WebShop	2022.7	Princeton-NLP	Web
ScienceWorld	2022.3	AllenAI	TextGame/ScienceQA
factorio-learning-environment	2021.6	JackHopkins	Game
alfworld	2020.10	Microsoft, CMU, UW	Embodied
jericho	2018.10	Microsoft, GIT	TextGame
TextWorld	2018.6	Microsoft Research	TextGame

thinkwee/AgentsMeetRL

Folders and files

Latest commit

History

Repository files navigation

When LLM Agents Meet Reinforcement Learning

Base Framework

Search/Research/Web

GUI

Tool

TextGame

Code

QA(Reasoning/Math)

Environment

Under Review/Waiting for Open Source

Star History

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Packages