v0.2.1
Overview
- Agentic RL
  1.1 The rollout model can now be accessed directly via the OpenAI API, reducing migration costs.
  1.2 Supports general multi-step workflows without requiring concatenated experience data.
  1.3 Introduced AddStrategy to facilitate group-based advantage/return calculations (experimental; will be integrated into the buffer module in future versions).
  1.4 Added a ReAct Agent RL example based on the AgentScope framework.
  1.5 Enhanced the Alfworld example into a general multi-step workflow.
- Async / Offline RL
  2.1 Refactored RunnerPool to Scheduler, enabling asynchronous scheduling and management of multiple workflow runners.
  2.2 Added a priority queue buffer to reduce idling caused by speed differences between Explorer and Trainer through experience sorting and reuse.
  2.3 Introduced Synchronizer to manage model weight synchronization between Explorer and Trainer, supporting dynamic synchronization.
  2.4 Added tutorials on using the Synchronizer.
- Add a benchmark tool for quick verification.
- Added support for more RL algorithms (e.g., CHORD, DAPO, GSPO, RAFT).
- Updated vllm to 0.10.0 and verl to 0.4.1.
- Fixed numerous bugs.
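For readers unfamiliar with the "group-based advantage" calculation mentioned in item 1.3, the sketch below shows one common form of it: rewards from rollouts of the same task are normalized against that group's mean and standard deviation, as in GRPO-style algorithms. This is an illustration only and not the AddStrategy API; the function name `group_advantages`, the `(group_id, reward)` tuple layout, and the `eps` parameter are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_advantages(experiences, eps=1e-6):
    """Return one advantage per experience, normalized within its group.

    `experiences` is a list of (group_id, reward) pairs; this layout is a
    hypothetical stand-in, not the library's Experience class.
    """
    # Collect the rewards of every rollout that shares a group (e.g. a task).
    rewards_by_group = defaultdict(list)
    for group_id, reward in experiences:
        rewards_by_group[group_id].append(reward)

    # Per-group mean and population standard deviation.
    stats = {
        group_id: (mean(rewards), pstdev(rewards))
        for group_id, rewards in rewards_by_group.items()
    }

    # Advantage = (reward - group mean) / (group std + eps), in input order.
    return [
        (reward - stats[group_id][0]) / (stats[group_id][1] + eps)
        for group_id, reward in experiences
    ]


if __name__ == "__main__":
    rollouts = [("task-a", 1.0), ("task-a", 0.0), ("task-b", 2.0), ("task-b", 2.0)]
    print(group_advantages(rollouts))
```

A group whose rollouts all receive the same reward yields zero advantages, so it contributes no learning signal; the `eps` term only guards against division by zero in that case.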
What's Changed
- Add a switch for progress bar in _HFBatchReader by @yanxi-chen in #126
- Add dapo reward by @hiyuchang in #114
- Add readme_zh by @hiyuchang in #127
- Fix a typo in readme by @hiyuchang in #128
- ModelWrapper automatically record Experience by @pan-x-c in #123
- Add continue_from_checkpoint by @hiyuchang in #129
- Merge verl v0.4.1 by @hiyuchang in #125
- Fix vllm nccl sync error by @pan-x-c in #132
- Add more unittest command by @pan-x-c in #133
- Add Step-wise Workflow by @pan-x-c in #130
- Add workflow and example for toolcall training using ToolAce dataset by @garyzhang99 in #134
- Rename data scripts for examples and refine toolcall example readme by @garyzhang99 in #137
- Add sft example by @hiyuchang in #138
- Fix buffer.total_epochs not working in SFT/DPO by @pan-x-c in #140
- Fix priority queue implementation and enhance testing by @pan-x-c in #135
- Update some details in tutorial by @hiyuchang in #144
- [examples] Updated the OPMD config. by @yaochaorui in #145
- Rollout openAI API compatible with vllm 0.8.5 by @pan-x-c in #146
- Standardize Experience and Sample Strategy by @pan-x-c in #141
- Add fused_kernel_options by @chenyushuo in #150
- Fix MATH readme by @hiyuchang in #151
- Calculate advantage in Explorer by @pan-x-c in #148
- Add Synchronizer by @chenyushuo in #131
- Add run_id for single-turn workflows by @hiyuchang in #152
- Bug fix for Scheduler and torch.tensor by @chenyushuo in #156
- Add Step-wise GRPO Advantage by @pan-x-c in #153
- Fix a bug in args_pass by @hiyuchang in #155
- Add decoupled evaluation workflow by @lingzhq in #142
- Add some training tricks for RLVR by @hiyuchang in #147
- GSPO-token policy loss function by @nkkarpov in #154
- Add tool call usage from our vllm model by @garyzhang99 in #161
- Refactor Trainer.train to async function by @chenyushuo in #164
- Distinguish repeatable/non-repeatable workflows by @hiyuchang in #162
- Add auto release for synchronizer by @chenyushuo in #166
- Fix multi-turn logprobs by @pan-x-c in #170
- Bug fix in Synchronizer by @chenyushuo in #171
- Update vLLM to 0.10.0 and add max_model_len by @hiyuchang in #172
- Add agentscope react multi-turn toolcalls example by @garyzhang99 in #165
- Add step-wise workflow test by @hiyuchang in #173
- Add MLFlow monitor by @pan-x-c in #179
- [Feat] Allow user to set train_batch_size by @hiyuchang in #177
- [example] Alfworld with General Multi-Step Workflow by @hiyuchang in #169
- feat: add RAFT alfworld example with reflection support by @shiweijiezero in #174
- Add general multi-step figure by @hiyuchang in #186
- Add benchmark by @chenyushuo in #178
- Fix custom_fields in experiences by @pan-x-c in #191
- Add Document for Synchronizer by @chenyushuo in #190
- Bug fix in load_plugins and explorer by @chenyushuo in #193
- Add CHORD algorithm example by @garyzhang99 in #194
- Fix problem in math_mix config by @garyzhang99 in #196
- Fix plugin_loader by @pan-x-c in #201
- Add unittest for mix by @hiyuchang in #200
- Set truncate_prompt_tokens in SamplingParams, silently truncating very large prompts and preventing vllm from throwing exception by @vadimkantorov in #198
- Support multi-version docs by @pan-x-c in #203
- Fix/fix agentscope tools example docs by @garyzhang99 in #205
- Auto-set pad_token_id when the default is None and not set in the buffer config. by @yaochaorui in #188
- Fix agentscope react example readme by @garyzhang99 in #206
- Add max_prompt_tokens by @chenyushuo in #202
- Release v0.2.1 by @pan-x-c in #208
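
The prompt-truncation change in #198 can be pictured with a small sketch. It assumes left truncation, meaning only the last k prompt tokens are kept, which is how vLLM documents `truncate_prompt_tokens`. The helper `truncate_prompt` and its parameter `max_tokens_kept` are hypothetical names used for illustration; they are not part of this release, and the notes above do not say whether `max_prompt_tokens` (#202) behaves the same way.

```python
def truncate_prompt(token_ids, max_tokens_kept=None):
    """Keep only the last `max_tokens_kept` token ids of a prompt.

    Hypothetical helper: it mimics left truncation in plain Python and does
    not call vLLM. `None` means "do not truncate".
    """
    if max_tokens_kept is None:
        return list(token_ids)
    if max_tokens_kept < 1:
        raise ValueError("max_tokens_kept must be at least 1")
    # A negative slice start keeps the tail and drops the oldest tokens.
    return list(token_ids[-max_tokens_kept:])


if __name__ == "__main__":
    prompt = [101, 102, 103, 104, 105]
    print(truncate_prompt(prompt, 3))
    print(truncate_prompt(prompt, 10))
```

Dropping tokens from the front keeps the most recent context, which is usually where the latest instruction or observation sits in a multi-turn prompt; the cost is that an overlong prompt is shortened silently rather than rejected.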
New Contributors
- @yaochaorui made their first contribution in #145
- @nkkarpov made their first contribution in #154
- @vadimkantorov made their first contribution in #198
Full Changelog: v0.2.0...v0.2.1