Release v0.2.1 · modelscope/Trinity-RFT

Overview

Agentic RL
1.1 The rollout model can now be accessed directly via the OpenAI API, reducing migration costs.
1.2 Supports general multi-step workflows without requiring concatenated experience data.
1.3 Introduced AddStrategy to facilitate group-based advantage/return calculations (experimental; will be integrated into the buffer module in future versions).
1.4 Added a ReAct Agent RL example based on the AgentScope framework.
1.5 Enhanced the Alfworld example into a general multi-step workflow.
Async / Offline RL
2.1 Refactored RunnerPool to Scheduler, enabling asynchronous scheduling and management of multiple workflow runners.
2.2 Added a priority queue buffer that sorts and reuses experiences, reducing idle time caused by speed differences between Explorer and Trainer.
2.3 Introduced Synchronizer to manage model weight synchronization between Explorer and Trainer, supporting dynamic synchronization.
2.4 Added tutorials on using the Synchronizer.
Added a benchmark tool for quick verification.
Added support for more RL algorithms (e.g., CHORD, DAPO, GSPO, RAFT).
Updated vllm to 0.10.0 and verl to 0.4.1.
Fixed numerous bugs.
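
Item 1.3 above mentions group-based advantage calculations. As a rough illustration of that idea only (not the AddStrategy API, whose interface is not described in these notes), the sketch below normalizes each reward against the mean and standard deviation of its own group, in the style of GRPO. The function name group_advantages, its parameters, and the eps constant are hypothetical and chosen for this example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_advantages(rewards, group_ids, eps=1e-6):
    """Return one advantage per reward, normalized within its group.

    Hypothetical helper for illustration; not part of Trinity-RFT.
    rewards   -- list of float rewards, one per rollout
    group_ids -- list of hashable ids; rollouts sharing an id form a group
    eps       -- small constant that avoids division by zero
    """
    # Collect the rewards that belong to each group.
    groups = defaultdict(list)
    for reward, gid in zip(rewards, group_ids):
        groups[gid].append(reward)

    # Per-group mean and population standard deviation.
    stats = {gid: (mean(rs), pstdev(rs)) for gid, rs in groups.items()}

    # Advantage = (reward - group mean) / (group std + eps).
    return [
        (reward - stats[gid][0]) / (stats[gid][1] + eps)
        for reward, gid in zip(rewards, group_ids)
    ]


if __name__ == "__main__":
    # Two tasks ("a" and "b"), two rollouts each.
    print(group_advantages([1.0, 0.0, 1.0, 1.0], ["a", "a", "b", "b"]))
```

In this sketch, a group whose rollouts all receive the same reward yields zero advantages, so such a group contributes no learning signal; real implementations may filter or reweight these groups differently.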

What's Changed

Add a switch for progress bar in _HFBatchReader by @yanxi-chen in #126
Add dapo reward by @hiyuchang in #114
Add readme_zh by @hiyuchang in #127
Fix a typo in readme by @hiyuchang in #128
ModelWrapper automatically records Experience by @pan-x-c in #123
Add continue_from_checkpoint by @hiyuchang in #129
Merge verl v0.4.1 by @hiyuchang in #125
Fix vllm nccl sync error by @pan-x-c in #132
Add more unittest command by @pan-x-c in #133
Add Step-wise Workflow by @pan-x-c in #130
Add workflow and example for toolcall training using ToolAce dataset by @garyzhang99 in #134
Rename data scripts for examples and refine toolcall example readme by @garyzhang99 in #137
Add sft example by @hiyuchang in #138
Fix buffer.total_epochs not working in SFT/DPO by @pan-x-c in #140
Fix priority queue implementation and enhance testing by @pan-x-c in #135
Update some details in tutorial by @hiyuchang in #144
[examples] Updated the OPMD config. by @yaochaorui in #145
Rollout openAI API compatible with vllm 0.8.5 by @pan-x-c in #146
Standardize Experience and Sample Strategy by @pan-x-c in #141
Add fused_kernel_options by @chenyushuo in #150
Fix MATH readme by @hiyuchang in #151
Calculate advantage in Explorer by @pan-x-c in #148
Add Synchronizer by @chenyushuo in #131
Add run_id for single-turn workflows by @hiyuchang in #152
Bug fix for Scheduler and torch.tensor by @chenyushuo in #156
Add Step-wise GRPO Advantage by @pan-x-c in #153
Fix a bug in args_pass by @hiyuchang in #155
Add decoupled evaluation workflow by @lingzhq in #142
Add some training tricks for RLVR by @hiyuchang in #147
GSPO-token policy loss function by @nkkarpov in #154
Add tool call usage from our vllm model by @garyzhang99 in #161
Refactor Trainer.train to async function by @chenyushuo in #164
Distinguish repeatable/non-repeatable workflows by @hiyuchang in #162
Add auto release for synchronizer by @chenyushuo in #166
Fix multi-turn logprobs by @pan-x-c in #170
Bug fix in Synchronizer by @chenyushuo in #171
Update vLLM to 0.10.0 and add max_model_len by @hiyuchang in #172
Add agentscope react multi-turn toolcalls example by @garyzhang99 in #165
Add step-wise workflow test by @hiyuchang in #173
Add MLFlow monitor by @pan-x-c in #179
[Feat] Allow user to set train_batch_size by @hiyuchang in #177
[example] Alfworld with General Multi-Step Workflow by @hiyuchang in #169
feat: add RAFT alfworld example with reflection support by @shiweijiezero in #174
Add general multi-step figure by @hiyuchang in #186
Add benchmark by @chenyushuo in #178
Fix custom_fields in experiences by @pan-x-c in #191
Add Document for Synchronizer by @chenyushuo in #190
Bug fix in load_plugins and explorer by @chenyushuo in #193
Add CHORD algorithm example by @garyzhang99 in #194
Fix problem in math_mix config by @garyzhang99 in #196
Fix plugin_loader by @pan-x-c in #201
Add unittest for mix by @hiyuchang in #200
Set truncate_prompt_tokens in SamplingParams, silently truncating very large prompts and preventing vllm from throwing exception by @vadimkantorov in #198
Support multi-version docs by @pan-x-c in #203
Fix/fix agentscope tools example docs by @garyzhang99 in #205
Auto-set pad_token_id when the default is None and not set in the buffer config. by @yaochaorui in #188
Fix agentscope react example readme by @garyzhang99 in #206
Add max_prompt_tokens by @chenyushuo in #202
Release v0.2.1 by @pan-x-c in #208

New Contributors

@yaochaorui made their first contribution in #145
@nkkarpov made their first contribution in #154
@vadimkantorov made their first contribution in #198

Full Changelog: v0.2.0...v0.2.1
