Releases: modelscope/Trinity-RFT
v0.3.1
Overview
Agentic RL
- Add more agentic RL examples using agent frameworks (e.g., AgentScope)
- Provide Debug mode for workflow developers
- Add examples for RL in non-verifiable domains: trainable RULER reward, rubric-as-reward
Framework Enhancement
- Support multi-stage training
- Support using environment variables in configuration files
- Support LoRA
- Enhance checkpoint saving process
- Enhance experience replay mechanism for priority queue buffer
- Add algorithms: group-relative REINFORCE variants
- Update vLLM to 0.10.2
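Judging from the `${oc.env:TRINITY_XXX}` placeholders introduced in #267, configuration values can reference environment variables via OmegaConf-style interpolation. A hypothetical fragment (the key names here are illustrative, not Trinity-RFT's actual schema):

```yaml
# Hypothetical config fragment; keys are illustrative, not Trinity-RFT's schema.
# ${oc.env:VAR} is OmegaConf's built-in environment-variable resolver.
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR}
model:
  model_path: ${oc.env:TRINITY_MODEL_PATH}
```

With this pattern, the variables are set in the shell before launching and resolved at config-load time, so paths no longer need to be hard-coded per machine.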
Documentation
- Add Chinese Docs
- Rewrite Developer Guide
What's Changed
- Support new task ordering methods by @HYLcool in #265
- Fix problem in get_openr1_dataset.py in chord example by @garyzhang99 in #266
- Replace `/PATH/TO` with `${oc.env:TRINITY_XXX}` by @chenyushuo in #267
- Support multi-stage training by @pan-x-c in #268
- [Example] Policy model as its own reward model by @hiyuchang in #270
- Dev/update agentscope react example version by @garyzhang99 in #275
- Normalize Trainer by @pan-x-c in #271
- Refactor workflow to async workflow by @chenyushuo in #276
- Support Chinese Docs by @hiyuchang in #277
- Support vLLM 0.10.2 by @pan-x-c in #278
- AgentScope V1.0 WebSearch Workflow (simple react + search api) by @garyzhang99 in #264
- Add enable_activation_offload configuration option by @nkkarpov in #281
- Update data-juicer version in toml by @chenyushuo in #286
- Improvement in config by @chenyushuo in #288
- Add `loss-agg-mode` for policy loss by @hiyuchang in #294
- Explorer provides OpenAI API compatible inference service by @pan-x-c in #289
- Fix absmethod in workflow by @chenyushuo in #297
- Add LoRA mode by @hiyuchang in #291
- Enhance support for multi-modal models by @pan-x-c in #298
- Group-relative REINFORCE Families by @yaochaorui in #292
- Optimize Installation and Development Doc by @pan-x-c in #301
- Update Readme by @pan-x-c in #302
- Refactor checkpoint save by @chenyushuo in #299
- Refactor AgentScope ReAct Agent workflow example by @pan-x-c in #303
- Update Workflow Development Tutorial by @pan-x-c in #310
- Update chord with tooluse example by @garyzhang99 in #313
- Debug Mode for workflow developers by @pan-x-c in #314
- [BUGFIX] Fix tokenizer bug when getting action masks with enable_thinking arguments by @garyzhang99 in #316
- Add `task_count` by @hiyuchang in #307
- Update FAQ by @hiyuchang in #320
- Refactor `EmailSearchWorkflow` and `EmailSearchAgent` to adapt to the latest version of AgentScope by @chenyushuo in #321
- Non-verifiable Medicine QA Task by @hiyuchang in #317
- Add batch level std calculation by @garyzhang99 in #311
- Implement serial saving by @chenyushuo in #322
- Enhance experience replay for priority queue buffer by @yanxi-chen in #306
- Fix example config typo by @garyzhang99 in #323
- Simplify Config by @pan-x-c in #325
- Update config from ray cluster by @hiyuchang in #324
- Support AgentScope Workflow Function by @pan-x-c in #327
Full Changelog: v0.3.0...v0.3.1
v0.3.0
Overview
Framework Development
Buffer Module
- Use `Operator` interface to replace the original `AddStrategy`. `Operator` can perform various transformations on experience data in a pipeline manner. [Breaking Change]
- Add `TaskPipeline` and `ExperiencePipeline` for task and experience data preprocessing.
- Support calling Data-Juicer services in both `TaskPipeline` and `ExperiencePipeline`, and resolve some dependency conflicts.
- Refactor `SQL`/`FILE` storage. `SQL` can store SFT/DPO/Rollout/Experience data. `SQL` and `FILE` support parsing multi-turn SFT data with tools. [Breaking Change]
Trainer Module
- Support FSDP2 backend
- Support Megatron backend
- Support Qwen2.5 VL multi-modal models [Experimental]
Explorer Module
- Support Qwen2.5 VL multi-modal models [Experimental]
- `Workflow` supports running in async mode. `ModelWrapper` provides an `openai.AsyncOpenAI` interface.
Utils Module
- Enhance logger and support printing logs of different actors to different files under the checkpoint dir
- Enhance wandb and mlflow monitor
New Algorithms
- AsymRE (#187), sPPO (#232), TOPR and CISPO (#185)
New Workflows
- General Multi-turn Email Search
Others
- Support `uv`
- Refactor README and documents
- Fix many bugs
What's Changed
- Support FSDP2 by @chenyushuo in #204
- Update veRL to 0.5.0 by @pan-x-c in #195
- Enhance Buffer Data Processing Pipeline by @pan-x-c in #175
- Remove `AddStrategy` by @pan-x-c in #211
- Add sft tools example and tests by @garyzhang99 in #210
- Cleaning up the old-version data processor by @HYLcool in #213
- Update main readme for v0.2.1 by @yanxi-chen in #216
- Add data_process figure by @hiyuchang in #218
- Enhance Logger by @pan-x-c in #217
- Bug fix for experience display in wandb by @chenyushuo in #224
- Fix links in readme by @yanxi-chen in #225
- Fix alfworld prompt typo by @garyzhang99 in #222
- Enhance SFT/DPO reader by @pan-x-c in #226
- AsymRE by @yaochaorui in #187
- [example] An email search workflow by @hiyuchang in #230
- Bug fix for FSDP offload by @chenyushuo in #233
- Refactor Storage by @pan-x-c in #227
- sPPO by @yaochaorui in #232
- Specify attempt in error message in run_with_retry by @vadimkantorov in #241
- Add issue template by @pan-x-c in #246
- [Example] GRPO on GSM8K with RULER reward by @hiyuchang in #239
- Add TOPR and CISPO algorithm by @garyzhang99 in #185
- Support Multi turn SFT with tools by @pan-x-c in #245
- Support Megatron by @chenyushuo in #219
- Support Multi-Modal LLM by @hiyuchang in #234
- Add `log_table` function for mlflow by @hiyuchang in #249
- Update the example of human in the loop by @HYLcool in #247
- Keep SQL Experience Buffer behavior consistent with previous versions by @pan-x-c in #248
- Update MoE training in example by @chenyushuo in #251
- SFT tools schema fix by @garyzhang99 in #252
- Merge verl-related config into default config by @hiyuchang in #256
- Update config manager for 0.3.0 by @chenyushuo in #257
Full Changelog: v0.2.1...v0.3.0
v0.2.1
Overview
- Agentic RL
  1.1 The rollout model can now be accessed directly via the OpenAI API, reducing migration costs.
  1.2 Supports general multi-step workflows without requiring concatenated experience data.
  1.3 Introduced `AddStrategy` to facilitate group-based advantage/return calculations (experimental; will be integrated into the buffer module in future versions).
  1.4 Added a ReAct Agent RL example based on the AgentScope framework.
  1.5 Enhanced the Alfworld example into a general multi-step workflow.
- Async / Offline RL
  2.1 Refactored `RunnerPool` to `Scheduler`, enabling asynchronous scheduling and management of multiple workflow runners.
  2.2 Added a priority queue buffer to reduce idling caused by speed differences between `Explorer` and `Trainer` through experience sorting and reuse.
  2.3 Introduced `Synchronizer` to manage model weight synchronization between `Explorer` and `Trainer`, supporting dynamic synchronization.
  2.4 Added tutorials on using the `Synchronizer`.
- Added a benchmark tool for quick verification.
- Added support for more RL algorithms (e.g., CHORD, DAPO, GSPO, RAFT).
- Updated vLLM to 0.10.0 and verl to 0.4.1.
- Fixed numerous bugs.
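The priority queue buffer can be pictured with a minimal sketch (illustrative only, not Trinity-RFT's actual implementation): experiences are keyed by the model version that produced them, so the trainer pops the freshest rollouts first while staler ones remain available for reuse.

```python
# Illustrative sketch of a priority-queue experience buffer. Class and method
# names are invented for this example, not Trinity-RFT's API.
import heapq


class PriorityExperienceBuffer:
    def __init__(self):
        self._heap = []
        self._counter = 0  # insertion index: tie-breaker so payloads are never compared

    def put(self, experience, model_version: int) -> None:
        # heapq is a min-heap; negate the version so the newest pops first.
        heapq.heappush(self._heap, (-model_version, self._counter, experience))
        self._counter += 1

    def get(self):
        neg_version, _, experience = heapq.heappop(self._heap)
        return experience, -neg_version


buf = PriorityExperienceBuffer()
buf.put("rollout-a", model_version=3)
buf.put("rollout-b", model_version=7)
print(buf.get())  # ('rollout-b', 7) -- the newer experience comes out first
```

Ordering by model version is what lets a fast `Explorer` keep producing rollouts without forcing the `Trainer` to consume them strictly in arrival order.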
What's Changed
- Add a switch for progress bar in _HFBatchReader by @yanxi-chen in #126
- Add dapo reward by @hiyuchang in #114
- Add readme_zh by @hiyuchang in #127
- Fix a typo in readme by @hiyuchang in #128
- ModelWrapper automatically record Experience by @pan-x-c in #123
- Add continue_from_checkpoint by @hiyuchang in #129
- Merge verl v0.4.1 by @hiyuchang in #125
- Fix vllm nccl sync error by @pan-x-c in #132
- Add more unittest command by @pan-x-c in #133
- Add Step-wise Workflow by @pan-x-c in #130
- Add workflow and example for toolcall training using ToolAce dataset by @garyzhang99 in #134
- Rename data scripts for examples and refine toolcall example readme by @garyzhang99 in #137
- Add sft example by @hiyuchang in #138
- Fix `buffer.total_epochs` not working in SFT/DPO by @pan-x-c in #140
- Fix priority queue implementation and enhance testing by @pan-x-c in #135
- Update some details in tutorial by @hiyuchang in #144
- [examples] Updated the OPMD config by @yaochaorui in #145
- Rollout OpenAI API compatible with vLLM 0.8.5 by @pan-x-c in #146
- Standardize Experience and Sample Strategy by @pan-x-c in #141
- Add `fused_kernel_options` by @chenyushuo in #150
- Fix MATH readme by @hiyuchang in #151
- Calculate advantage in Explorer by @pan-x-c in #148
- Add `Synchronizer` by @chenyushuo in #131
- Add run_id for single-turn workflows by @hiyuchang in #152
- Bug fix for `Scheduler` and `torch.tensor` by @chenyushuo in #156
- Add Step-wise GRPO Advantage by @pan-x-c in #153
- Fix a bug in args_pass by @hiyuchang in #155
- Add decoupled evaluation workflow by @lingzhq in #142
- Add some training tricks for RLVR by @hiyuchang in #147
- GSPO-token policy loss function by @nkkarpov in #154
- Add tool call usage from our vllm model by @garyzhang99 in #161
- Refactor `Trainer.train` to async function by @chenyushuo in #164
- Distinguish repeatable/non-repeatable workflows by @hiyuchang in #162
- Add auto release for `synchronizer` by @chenyushuo in #166
- Fix multi-turn logprobs by @pan-x-c in #170
- Bug fix in Synchronizer by @chenyushuo in #171
- Update vLLM to 0.10.0 and add `max_model_len` by @hiyuchang in #172
- Add agentscope react multi-turn toolcalls example by @garyzhang99 in #165
- Add step-wise workflow test by @hiyuchang in #173
- Add MLFlow monitor by @pan-x-c in #179
- [Feat] Allow user to set `train_batch_size` by @hiyuchang in #177
- [example] Alfworld with General Multi-Step Workflow by @hiyuchang in #169
- feat: add RAFT alfworld example with reflection support by @shiweijiezero in #174
- Add general multi-step figure by @hiyuchang in #186
- Add benchmark by @chenyushuo in #178
- Fix `custom_fields` in experiences by @pan-x-c in #191
- Add Document for Synchronizer by @chenyushuo in #190
- Bug fix in load_plugins and explorer by @chenyushuo in #193
- Add CHORD algorithm example by @garyzhang99 in #194
- Fix problem in math_mix config by @garyzhang99 in #196
- Fix plugin_loader by @pan-x-c in #201
- Add unittest for mix by @hiyuchang in #200
- Set truncate_prompt_tokens in SamplingParams, silently truncating very large prompts and preventing vllm from throwing exception by @vadimkantorov in #198
- Support multi-version docs by @pan-x-c in #203
- Fix/fix agentscope tools example docs by @garyzhang99 in #205
- Auto-set pad_token_id when the default is None and not set in the buffer config. by @yaochaorui in #188
- Fix agentscope react example readme by @garyzhang99 in #206
- Add `max_prompt_tokens` by @chenyushuo in #202
- Release v0.2.1 by @pan-x-c in #208
New Contributors
- @yaochaorui made their first contribution in #145
- @nkkarpov made their first contribution in #154
- @vadimkantorov made their first contribution in #198
Full Changelog: v0.2.0...v0.2.1
v0.2.0
Overview
- Refactor Algorithm-related modules, see #59 for details
- Propose an SFT/GRPO-mixed algorithm
- Unify Sync/Async RL via `sync_interval`, and support a one-step async pipeline
- Refactor the data processor module, and support processing input tasksets and experience data
- Refactor `RunnerPool` to `Scheduler` to support automatic fault tolerance and fine-grained scheduling
- Refactor `Explorer` to a fully asynchronous implementation
- Support running multiple Explorer instances simultaneously
- Update vLLM to v0.9.1, verl to 0.4.0
- Support reward functions in RM-Gallery
- Fix various bugs
- Update the technical report (arXiv v2) with new features, examples, and experiments
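One way to picture how a single `sync_interval` knob spans both regimes (a conceptual sketch; function and variable names are invented, not Trinity-RFT's API): the explorer pulls fresh trainer weights every `sync_interval` steps, so `sync_interval=1` behaves like fully synchronous on-policy RL, while larger values let the explorer run ahead asynchronously between syncs.

```python
# Conceptual sketch only: which steps would trigger a weight sync for a given
# sync_interval. Names are illustrative, not Trinity-RFT's API.
def weight_sync_steps(total_steps: int, sync_interval: int) -> list[int]:
    """Return the steps at which the explorer would sync model weights."""
    synced = []
    for step in range(1, total_steps + 1):
        # ... explore(step) and train(step) would happen here ...
        if step % sync_interval == 0:
            synced.append(step)
    return synced


print(weight_sync_steps(6, 1))  # [1, 2, 3, 4, 5, 6] -> synchronous RL
print(weight_sync_steps(6, 3))  # [3, 6] -> async RL, syncing every 3 steps
```

The appeal of this design is that switching between sync and async training becomes a one-line config change rather than a different training pipeline.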
What's Changed
- Add Policy Loss Functions by @pan-x-c in #62
- Refactor advantage computation, and delete RayPPOTrainer.fit by @yanxi-chen in #61
- Add unittest && bug fix by @chenyushuo in #65
- Add KL/Entropy Fn by @pan-x-c in #64
- Refactor advantage computation (cont.) by @yanxi-chen in #68
- Refactor train step by @chenyushuo in #69
- Fix BasicEntropyLossFn by @shiweijiezero in #77
- Fix Conflicts with main by @pan-x-c in #75
- Add Sample Strategy by @pan-x-c in #78
- Add doc for SFT by @hiyuchang in #81
- merge verl 0.4.0 by @chenyushuo in #79
- [Feature] Add MIX algorithm by @hiyuchang in #83
- Refactor on `select_keys` by @chenyushuo in #84
- Add guideline for adding new algorithm by @pan-x-c in #85
- Update config manager by @chenyushuo in #86
- Update docs by @hiyuchang in #89
- Refactor `state_dict_meta` init by @chenyushuo in #90
- Unify async/sync RL by @pan-x-c in #91
- Support one-step ahead async RL by @pan-x-c in #93
- Refactor data module and support task pipeline in data processor by @HYLcool in #92
- Bug Fix in `namespace` by @chenyushuo in #96
- Unify ray actor creation method by @pan-x-c in #97
- Add ray timeline for profiling by @pan-x-c in #98
- Bug fix in `explorer.eval` by @chenyushuo in #99
- Async RL support multiple explorers by @pan-x-c in #100
- Refactor explorer loop by @chenyushuo in #101
- Record model version during model weight sync by @pan-x-c in #102
- Fix bugs in multi-node environments by @pan-x-c in #103
- Fix Incomplete Last Batch by @pan-x-c in #104
- Support Experience Pipeline by @HYLcool in #105
- Bug Fix && Add more logger info by @chenyushuo in #106
- Bug fix in alfworld by @chenyushuo in #107
- Add timeout to buffer reader by @pan-x-c in #108
- Add FAQ in docs by @hiyuchang in #109
- Optimize model weight sync process group by @pan-x-c in #112
- Add total steps to StorageConfig by @pan-x-c in #111
- Remove useless configs by @HYLcool in #113
- Add new scheduler with step granularity by @pan-x-c in #110
- Split group of tasks to multiple runners by @pan-x-c in #116
- Add `AsyncPriorityQueue` by @chenyushuo in #115
- Refactor RewardFn by @hiyuchang in #118
- Add `unique_id` for each experience by @hiyuchang in #120
- Add a script to plot multi-run experiment results by @lingzhq in #122
- Update main readme with arxiv v2 by @yanxi-chen in #121
- Release Trinity-RFT v0.2.0 by @pan-x-c in #124
New Contributors
- @lingzhq made their first contribution in #122
Full Changelog: v0.1.1...v0.2.0
v0.1.1
Overview
- Support deployment of non-trained auxiliary models in the cluster, which can be used to provide rewards or other feedback in workflows
- Support more custom components (e.g., monitors), and support automatic loading of custom components
- Support the use of file and database buffers in multi-node environments
- Bug fixes
What's Changed
- link to Trinity-Studio codes by @yxdyc in #54
- Update developer guide by @pan-x-c in #53
- Bug fix in auxiliary_models. by @chenyushuo in #55
- Decompose `config_manager` by @chenyushuo in #57
- Fix constants by @hiyuchang in #63
- Support custom monitor by @pan-x-c in #66
- Add model_path to auxiliary_models. by @chenyushuo in #67
- Fix DLC mode by @pan-x-c in #71
- Wrap database in ray actor by @pan-x-c in #70
- Add `workflow_args` for fine-grained control by @pan-x-c in #73
- Support loading user-written plugin modules automatically by @pan-x-c in #74
- Refactor Buffer Reader by @pan-x-c in #80
- Wrap file writer in ray by @pan-x-c in #82
- Progress Bar by @shiweijiezero in #87
- Customized math workflows by @garyzhang99 in #88
- Bumping version to v0.1.1 by @pan-x-c in #94
New Contributors
- @yxdyc made their first contribution in #54
- @shiweijiezero made their first contribution in #87
Full Changelog: v0.1.0...v0.1.1
v0.1.0
Overview
Trinity-RFT is a general-purpose and unified framework for reinforcement fine-tuning of large language models.
We release Trinity-RFT v0.1.0 together with our technical report.
What's Changed
- Add GitHub Actions by @pan-x-c in #1
- fix some bugs for eval by @hiyuchang in #2
- Try to activate data module in the launcher by @HYLcool in #3
- Install from Dockerfile by @pan-x-c in #7
- refactor MultiTurnWorkflow class and add SciWorld ENV by @garyzhang99 in #9
- UPD on `config_manager.py` and docs by @chenyushuo in #4
- Add sft warmup before dpo by @hiyuchang in #12
- Update main readme by @yanxi-chen in #8
- Add examples directory by @hiyuchang in #14
- Add unittest on self-host runner by @pan-x-c in #15
- A trick to update wandb after each step by @xieyxclack in #16
- Add training service by @HYLcool in #13
- Add tensorboard monitor by @pan-x-c in #21
- Simplify wandb log and update default trainer config by @pan-x-c in #22
- fix: init_process_group failed when using ipv6 master address by @0x404 in #24
- fix: some hf dataset may need specified config name by @0x404 in #25
- Refactor on `config_manager.py` by @chenyushuo in #23
- Make `get_exp_strategy` effective in sql by @hiyuchang in #26
- Add more unittest and support Qwen3 by @pan-x-c in #29
- Config Refactor by @chenyushuo in #27
- Add example_async_mode by @hiyuchang in #28
- Rollout use vLLM V1 engine by @pan-x-c in #31
- Fix in Config by @chenyushuo in #30
- Add CLI for Trinity Studio by @pan-x-c in #32
- Add bench mode by @hiyuchang in #35
- Fix `n` does not take effect in vLLM v1 engine by @pan-x-c in #36
- Add a switch for Qwen3 think mode by @pan-x-c in #37
- Refactor on `TaskSet` and `Buffer` by @chenyushuo in #34
- Add utils for PAI DLC by @pan-x-c in #38
- Add benchmark mode by @hiyuchang in #39
- Refactor `workflow` by @chenyushuo in #40
- Move `algorithm_type` from `trainer` to `global_config` by @chenyushuo in #42
- Async model support OpenAI compatible API by @pan-x-c in #41
- Add resettable workflow by @chenyushuo in #43
- Reorganize config by @pan-x-c in #46
- Update grpo and dpo examples by @hiyuchang in #48
- Update README.md and main.md by @yanxi-chen in #47
- Fix config manager by @chenyushuo in #49
- Update trinity config guide by @pan-x-c in #50
- Update README.md and main.md by @yanxi-chen in #52
New Contributors
- @pan-x-c made their first contribution in #1
- @hiyuchang made their first contribution in #2
- @HYLcool made their first contribution in #3
- @garyzhang99 made their first contribution in #9
- @chenyushuo made their first contribution in #4
- @yanxi-chen made their first contribution in #8
- @xieyxclack made their first contribution in #16
- @0x404 made their first contribution in #24
Full Changelog: https://github.com/modelscope/Trinity-RFT/commits/v0.1.0