Releases: modelscope/Trinity-RFT
v0.3.1
Overview
Agentic RL
- Add more agentic RL examples using agent frameworks (e.g., AgentScope)
- Provide Debug mode for workflow developers
- Add examples for RL in non-verifiable domains: trainable RULER reward, rubric-as-reward
Framework Enhancement
- Support multi-stage training
- Support using environment variables in configuration files
- Support LoRA
- Enhance checkpoint saving process
- Enhance experience replay mechanism for priority queue buffer
- Add algorithms: group-relative REINFORCE variants
- Update vLLM to 0.10.2
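Judging from the `${oc.env:TRINITY_XXX}` placeholders introduced in #267, configuration values can reference environment variables via OmegaConf-style interpolation. A hypothetical fragment (the key names here are illustrative, not Trinity-RFT's actual schema):

```yaml
# Hypothetical config fragment; keys are illustrative, not Trinity-RFT's schema.
# ${oc.env:VAR} is OmegaConf's built-in environment-variable resolver.
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR}
model:
  model_path: ${oc.env:TRINITY_MODEL_PATH}
```

With this pattern, the variables are set in the shell before launching and resolved at config-load time, so paths no longer need to be hard-coded per machine.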
Documentation
- Add Chinese Docs
- Rewrite Developer Guide
What's Changed
- Support new task ordering methods by @HYLcool in #265
- Fix problem in get_openr1_dataset.py in chord example by @garyzhang99 in #266
- Replace `/PATH/TO` with `${oc.env:TRINITY_XXX}` by @chenyushuo in #267
- Support multi-stage training by @pan-x-c in #268
- [Example] Policy model as its own reward model by @hiyuchang in #270
- Dev/update agentscope react example version by @garyzhang99 in #275
- Normalize Trainer by @pan-x-c in #271
- Refactor workflow to async workflow by @chenyushuo in #276
- Support Chinese Docs by @hiyuchang in #277
- Support vLLM 0.10.2 by @pan-x-c in #278
- AgentScope V1.0 WebSearch Workflow (simple react + search api) by @garyzhang99 in #264
- Add enable_activation_offload configuration option by @nkkarpov in #281
- Update data-juicer version in toml by @chenyushuo in #286
- Improvement in config by @chenyushuo in #288
- Add `loss-agg-mode` for policy loss by @hiyuchang in #294
- Explorer provides OpenAI API compatible inference service by @pan-x-c in #289
- Fix absmethod in workflow by @chenyushuo in #297
- Add LoRA mode by @hiyuchang in #291
- Enhance support for multi-modal models by @pan-x-c in #298
- Group-relative REINFORCE Families by @yaochaorui in #292
- Optimize Installation and Development Doc by @pan-x-c in #301
- Update Readme by @pan-x-c in #302
- Refactor checkpoint save by @chenyushuo in #299
- Refactor AgentScope ReAct Agent workflow example by @pan-x-c in #303
- Update Workflow Development Tutorial by @pan-x-c in #310
- Update chord with tooluse example by @garyzhang99 in #313
- Debug Mode for workflow developers by @pan-x-c in #314
- [BUGFIX] Fix tokenizer bug when getting action masks with enable_thinking arguments by @garyzhang99 in #316
- Add `task_count` by @hiyuchang in #307
- Update FAQ by @hiyuchang in #320
- Refactor `EmailSearchWorkflow` and `EmailSearchAgent` to adapt to the latest version of AgentScope by @chenyushuo in #321
- Non-verifiable Medicine QA Task by @hiyuchang in #317
- Add batch level std calculation by @garyzhang99 in #311
- Implement serial saving by @chenyushuo in #322
- Enhance experience replay for priority queue buffer by @yanxi-chen in #306
- Fix example config typo by @garyzhang99 in #323
- Simplify Config by @pan-x-c in #325
- Update config from ray cluster by @hiyuchang in #324
- Support AgentScope Workflow Function by @pan-x-c in #327
Full Changelog: v0.3.0...v0.3.1
v0.3.0
Overview
Framework Development
Buffer Module
- Use `Operator` interface to replace the original `AddStrategy`. `Operator` can perform various transformations on experience data in a pipeline manner. [Breaking Change]
- Add `TaskPipeline` and `ExperiencePipeline` for task and experience data preprocessing.
- Support calling Data-Juicer services in both `TaskPipeline` and `ExperiencePipeline`, and resolve some dependency conflicts.
- Refactor `SQL`/`FILE` storage. `SQL` can store SFT/DPO/Rollout/Experience data. `SQL` and `FILE` support parsing multi-turn SFT data with tools. [Breaking Change]
Trainer Module
- Support FSDP2 backend
- Support Megatron backend
- Support Qwen2.5 VL multi-modal models [Experimental]
Explorer Module
- Support Qwen2.5 VL multi-modal models [Experimental]
- `Workflow` supports running in async mode. `ModelWrapper` provides an `openai.AsyncOpenAI` interface.
Utils Module
- Enhance logger and support printing logs of different actors to different files under the checkpoint dir
- Enhance wandb and mlflow monitor
New Algorithms
- AsymRE (#187), sPPO (#232), TOPR and CISPO (#185)
New Workflows
- General Multi-turn Email Search
Others
- Support `uv`
- Refactor README and documents
- Fix many bugs
What's Changed
- Support FSDP2 by @chenyushuo in #204
- Update veRL to 0.5.0 by @pan-x-c in #195
- Enhance Buffer Data Processing Pipeline by @pan-x-c in #175
- Remove `AddStrategy` by @pan-x-c in #211
- Add sft tools example and tests by @garyzhang99 in #210
- Cleaning up the old-version data processor by @HYLcool in #213
- Update main readme for v0.2.1 by @yanxi-chen in #216
- Add data_process figure by @hiyuchang in #218
- Enhance Logger by @pan-x-c in #217
- Bug fix for experience display in wandb by @chenyushuo in #224
- Fix links in readme by @yanxi-chen in #225
- Fix alfworld prompt typo by @garyzhang99 in #222
- Enhance SFT/DPO reader by @pan-x-c in #226
- AsymRE by @yaochaorui in #187
- [example] An email search workflow by @hiyuchang in #230
- Bug fix for FSDP offload by @chenyushuo in #233
- Refactor Storage by @pan-x-c in #227
- sPPO by @yaochaorui in #232
- Specify attempt in error message in run_with_retry by @vadimkantorov in #241
- Add issue template by @pan-x-c in #246
- [Example] GRPO on GSM8K with RULER reward by @hiyuchang in #239
- Add TOPR and CISPO algorithm by @garyzhang99 in #185
- Support Multi turn SFT with tools by @pan-x-c in #245
- Support Megatron by @chenyushuo in #219
- Support Multi-Modal LLM by @hiyuchang in #234
- Add `log_table` function for mlflow by @hiyuchang in #249
- Update the example of human in the loop by @HYLcool in #247
- Keep SQL Experience Buffer behavior consistent with previous versions by @pan-x-c in #248
- Update MoE training in example by @chenyushuo in #251
- SFT tools schema fix by @garyzhang99 in #252
- Merge verl-related config into default config by @hiyuchang in #256
- Update config manager for 0.3.0 by @chenyushuo in #257
Full Changelog: v0.2.1...v0.3.0
v0.2.1
Overview
- Agentic RL
  1.1 The rollout model can now be accessed directly via the OpenAI API, reducing migration costs.
  1.2 Supports general multi-step workflows without requiring concatenated experience data.
  1.3 Introduced `AddStrategy` to facilitate group-based advantage/return calculations (experimental; will be integrated into the buffer module in future versions).
  1.4 Added a ReAct Agent RL example based on the AgentScope framework.
  1.5 Enhanced the Alfworld example into a general multi-step workflow.
- Async / Offline RL
  2.1 Refactored `RunnerPool` to `Scheduler`, enabling asynchronous scheduling and management of multiple workflow runners.
  2.2 Added a priority queue buffer to reduce idling caused by speed differences between `Explorer` and `Trainer` through experience sorting and reuse.
  2.3 Introduced `Synchronizer` to manage model weight synchronization between `Explorer` and `Trainer`, supporting dynamic synchronization.
  2.4 Added tutorials on using the `Synchronizer`.
- Added a benchmark tool for quick verification.
- Added support for more RL algorithms (e.g., CHORD, DAPO, GSPO, RAFT).
- Updated vLLM to 0.10.0 and verl to 0.4.1.
- Fixed numerous bugs.
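The priority queue buffer can be pictured with a minimal sketch (illustrative only, not Trinity-RFT's actual implementation): experiences are keyed by the model version that produced them, so the trainer pops the freshest rollouts first while staler ones remain available for reuse.

```python
# Illustrative sketch of a priority-queue experience buffer. Class and method
# names are invented for this example, not Trinity-RFT's API.
import heapq


class PriorityExperienceBuffer:
    def __init__(self):
        self._heap = []
        self._counter = 0  # insertion index: tie-breaker so payloads are never compared

    def put(self, experience, model_version: int) -> None:
        # heapq is a min-heap; negate the version so the newest pops first.
        heapq.heappush(self._heap, (-model_version, self._counter, experience))
        self._counter += 1

    def get(self):
        neg_version, _, experience = heapq.heappop(self._heap)
        return experience, -neg_version


buf = PriorityExperienceBuffer()
buf.put("rollout-a", model_version=3)
buf.put("rollout-b", model_version=7)
print(buf.get())  # ('rollout-b', 7) -- the newer experience comes out first
```

Ordering by model version is what lets a fast `Explorer` keep producing rollouts without forcing the `Trainer` to consume them strictly in arrival order.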
What's Changed
- Add a switch for progress bar in _HFBatchReader by @yanxi-chen in #126
- Add dapo reward by @hiyuchang in #114
- Add readme_zh by @hiyuchang in #127
- Fix a typo in readme by @hiyuchang in #128
- ModelWrapper automatically record Experience by @pan-x-c in #123
- Add continue_from_checkpoint by @hiyuchang in #129
- Merge verl v0.4.1 by @hiyuchang in #125
- Fix vllm nccl sync error by @pan-x-c in #132
- Add more unittest command by @pan-x-c in #133
- Add Step-wise Workflow by @pan-x-c in #130
- Add workflow and example for toolcall training using ToolAce dataset by @garyzhang99 in #134
- Rename data scripts for examples and refine toolcall example readme by @garyzhang99 in #137
- Add sft example by @hiyuchang in #138
- Fix `buffer.total_epochs` not working in SFT/DPO by @pan-x-c in #140
- Fix priority queue implementation and enhance testing by @pan-x-c in #135
- Update some details in tutorial by @hiyuchang in #144
- [examples] Updated the OPMD config by @yaochaorui in #145
- Rollout OpenAI API compatible with vLLM 0.8.5 by @pan-x-c in #146
- Standardize Experience and Sample Strategy by @pan-x-c in #141
- Add `fused_kernel_options` by @chenyushuo in #150
- Fix MATH readme by @hiyuchang in #151
- Calculate advantage in Explorer by @pan-x-c in #148
- Add `Synchronizer` by @chenyushuo in #131
- Add run_id for single-turn workflows by @hiyuchang in #152
- Bug fix for `Scheduler` and `torch.tensor` by @chenyushuo in #156
- Add Step-wise GRPO Advantage by @pan-x-c in #153
- Fix a bug in args_pass by @hiyuchang in #155
- Add decoupled evaluation workflow by @lingzhq in #142
- Add some training tricks for RLVR by @hiyuchang in #147
- GSPO-token policy loss function by @nkkarpov in #154
- Add tool call usage from our vllm model by @garyzhang99 in #161
- Refactor `Trainer.train` to async function by @chenyushuo in #164
- Distinguish repeatable/non-repeatable workflows by @hiyuchang in #162
- Add auto release for `synchronizer` by @chenyushuo in #166
- Fix multi-turn logprobs by @pan-x-c in #170
- Bug fix in Synchronizer by @chenyushuo in #171
- Update vLLM to 0.10.0 and add `max_model_len` by @hiyuchang in #172
- Add agentscope react multi-turn toolcalls example by @garyzhang99 in #165
- Add step-wise workflow test by @hiyuchang in #173
- Add MLFlow monitor by @pan-x-c in #179
- [Feat] Allow user to set `train_batch_size` by @hiyuchang in #177
- [example] Alfworld with General Multi-Step Workflow by @hiyuchang in #169
- feat: add RAFT alfworld example with reflection support by @shiweijiezero in #174
- Add general multi-step figure by @hiyuchang in #186
- Add benchmark by @chenyushuo in #178
- Fix `custom_fields` in experiences by @pan-x-c in #191
- Add Document for Synchronizer by @chenyushuo in #190
- Bug fix in load_plugins and explorer by @chenyushuo in #193
- Add CHORD algorithm example by @garyzhang99 in #194
- Fix problem in math_mix config by @garyzhang99 in #196
- Fix plugin_loader by @pan-x-c in #201
- Add unittest for mix by @hiyuchang in #200
- Set truncate_prompt_tokens in SamplingParams, silently truncating very large prompts and preventing vllm from throwing exception by @vadimkantorov in #198
- Support multi-version docs by @pan-x-c in #203
- Fix/fix agentscope tools example docs by @garyzhang99 in #205
- Auto-set pad_token_id when the default is None and not set in the buffer config. by @yaochaorui in #188
- Fix agentscope react example readme by @garyzhang99 in #206
- Add `max_prompt_tokens` by @chenyushuo in #202
- Release v0.2.1 by @pan-x-c in #208
New Contributors
- @yaochaorui made their first contribution in #145
- @nkkarpov made their first contribution in #154
- @vadimkantorov made their first contribution in #198
Full Changelog: v0.2.0...v0.2.1
v0.2.0
Overview
- Refactor Algorithm-related modules, see #59 for details
- Propose an SFT/GRPO-mixed algorithm
- Unify Sync/Async RL via `sync_interval`, and support a one-step async pipeline
- Refactor the data processor module, and support processing input tasksets and experience data
- Refactor `RunnerPool` to `Scheduler` to support automatic fault tolerance and fine-grained scheduling
- Refactor `Explorer` to a fully asynchronous implementation
- Support running multiple Explorer instances simultaneously
- Update vLLM to v0.9.1, verl to 0.4.0
- Support reward functions in RM-Gallery
- Fix various bugs
- Update the technical report (arXiv v2) with new features, examples, and experiments
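One way to picture how a single `sync_interval` knob spans both regimes (a conceptual sketch; function and variable names are invented, not Trinity-RFT's API): the explorer pulls fresh trainer weights every `sync_interval` steps, so `sync_interval=1` behaves like fully synchronous on-policy RL, while larger values let the explorer run ahead asynchronously between syncs.

```python
# Conceptual sketch only: which steps would trigger a weight sync for a given
# sync_interval. Names are illustrative, not Trinity-RFT's API.
def weight_sync_steps(total_steps: int, sync_interval: int) -> list[int]:
    """Return the steps at which the explorer would sync model weights."""
    synced = []
    for step in range(1, total_steps + 1):
        # ... explore(step) and train(step) would happen here ...
        if step % sync_interval == 0:
            synced.append(step)
    return synced


print(weight_sync_steps(6, 1))  # [1, 2, 3, 4, 5, 6] -> synchronous RL
print(weight_sync_steps(6, 3))  # [3, 6] -> async RL, syncing every 3 steps
```

The appeal of this design is that switching between sync and async training becomes a one-line config change rather than a different training pipeline.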
What's Changed
- Add Policy Loss Functions by @pan-x-c in #62
- Refactor advantage computation, and delete RayPPOTrainer.fit by @yanxi-chen in #61
- Add unittest && bug fix by @chenyushuo in #65
- Add KL/Entropy Fn by @pan-x-c in #64
- Refactor advantage computation (cont.) by @yanxi-chen in #68
- Refactor train step by @chenyushuo in #69
- Fix BasicEntropyLossFn by @shiweijiezero in #77
- Fix Conflicts with main by @pan-x-c in #75
- Add Sample Strategy by @pan-x-c in #78
- Add doc for SFT by @hiyuchang in #81
- merge verl 0.4.0 by @chenyushuo in #79
- [Feature] Add MIX algorithm by @hiyuchang in #83
- Refactor on `select_keys` by @chenyushuo in #84
- Add guideline for adding new algorithm by @pan-x-c in #85
- Update config manager by @chenyushuo in #86
- Update docs by @hiyuchang in #89
- Refactor `state_dict_meta` init by @chenyushuo in #90
- Unify async/sync RL by @pan-x-c in #91
- Support one-step ahead async RL by @pan-x-c in #93
- Refactor data module and support task pipeline in data processor by @HYLcool in #92
- Bug Fix in `namespace` by @chenyushuo in #96
- Unify ray actor creation method by @pan-x-c in #97
- Add ray timeline for profiling by @pan-x-c in #98
- Bug fix in `explorer.eval` by @chenyushuo in #99
- Async RL support multiple explorers by @pan-x-c in #100
- Refactor explorer loop by @chenyushuo in #101
- Record model version during model weight sync by @pan-x-c in #102
- Fix bugs in multi-node environments by @pan-x-c in #103
- Fix Incomplete Last Batch by @pan-x-c in #104
- Support Experience Pipeline by @HYLcool in #105
- Bug Fix && Add more logger info by @chenyushuo in #106
- Bug fix in alfworld by @chenyushuo in #107
- Add timeout to buffer reader by @pan-x-c in #108
- Add FAQ in docs by @hiyuchang in #109
- Optimize model weight sync process group by @pan-x-c in #112
- Add total steps to StorageConfig by @pan-x-c in #111
- Remove useless configs by @HYLcool in #113
- Add new scheduler with step granularity by @pan-x-c in #110
- Split group of tasks to multiple runners by @pan-x-c in #116
- Add `AsyncPriorityQueue` by @chenyushuo in #115
- Refactor RewardFn by @hiyuchang in #118
- Add `unique_id` for each experience by @hiyuchang in #120
- Add a script to plot multi-run experiment results by @lingzhq in #122
- Update main readme with arxiv v2 by @yanxi-chen in #121
- Release Trinity-RFT v0.2.0 by @pan-x-c in #124
New Contributors
- @lingzhq made their first contribution in #122
Full Changelog: v0.1.1...v0.2.0
v0.1.1
Overview
- Support deployment of non-trained auxiliary models in the cluster, which can be used to provide rewards or other feedback in workflows
- Support more custom components (e.g., monitors), and support automatic loading of custom components
- Support the use of file and database buffers in multi-node environments
- Bug fixes
What's Changed
- link to Trinity-Studio codes by @yxdyc in #54
- Update developer guide by @pan-x-c in #53
- Bug fix in auxiliary_models. by @chenyushuo in #55
- Decompose `config_manager` by @chenyushuo in #57
- Fix constants by @hiyuchang in #63
- Support custom monitor by @pan-x-c in #66
- Add model_path to auxiliary_models. by @chenyushuo in #67
- Fix DLC mode by @pan-x-c in #71
- Wrap database in ray actor by @pan-x-c in #70
- Add `workflow_args` for fine-grained control by @pan-x-c in #73
- Support loading user-written plugin modules automatically by @pan-x-c in #74
- Refactor Buffer Reader by @pan-x-c in #80
- Wrap file writer in ray by @pan-x-c in #82
- Progress Bar by @shiweijiezero in #87
- Customized math workflows by @garyzhang99 in #88
- Bumping version to v0.1.1 by @pan-x-c in #94
New Contributors
- @yxdyc made their first contribution in #54
- @shiweijiezero made their first contribution in #87
Full Changelog: v0.1.0...v0.1.1
v0.1.0
Overview
Trinity-RFT is a general-purpose and unified framework for reinforcement fine-tuning of large language models.
We release Trinity-RFT v0.1.0 together with our technical report.
What's Changed
- Add GitHub Actions by @pan-x-c in #1
- fix some bugs for eval by @hiyuchang in #2
- Try to activate data module in the launcher by @HYLcool in #3
- Install from Dockerfile by @pan-x-c in #7
- refactor MultiTurnWorkflow class and add SciWorld ENV by @garyzhang99 in #9
- UPD on `config_manager.py` and docs by @chenyushuo in #4
- Add sft warmup before dpo by @hiyuchang in #12
- Update main readme by @yanxi-chen in #8
- Add examples directory by @hiyuchang in #14
- Add unittest on self-host runner by @pan-x-c in #15
- A trick to update wandb after each step by @xieyxclack in #16
- Add training service by @HYLcool in #13
- Add tensorboard monitor by @pan-x-c in #21
- Simplify wandb log and update default trainer config by @pan-x-c in #22
- fix: init_process_group failed when using ipv6 master address by @0x404 in #24
- fix: some hf dataset may need specified config name by @0x404 in #25
- Refactor on `config_manager.py` by @chenyushuo in #23
- Make `get_exp_strategy` effective in sql by @hiyuchang in #26
- Add more unittest and support Qwen3 by @pan-x-c in #29
- Config Refactor by @chenyushuo in #27
- Add example_async_mode by @hiyuchang in #28
- Rollout use vLLM V1 engine by @pan-x-c in #31
- Fix in Config by @chenyushuo in #30
- Add CLI for Trinity Studio by @pan-x-c in #32
- Add bench mode by @hiyuchang in #35
- Fix `n` does not take effect in vLLM v1 engine by @pan-x-c in #36
- Add a switch for Qwen3 think mode by @pan-x-c in #37
- Refactor on `TaskSet` and `Buffer` by @chenyushuo in #34
- Add utils for PAI DLC by @pan-x-c in #38
- Add benchmark mode by @hiyuchang in #39
- Refactor `workflow` by @chenyushuo in #40
- Move `algorithm_type` from `trainer` to `global_config` by @chenyushuo in #42
- Async model support OpenAI compatible API by @pan-x-c in #41
- Add resettable workflow by @chenyushuo in #43
- Reorganize config by @pan-x-c in #46
- Update grpo and dpo examples by @hiyuchang in #48
- Update README.md and main.md by @yanxi-chen in #47
- Fix config manager by @chenyushuo in #49
- Update trinity config guide by @pan-x-c in #50
- Update README.md and main.md by @yanxi-chen in #52
New Contributors
- @pan-x-c made their first contribution in #1
- @hiyuchang made their first contribution in #2
- @HYLcool made their first contribution in #3
- @garyzhang99 made their first contribution in #9
- @chenyushuo made their first contribution in #4
- @yanxi-chen made their first contribution in #8
- @xieyxclack made their first contribution in #16
- @0x404 made their first contribution in #24
Full Changelog: https://github.com/modelscope/Trinity-RFT/commits/v0.1.0