Release v0.3.1 · modelscope/Trinity-RFT

Overview

Agentic RL

Add more agentic RL examples using agent frameworks (e.g. AgentScope)
Provide debug mode for workflow developers
Add examples for RL in non-verifiable domains: trainable RULER reward, rubric-as-reward
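
Rubric-as-reward scores a response against a list of weighted criteria and uses the normalized total as a scalar reward. The following is a minimal sketch of that idea, not the code shipped with the examples: the `RubricItem` type, the weights, and the keyword-matching `keyword_judge` are hypothetical. In practice the judge would be an LLM grader, and the stub only stands in for it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RubricItem:
    # Hypothetical rubric criterion: what to check and how much it counts.
    description: str
    weight: float


def rubric_reward(
    response: str,
    rubric: List[RubricItem],
    judge: Callable[[str, RubricItem], float],
) -> float:
    """Weighted average of per-criterion scores, each expected in [0, 1]."""
    total_weight = sum(item.weight for item in rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    score = sum(item.weight * judge(response, item) for item in rubric)
    return score / total_weight


def keyword_judge(response: str, item: RubricItem) -> float:
    # Stand-in for an LLM judge: 1.0 if the criterion's first word appears.
    keyword = item.description.split()[0].lower()
    return 1.0 if keyword in response.lower() else 0.0
```

With the rubric `[RubricItem("dosage mentioned", 2.0), RubricItem("contraindications listed", 1.0)]`, the response `"The dosage is 5 mg."` satisfies only the first criterion and receives a reward of 2/3.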

Framework Enhancement

Support multi-stage training
Support using environment variables in configuration files
Support LoRA
Enhance checkpoint saving process
Enhance experience replay mechanism for priority queue buffer
Add algorithms: group-relative REINFORCE variants
Update vLLM to 0.10.2
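
The `${oc.env:TRINITY_XXX}` placeholders introduced in #267 follow OmegaConf's `oc.env` resolver syntax, and in practice the config loader resolves them. The stdlib-only sketch below only mimics the simplified `${oc.env:NAME}` and `${oc.env:NAME,default}` forms to show what substitution happens. It is not Trinity-RFT's loader, and `TRINITY_DEMO_DIR` is a hypothetical variable name.

```python
import os
import re

# Simplified pattern for OmegaConf-style ${oc.env:NAME} and
# ${oc.env:NAME,default}; nesting, quoting, and escaping are not handled.
_OC_ENV = re.compile(r"\$\{oc\.env:([A-Za-z_][A-Za-z0-9_]*)(?:,([^}]*))?\}")


def resolve_env(value: str) -> str:
    """Replace every ${oc.env:...} placeholder in `value`."""

    def _substitute(match):
        name, default = match.group(1), match.group(2)
        if name in os.environ:
            return os.environ[name]
        if default is not None:
            return default.strip()
        # Like OmegaConf, fail loudly when the variable is unset and has no default.
        raise KeyError(f"environment variable {name!r} is not set")

    return _OC_ENV.sub(_substitute, value)
```

For example, with `TRINITY_DEMO_DIR=/tmp/demo`, `resolve_env("${oc.env:TRINITY_DEMO_DIR}/checkpoints")` returns `/tmp/demo/checkpoints`.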
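
LoRA keeps a weight matrix W frozen and trains a low-rank update BA of rank r, scaled by alpha / r. The following generic sketch of that forward pass says nothing about how Trinity-RFT wires LoRA in (#291). The plain-list matrices and the `lora_forward` name are illustrative only.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    # Dense matrix-vector product on nested lists.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float) -> Vector:
    """y = W x + (alpha / r) * B (A x), with A: r x d_in and B: d_out x r.

    W stays frozen; only the low-rank factors A and B are trained.
    """
    r = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]
```

B is commonly initialized to zeros, so the adapted layer initially reproduces the frozen layer exactly. Only A and B (r x (d_in + d_out) values) receive gradients instead of all of W.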
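
A priority queue buffer hands out the highest-priority experiences first, and experience replay lets a sampled item return to the buffer for reuse. The following is a minimal sketch of that combination, assuming non-negative priorities. `ReplayPriorityQueue`, `max_reuse`, and `decay` are hypothetical names, and the priority function and replay policy in #306 may differ.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class ReplayPriorityQueue:
    """Max-priority buffer that re-inserts sampled items for limited reuse."""

    def __init__(self, max_reuse: int = 2, decay: float = 0.5):
        self.max_reuse = max_reuse  # hypothetical: total times an item may be sampled
        self.decay = decay          # hypothetical: priority multiplier after each use
        self._heap: List[Tuple[float, int, int, Any]] = []
        self._counter = itertools.count()  # tie-breaker: FIFO among equal priorities

    def put(self, item: Any, priority: float) -> None:
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap, (-priority, next(self._counter), 0, item))

    def get(self, batch_size: int) -> List[Any]:
        batch, popped = [], []
        while self._heap and len(batch) < batch_size:
            neg_pri, _, uses, item = heapq.heappop(self._heap)
            batch.append(item)
            popped.append((neg_pri, uses, item))
        # Re-insert only after the batch is built, so no item appears twice in it.
        for neg_pri, uses, item in popped:
            if uses + 1 < self.max_reuse:
                heapq.heappush(
                    self._heap,
                    (neg_pri * self.decay, next(self._counter), uses + 1, item),
                )
        return batch

    def __len__(self) -> int:
        return len(self._heap)
```

Decaying the priority on reuse lets fresh high-priority experiences overtake replayed ones, while `max_reuse` caps how stale the training data can get.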
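
Group-relative methods replace a learned critic with a baseline computed from several responses sampled for the same prompt. The following sketch subtracts each group's mean reward and can optionally divide by a standard deviation. That deviation is taken either per group (GRPO-style) or over the whole batch, which is presumably the distinction behind the batch-level std option in #311. It is a generic illustration. The `std_scope` parameter is hypothetical, and the specific REINFORCE variants added in #292 may compute baselines differently.

```python
from statistics import mean, pstdev
from typing import List, Optional


def group_relative_advantages(
    groups: List[List[float]],
    std_scope: Optional[str] = "group",
    eps: float = 1e-6,
) -> List[List[float]]:
    """Subtract each group's mean reward, then optionally rescale.

    std_scope: "group" -> divide by each group's population std,
               "batch" -> divide by one population std over all rewards,
               None    -> no rescaling (plain group-mean baseline).
    """
    if std_scope not in ("group", "batch", None):
        raise ValueError(f"unknown std_scope: {std_scope!r}")
    batch_std = pstdev([r for g in groups for r in g])
    advantages = []
    for g in groups:
        baseline = mean(g)
        if std_scope == "group":
            scale = pstdev(g) + eps  # eps keeps all-equal groups finite
        elif std_scope == "batch":
            scale = batch_std + eps
        else:
            scale = 1.0
        advantages.append([(r - baseline) / scale for r in g])
    return advantages
```

For rewards `[[1.0, 0.0], [1.0, 1.0]]` with `std_scope=None`, the advantages are `[[0.5, -0.5], [0.0, 0.0]]`. A group whose responses all score the same contributes no learning signal under any scope.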

Documentation

Add Chinese Docs
Rewrite Developer Guide

What's Changed

Support new task ordering methods by @HYLcool in #265
Fix problem in get_openr1_dataset.py in chord example by @garyzhang99 in #266
Replace /PATH/TO with ${oc.env:TRINITY_XXX} by @chenyushuo in #267
Support multi-stage training by @pan-x-c in #268
[Example] Policy model as its own reward model by @hiyuchang in #270
Update AgentScope ReAct example version by @garyzhang99 in #275
Normalize Trainer by @pan-x-c in #271
Refactor workflow to async workflow by @chenyushuo in #276
Support Chinese Docs by @hiyuchang in #277
Support vLLM 0.10.2 by @pan-x-c in #278
AgentScope V1.0 WebSearch workflow (simple ReAct + search API) by @garyzhang99 in #264
Add enable_activation_offload configuration option by @nkkarpov in #281
Update data-juicer version in toml by @chenyushuo in #286
Improvement in config by @chenyushuo in #288
Add loss-agg-mode for policy loss by @hiyuchang in #294
Explorer provides OpenAI API compatible inference service by @pan-x-c in #289
Fix absmethod in workflow by @chenyushuo in #297
Add LoRA mode by @hiyuchang in #291
Enhance support for multi-modal models by @pan-x-c in #298
Group-relative REINFORCE Families by @yaochaorui in #292
Optimize Installation and Development Doc by @pan-x-c in #301
Update README by @pan-x-c in #302
Refactor checkpoint save by @chenyushuo in #299
Refactor AgentScope ReAct Agent workflow example by @pan-x-c in #303
Update Workflow Development Tutorial by @pan-x-c in #310
Update chord with tooluse example by @garyzhang99 in #313
Debug Mode for workflow developers by @pan-x-c in #314
[BUGFIX] Fix tokenizer bug when getting action masks with the enable_thinking argument by @garyzhang99 in #316
Add task_count by @hiyuchang in #307
Update FAQ by @hiyuchang in #320
Refactor EmailSearchWorkflow and EmailSearchAgent to adapt to the latest version of AgentScope by @chenyushuo in #321
Non-verifiable Medicine QA Task by @hiyuchang in #317
Add batch level std calculation by @garyzhang99 in #311
Implement serial saving by @chenyushuo in #322
Enhance experience replay for priority queue buffer by @yanxi-chen in #306
Fix example config typo by @garyzhang99 in #323
Simplify Config by @pan-x-c in #325
Update config from Ray cluster by @hiyuchang in #324
Support AgentScope Workflow Function by @pan-x-c in #327
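
The loss-agg-mode option (#294) controls how per-token policy losses are reduced to one scalar. The following sketch shows three common reductions. The mode strings follow conventions used in verl-style trainers and are assumptions here, so check the Trinity-RFT configuration docs for the exact accepted values.

```python
from typing import List


def aggregate_loss(
    token_losses: List[List[float]],
    mask: List[List[int]],
    mode: str = "token-mean",
) -> float:
    """Reduce per-token losses to a scalar.

    token_losses[i][t] is the loss of token t in sequence i; mask[i][t] is 1 for
    response tokens that count toward the loss and 0 for padding or prompt tokens.
    """
    seq_sums, seq_counts = [], []
    for losses, keep in zip(token_losses, mask):
        seq_sums.append(sum(loss * k for loss, k in zip(losses, keep)))
        seq_counts.append(sum(keep))
    if mode == "token-mean":
        # Every valid token in the batch has equal weight.
        return sum(seq_sums) / max(sum(seq_counts), 1)
    if mode == "seq-mean-token-mean":
        # Average within each sequence, then average over sequences.
        per_seq = [s / max(c, 1) for s, c in zip(seq_sums, seq_counts)]
        return sum(per_seq) / len(per_seq)
    if mode == "seq-mean-token-sum":
        # Sum within each sequence, then average over sequences.
        return sum(seq_sums) / len(seq_sums)
    raise ValueError(f"unknown loss_agg_mode: {mode!r}")
```

The choice matters when response lengths differ. Take a batch with one 4-token response at loss 1.0 per token and one 1-token response at loss 4.0. `token-mean` gives 1.6, because long responses carry more tokens, while `seq-mean-token-mean` gives 2.5, because each response counts equally.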

Full Changelog: v0.3.0...v0.3.1
