Releases: modelscope/Trinity-RFT

v0.3.1

Released 17 Oct 07:47 · commit c04b993

Overview

Agentic RL

  1. Add more agentic RL examples built on agent frameworks (e.g., AgentScope)
  2. Provide a debug mode for workflow developers
  3. Add examples for RL in non-verifiable domains: trainable RULER reward and rubric-as-reward

Framework Enhancement

  1. Support multi-stage training
  2. Support using environment variables in configuration files
  3. Support LoRA
  4. Enhance the checkpoint saving process
  5. Enhance the experience replay mechanism for the priority queue buffer
  6. Add algorithms: group-relative REINFORCE variants (a sketch of the group-relative advantage follows this list)
  7. Update vLLM to 0.10.2
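
Group-relative REINFORCE variants replace a learned value baseline with a baseline computed from the other responses sampled for the same prompt. The snippet below is only a minimal sketch of that group-relative advantage computation, not Trinity-RFT's actual implementation; the function name and the normalization choice are illustrative.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, normalize=True, eps=1e-6):
    """Advantages for one group of rollouts sampled from the same prompt.

    Each response's advantage is its reward minus the group mean; optionally
    it is also divided by the group's reward standard deviation.
    """
    baseline = mean(rewards)
    advantages = [r - baseline for r in rewards]
    if normalize:
        std = pstdev(rewards)
        advantages = [a / (std + eps) for a in advantages]
    return advantages

# Example: four rollouts for one prompt with rewards 1, 0, 0, 1
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # ~[1.0, -1.0, -1.0, 1.0]
```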

Documentation

  1. Add Chinese Docs
  2. Rewrite Developer Guide

What's Changed

Full Changelog: v0.3.0...v0.3.1

v0.3.0

Released 09 Sep 10:14 · commit 7d2323f

Overview

Framework Development

Buffer Module

  1. Replace the original AddStrategy with the Operator interface. Operators can apply various transformations to experience data in a pipeline manner (see the sketch after this list). [Breaking Change]
  2. Add TaskPipeline and ExperiencePipeline for task and experience data preprocessing.
  3. Support calling Data-Juicer services in both TaskPipeline and ExperiencePipeline, and resolve some dependency conflicts.
  4. Refactor SQL/FILE storage. SQL can store SFT/DPO/Rollout/Experience data. SQL and FILE support parsing multi-turn SFT data with tools. [Breaking Change]
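
The Operator interface treats experience preprocessing as a chain of transformations, each consuming and producing a batch of experiences. The sketch below only illustrates that pipeline idea; all class and method names here are hypothetical and do not reflect Trinity-RFT's actual interface.

```python
from typing import List

class Experience:
    """Minimal stand-in for an experience record (hypothetical fields)."""
    def __init__(self, prompt: str, response: str, reward: float):
        self.prompt, self.response, self.reward = prompt, response, reward

class Operator:
    """Hypothetical Operator: transform a batch of experiences and pass it on."""
    def process(self, exps: List[Experience]) -> List[Experience]:
        raise NotImplementedError

class RewardClipOperator(Operator):
    """Clip rewards into [low, high]."""
    def __init__(self, low: float, high: float):
        self.low, self.high = low, high
    def process(self, exps):
        for e in exps:
            e.reward = min(max(e.reward, self.low), self.high)
        return exps

class DropEmptyResponseOperator(Operator):
    """Filter out experiences whose response is empty."""
    def process(self, exps):
        return [e for e in exps if e.response.strip()]

def run_pipeline(operators: List[Operator], exps: List[Experience]) -> List[Experience]:
    for op in operators:  # each operator sees the output of the previous one
        exps = op.process(exps)
    return exps

cleaned = run_pipeline(
    [DropEmptyResponseOperator(), RewardClipOperator(low=-1.0, high=1.0)],
    [Experience("q1", "answer", 2.5), Experience("q2", "", 0.3)],
)
print([(e.prompt, e.reward) for e in cleaned])  # [('q1', 1.0)]
```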

Trainer Module

  1. Support FSDP2 backend
  2. Support Megatron backend
  3. Support Qwen2.5-VL multi-modal models [Experimental]

Explorer Module

  1. Support Qwen2.5-VL multi-modal models [Experimental]
  2. Workflows support running in async mode.
  3. ModelWrapper provides an openai.AsyncOpenAI-compatible interface (see the sketch below).
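
Because the interface follows openai.AsyncOpenAI, a workflow can talk to the rollout model with the stock OpenAI client. The endpoint URL, API key, and model name below are placeholders, and how a workflow obtains these values from ModelWrapper is not shown; this is only a usage sketch.

```python
import asyncio
from openai import AsyncOpenAI

async def main():
    # Placeholder endpoint and model: point the client at the rollout model served by the Explorer.
    client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    resp = await client.chat.completions.create(
        model="Qwen/Qwen2.5-7B-Instruct",
        messages=[{"role": "user", "content": "Summarize reinforcement fine-tuning in one sentence."}],
    )
    print(resp.choices[0].message.content)

asyncio.run(main())
```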

Utils Module

  1. Enhance the logger and support writing the logs of different actors to separate files under the checkpoint directory (a generic sketch of this pattern follows this list)
  2. Enhance the wandb and mlflow monitors
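
Per-actor log files follow a common Python logging pattern: one named logger per actor, each with its own file handler under the checkpoint directory. The sketch below illustrates that pattern generically; it is not Trinity-RFT's logger implementation, and the directory layout is an assumption.

```python
import logging
import os

def get_actor_logger(actor_name: str, checkpoint_dir: str) -> logging.Logger:
    """Return a logger that writes to <checkpoint_dir>/log/<actor_name>.log."""
    log_dir = os.path.join(checkpoint_dir, "log")
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger(actor_name)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.FileHandler(os.path.join(log_dir, f"{actor_name}.log"))
        handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger

get_actor_logger("explorer", "./checkpoints/run1").info("rollout step finished")
```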

New Algorithms

  1. AsymRE
  2. sPPO
  3. RULER
  4. TOPR and CISPO

New Workflows

  1. General Multi-turn Email Search

Others

  1. Support uv
  2. Refactor the README and documentation
  3. Fix many bugs

What's Changed

Full Changelog: v0.2.1...v0.3.0

v0.2.1

Released 20 Aug 12:32 · commit b0a84b8

Overview

  1. Agentic RL
    1.1 The rollout model can now be accessed directly via the OpenAI API, reducing migration costs.
    1.2 Added support for general multi-step workflows without requiring concatenated experience data.
    1.3 Introduced AddStrategy to facilitate group-based advantage/return calculations (experimental; will be integrated into the buffer module in future versions).
    1.4 Added a ReAct Agent RL example based on the AgentScope framework.
    1.5 Enhanced the Alfworld example into a general multi-step workflow.

  2. Async / Offline RL
    2.1 Refactored RunnerPool to Scheduler, enabling asynchronous scheduling and management of multiple workflow runners.
    2.2 Added a priority queue buffer that reduces idling caused by speed differences between the Explorer and Trainer through experience sorting and reuse (see the sketch after this list).
    2.3 Introduced Synchronizer to manage model weight synchronization between Explorer and Trainer, supporting dynamic synchronization.
    2.4 Added tutorials on using the Synchronizer.

  3. Added a benchmark tool for quick verification.

  4. Added support for more RL algorithms (e.g., CHORD, DAPO, GSPO, RAFT).

  5. Updated vLLM to 0.10.0 and verl to 0.4.1.

  6. Fixed numerous bugs.
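
The priority queue buffer in item 2.2 orders experiences so the Trainer can keep pulling the most useful data even when the Explorer produces it at a different rate. Below is a minimal sketch of such a buffer using Python's heapq, purely illustrative and not Trinity-RFT's implementation.

```python
import heapq
import itertools

class PriorityExperienceBuffer:
    """Pop the highest-priority experience first; stale data can be re-put for reuse."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps equal priorities FIFO

    def put(self, experience, priority: float):
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap, (-priority, next(self._counter), experience))

    def get(self):
        _, _, experience = heapq.heappop(self._heap)
        return experience

buf = PriorityExperienceBuffer()
buf.put({"prompt": "p1", "reward": 0.2}, priority=0.2)
buf.put({"prompt": "p2", "reward": 0.9}, priority=0.9)
print(buf.get())  # the higher-priority experience comes out first
```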

What's Changed

Full Changelog: v0.2.0...v0.2.1

v0.2.0

Released 15 Jul 09:36 · commit c8dec22

Overview

  1. Refactor Algorithm-related modules (see #59 for details)
  2. Propose an SFT/GRPO-mixed algorithm
  3. Unify Sync/Async RL via sync_interval and support a one-step async pipeline (a schematic loop follows this list)
  4. Refactor the data processor module and support processing of input tasksets and experience data
  5. Refactor RunnerPool to Scheduler to support automatic fault tolerance and fine-grained scheduling
  6. Refactor Explorer to a fully asynchronous implementation
  7. Support running multiple Explorer instances simultaneously
  8. Update vLLM to 0.9.1 and verl to 0.4.0
  9. Support reward functions in RM-Gallery
  10. Fix various bugs
  11. Update the technical report (arXiv v2) with new features, examples, and experiments
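
The sync_interval mentioned in item 3 expresses both settings with one knob: the Explorer and Trainer take their own steps, and model weights are synchronized every sync_interval steps, with sync_interval = 1 recovering the fully synchronous case. The loop below is only a schematic, with hypothetical explorer/trainer/synchronizer objects.

```python
def run(explorer, trainer, synchronizer, total_steps: int, sync_interval: int):
    """Schematic RFT loop: trainer weights flow back to the explorer every sync_interval steps."""
    for step in range(1, total_steps + 1):
        experiences = explorer.explore()  # generate rollouts with the current rollout weights
        trainer.train(experiences)        # update the policy on the collected experiences
        if step % sync_interval == 0:
            synchronizer.sync()           # push the latest trainer weights to the explorer
```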

What's Changed

Full Changelog: v0.1.1...v0.2.0

v0.1.1

Released 20 Jun 07:46 · commit c05296d

Overview

  1. Support deployment of non-trained auxiliary models in the cluster, which can provide rewards or other feedback in workflows
  2. Support more custom components (e.g., monitors) and automatic loading of custom components
  3. Support the use of file and database buffers in multi-node environments
  4. Bug fixes

What's Changed

Full Changelog: v0.1.0...v0.1.1

v0.1.0

Released 26 May 04:21 · commit a9a45d8

Overview

Trinity-RFT is a general-purpose and unified framework for reinforcement fine-tuning of large language models.

We release Trinity-RFT v0.1.0 together with our technical report.

What's Changed

Full Changelog: https://github.com/modelscope/Trinity-RFT/commits/v0.1.0