v0.2.1

@pan-x-c released this 20 Aug 12:32 · 76 commits to main since this release · b0a84b8

Overview

  1. Agentic RL
    1.1 The rollout model can now be accessed directly through an OpenAI-compatible API, reducing migration costs (see the first sketch after this list).
    1.2 Supports general multi-step workflows without requiring experiences to be concatenated.
    1.3 Introduced AddStrategy to facilitate group-based advantage/return calculations (experimental; it will be integrated into the buffer module in a future version; see the second sketch after this list).
    1.4 Added a ReAct agent RL example based on the AgentScope framework.
    1.5 Upgraded the Alfworld example into a general multi-step workflow.
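
Since the rollout model speaks the OpenAI API, existing OpenAI client code only needs its base URL redirected. A minimal sketch, assuming a local rollout endpoint at http://localhost:8000/v1 and a model named qwen2.5-7b-instruct (both placeholders; use the endpoint and model name from your Explorer configuration):

```python
# Minimal sketch: query the rollout model through the standard OpenAI client.
# The base_url, api_key, and model name are placeholders; substitute the
# values exposed by your rollout server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical rollout endpoint
    api_key="EMPTY",                      # local servers often ignore the key
)

response = client.chat.completions.create(
    model="qwen2.5-7b-instruct",          # hypothetical model name
    messages=[{"role": "user", "content": "Summarize the ReAct loop in one sentence."}],
)
print(response.choices[0].message.content)
```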
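
AddStrategy's own interface is experimental, but the group-based advantage it is designed to support is typically a GRPO-style normalization: each rollout's reward is centered and scaled by the statistics of its group. A framework-agnostic sketch of that calculation (not the library's actual code):

```python
# Sketch of a GRPO-style group advantage: each rollout's reward is normalized
# against the mean/std of the rewards in its group. This illustrates the kind
# of calculation AddStrategy is meant to support; it is not the library's code.
from collections import defaultdict
from statistics import mean, stdev


def group_advantages(experiences, eps=1e-6):
    """experiences: list of (group_id, reward) pairs -> list of advantages."""
    by_group = defaultdict(list)
    for gid, reward in experiences:
        by_group[gid].append(reward)

    advantages = []
    for gid, reward in experiences:
        rewards = by_group[gid]
        mu = mean(rewards)
        sigma = stdev(rewards) if len(rewards) > 1 else 0.0
        advantages.append((reward - mu) / (sigma + eps))
    return advantages


# Example: two groups of rollouts sampled from the same prompts.
print(group_advantages([("q1", 1.0), ("q1", 0.0), ("q2", 0.5), ("q2", 0.5)]))
```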

  2. Async / Offline RL
    2.1 Refactored RunnerPool into Scheduler, enabling asynchronous scheduling and management of multiple workflow runners.
    2.2 Added a priority-queue buffer that sorts and reuses experiences to reduce the idling caused by speed differences between the Explorer and the Trainer (see the buffer sketch after this list).
    2.3 Introduced Synchronizer to manage model weight synchronization between the Explorer and the Trainer, with support for dynamic synchronization (see the synchronizer sketch after this list).
    2.4 Added tutorials on using the Synchronizer.
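
One common design for such a buffer orders experiences by freshness, e.g. the model version that produced them, so the Trainer consumes the newest data first and reuses older experiences instead of idling. A minimal sketch using Python's heapq; the class name, fields, and ordering rule are illustrative assumptions, not the actual buffer API:

```python
# Sketch of a priority-queue experience buffer: newer experiences (higher
# model_version) are popped first, so the Trainer never waits on the Explorer
# as long as reusable older experiences remain.
import heapq
import itertools


class PriorityExperienceBuffer:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal priorities

    def put(self, experience, model_version):
        # heapq is a min-heap, so negate the version to pop newest first.
        heapq.heappush(self._heap, (-model_version, next(self._counter), experience))

    def get(self):
        if not self._heap:
            raise IndexError("buffer is empty")
        _, _, experience = heapq.heappop(self._heap)
        return experience


buf = PriorityExperienceBuffer()
buf.put({"obs": "...", "reward": 1.0}, model_version=3)
buf.put({"obs": "...", "reward": 0.0}, model_version=5)
print(buf.get())  # the version-5 experience comes out first
```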
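
Dynamic synchronization here means the Explorer pulls fresh weights whenever the Trainer publishes a newer version instead of syncing on a fixed cadence. A toy sketch of that version-checking handshake; every name below is an illustrative placeholder (the real Synchronizer usage is covered in the new tutorials):

```python
# Toy sketch of dynamic weight synchronization: the Explorer checks a shared
# version counter and pulls weights only when the Trainer has published a
# newer checkpoint. All classes and methods here are illustrative placeholders.
import threading


class ToySynchronizer:
    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self._weights = None

    def publish(self, weights):             # called by the Trainer
        with self._lock:
            self._version += 1
            self._weights = weights

    def pull_if_newer(self, have_version):  # called by the Explorer
        with self._lock:
            if self._version > have_version:
                return self._version, self._weights
            return have_version, None


sync = ToySynchronizer()
sync.publish({"layer.0": [0.1, 0.2]})
version, weights = sync.pull_if_newer(have_version=0)
print(version, weights)  # -> 1 {'layer.0': [0.1, 0.2]}
```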

  3. Added a benchmark tool for quick verification.

  4. Added support for more RL algorithms (e.g., CHORD, DAPO, GSPO, RAFT).

  5. Upgraded vLLM to 0.10.0 and verl to 0.4.1.

  6. Fixed numerous bugs.

What's Changed

Full Changelog: v0.2.0...v0.2.1