v0.3.1

@pan-x-c pan-x-c released this 17 Oct 07:47
· 2 commits to main since this release
c04b993

Overview

Agentic RL

  1. Add more agentic RL examples using agent frameworks (e.g. AgentScope)
  2. Provide a debug mode for workflow developers
  3. Add examples for RL in non-verifiable domains: trainable RULER reward, rubric-as-reward

Framework Enhancement

  1. Support multi-stage training
  2. Support using environment variables in configuration files
  3. Support LoRA
  4. Enhance the checkpoint saving process
  5. Enhance the experience replay mechanism for the priority queue buffer
  6. Add algorithms: group-relative REINFORCE variants
  7. Update vLLM to 0.10.2
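
For readers unfamiliar with the "group-relative" idea in item 6, the sketch below shows one common form of it: each response sampled for the same prompt is scored against the mean reward of its group, optionally scaled by the group's standard deviation. This is an illustration only, not the implementation shipped in this release; the function name `group_relative_advantages` and its parameters are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, normalize_std=True, eps=1e-6):
    """Illustrative group-relative baseline (hypothetical helper, not the library API).

    `rewards` holds the scalar rewards of all responses sampled for one prompt.
    """
    # The group mean acts as the baseline for every response in the group.
    baseline = mean(rewards)
    centered = [r - baseline for r in rewards]
    if not normalize_std:
        return centered
    # Optionally scale by the group's population standard deviation;
    # eps avoids division by zero when all rewards are equal.
    scale = pstdev(rewards) + eps
    return [c / scale for c in centered]


if __name__ == "__main__":
    # Four responses to the same prompt, two rewarded and two not.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0], normalize_std=False))
```

Variants typically differ in whether the standard-deviation scaling is applied and in how the baseline is computed, which is why the sketch exposes `normalize_std` as a switch.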

Documentation

  1. Add Chinese documentation
  2. Rewrite the Developer Guide

What's Changed

Full Changelog: v0.3.0...v0.3.1