
v0.2.0


@pan-x-c released this on 15 Jul 09:36
Commit: c8dec22

Overview

  1. Refactor algorithm-related modules (see #59 for details)
  2. Introduce a mixed SFT/GRPO algorithm
  3. Unify synchronous and asynchronous RL via sync_interval, and support a one-step asynchronous pipeline
  4. Refactor the data processor module to support processing both input tasksets and experience data
  5. Refactor RunnerPool into Scheduler, adding automatic fault tolerance and fine-grained scheduling
  6. Refactor Explorer into a fully asynchronous implementation
  7. Support running multiple Explorer instances simultaneously
  8. Update vLLM to v0.9.1 and verl to v0.4.0
  9. Support reward functions from RM-Gallery
  10. Fix various bugs
  11. Update the technical report (arXiv v2) with new features, examples, and experiments

What's Changed

New Contributors

Full Changelog: v0.1.1...v0.2.0