Overview
Agentic RL
- Add more agentic RL examples using agent frameworks (e.g. AgentScope)
- Provide a debug mode for workflow developers
- Add examples for RL in non-verifiable domains: trainable RULER reward, rubric-as-reward
Framework Enhancement
- Support multi-stage training
- Support using environment variables in configuration files
- Support LoRA
- Enhance checkpoint saving process
- Enhance experience replay mechanism for priority queue buffer
- Add algorithms: group-relative REINFORCE variants
- Update vLLM to 0.10.2
Documentation
- Add Chinese Docs
- Rewrite Developer Guide
What's Changed
- Support new task ordering methods by @HYLcool in #265
- Fix problem in get_openr1_dataset.py in chord example by @garyzhang99 in #266
- Replace /PATH/TO to ${oc.env:TRINITY_XXX} by @chenyushuo in #267
- Support multi-stage training by @pan-x-c in #268
- [Example] Policy model as its own reward model by @hiyuchang in #270
- Dev/update agentscope react example version by @garyzhang99 in #275
- Normalize Trainer by @pan-x-c in #271
- Refactor workflow to async workflow by @chenyushuo in #276
- Support Chinese Docs by @hiyuchang in #277
- Support vLLM 0.10.2 by @pan-x-c in #278
- AgentScope V1.0 WebSearch Workflow (simple react + search api) by @garyzhang99 in #264
- Add enable_activation_offload configuration option by @nkkarpov in #281
- Update data-juicer version in toml by @chenyushuo in #286
- Improvement in config by @chenyushuo in #288
- Add loss-agg-mode for policy loss by @hiyuchang in #294
- Explorer provides OpenAI API compatible inference service by @pan-x-c in #289
- Fix absmethod in workflow by @chenyushuo in #297
- Add LoRA mode by @hiyuchang in #291
- Enhance support for multi-modal models by @pan-x-c in #298
- Group-relative REINFORCE Families by @yaochaorui in #292
- Optimize Installation and Development Doc by @pan-x-c in #301
- Update Readme by @pan-x-c in #302
- Refactor checkpoint save by @chenyushuo in #299
- Refactor AgentScope ReAct Agent workflow example by @pan-x-c in #303
- Update Workflow Development Tutorial by @pan-x-c in #310
- Update chord with tooluse example by @garyzhang99 in #313
- Debug Mode for workflow developers by @pan-x-c in #314
- [BUGFIX] Fix tokenizer bug when getting action masks with enable_thinking arguments by @garyzhang99 in #316
- Add task_count by @hiyuchang in #307
- Update FAQ by @hiyuchang in #320
- Refactoring EmailSearchWorkflow and EmailSearchAgent to adapt to the latest version of AgentScope by @chenyushuo in #321
- Non-verifiable Medicine QA Task by @hiyuchang in #317
- Add batch level std calculation by @garyzhang99 in #311
- Implement serial saving by @chenyushuo in #322
- Enhance experience replay for priority queue buffer by @yanxi-chen in #306
- Fix example config typo by @garyzhang99 in #323
- Simplify Config by @pan-x-c in #325
- Update config from ray cluster by @hiyuchang in #324
- Support AgentScope Workflow Function by @pan-x-c in #327
Full Changelog: v0.3.0...v0.3.1