v0.2.0
Overview
- Refactor Algorithm-related modules, see #59 for details
- Propose an SFT/GRPO-mixed algorithm
- Unify Sync/Async RL via `sync_interval`, and support a one-step async pipeline
- Refactor the data processor module, and support processing input tasksets and experience data
- Refactor `RunnerPool` to `Scheduler` to support automatic fault tolerance and fine-grained scheduling
- Refactor `Explorer` to a fully asynchronous implementation
- Support running multiple `Explorer` instances simultaneously
- Update vLLM to v0.9.1 and verl to v0.4.0
- Support reward functions in RM-Gallery
- Fix various bugs
- Update the technical report (arXiv v2) with new features, examples, and experiments
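The `sync_interval` knob mentioned above unifies the two modes: syncing weights every step reproduces synchronous RL, while a larger interval lets the explorer run ahead of the trainer. A minimal sketch of this idea, with a hypothetical `run_rl` loop that is not the actual Trinity-RFT API:

```python
def run_rl(total_steps: int, sync_interval: int) -> list[int]:
    """Illustrative only: one training loop covering both RL modes.

    sync_interval == 1  -> weights sync after every step (synchronous RL)
    sync_interval  > 1  -> explorer keeps generating with stale weights
                           between syncs (asynchronous RL)
    """
    sync_steps = []
    for step in range(1, total_steps + 1):
        # explorer would generate experience with its current weights here
        if step % sync_interval == 0:
            # trainer would push updated weights to the explorer here
            sync_steps.append(step)
    return sync_steps
```

With `sync_interval=1` every step is a sync point, so the loop degenerates to fully synchronous RL; with `sync_interval=3` only every third step syncs, and the explorer runs ahead in between.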
What's Changed
- Add Policy Loss Functions by @pan-x-c in #62
- Refactor advantage computation, and delete RayPPOTrainer.fit by @yanxi-chen in #61
- Add unittest && bug fix by @chenyushuo in #65
- Add KL/Entropy Fn by @pan-x-c in #64
- Refactor advantage computation (cont.) by @yanxi-chen in #68
- Refactor train step by @chenyushuo in #69
- Fix BasicEntropyLossFn by @shiweijiezero in #77
- Fix Conflicts with main by @pan-x-c in #75
- Add Sample Strategy by @pan-x-c in #78
- Add doc for SFT by @hiyuchang in #81
- Merge verl 0.4.0 by @chenyushuo in #79
- [Feature] Add MIX algorithm by @hiyuchang in #83
- Refactor on `select_keys` by @chenyushuo in #84
- Add guideline for adding new algorithm by @pan-x-c in #85
- Update config manager by @chenyushuo in #86
- Update docs by @hiyuchang in #89
- Refactor `state_dict_meta` init by @chenyushuo in #90
- Unify async/sync RL by @pan-x-c in #91
- Support one-step ahead async RL by @pan-x-c in #93
- Refactor data module and support task pipeline in data processor by @HYLcool in #92
- Bug Fix in `namespace` by @chenyushuo in #96
- Unify ray actor creation method by @pan-x-c in #97
- Add ray timeline for profiling by @pan-x-c in #98
- Bug fix in `explorer.eval` by @chenyushuo in #99
- Async RL support multiple explorers by @pan-x-c in #100
- Refactor explorer loop by @chenyushuo in #101
- Record model version during model weight sync by @pan-x-c in #102
- Fix bugs in multi-node environments by @pan-x-c in #103
- Fix Incomplete Last Batch by @pan-x-c in #104
- Support Experience Pipeline by @HYLcool in #105
- Bug Fix && Add more logger info by @chenyushuo in #106
- Bug fix in alfworld by @chenyushuo in #107
- Add timeout to buffer reader by @pan-x-c in #108
- Add FAQ in docs by @hiyuchang in #109
- Optimize model weight sync process group by @pan-x-c in #112
- Add total steps to StorageConfig by @pan-x-c in #111
- Remove useless configs by @HYLcool in #113
- Add new scheduler with step granularity by @pan-x-c in #110
- Split group of tasks to multiple runners by @pan-x-c in #116
- Add `AsyncPriorityQueue` by @chenyushuo in #115
- Refactor RewardFn by @hiyuchang in #118
- Add `unique_id` for each experience by @hiyuchang in #120
- Add a script to plot multi-run experiment results by @lingzhq in #122
- Update main readme with arxiv v2 by @yanxi-chen in #121
- Release Trinity-RFT v0.2.0 by @pan-x-c in #124
Full Changelog: v0.1.1...v0.2.0