v0.2.0
Overview
- Refactor Algorithm-related modules, see #59 for details
- Propose an SFT/GRPO-mixed algorithm
- Unify Sync/Async RL via `sync_interval`, and support a one-step async pipeline
- Refactor the data processor module, and support processing input tasksets and experience data
- Refactor `RunnerPool` to `Scheduler` to support automatic fault tolerance and fine-grained scheduling
- Refactor `Explorer` to a fully asynchronous implementation
- Support running multiple `Explorer` instances simultaneously
- Update vLLM to v0.9.1 and verl to v0.4.0
- Support reward functions in RM-Gallery
- Fix various bugs
- Update the technical report (arXiv v2) with new features, examples, and experiments
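The `sync_interval` knob mentioned above unifies the two modes: syncing weights every step reproduces synchronous RL, while a larger interval lets the explorer run ahead of the trainer. A minimal sketch of this idea, with a hypothetical `run_rl` loop that is not the actual Trinity-RFT API:

```python
def run_rl(total_steps: int, sync_interval: int) -> list[int]:
    """Illustrative only: one training loop covering both RL modes.

    sync_interval == 1  -> weights sync after every step (synchronous RL)
    sync_interval  > 1  -> explorer keeps generating with stale weights
                           between syncs (asynchronous RL)
    """
    sync_steps = []
    for step in range(1, total_steps + 1):
        # explorer would generate experience with its current weights here
        if step % sync_interval == 0:
            # trainer would push updated weights to the explorer here
            sync_steps.append(step)
    return sync_steps
```

With `sync_interval=1` every step is a sync point, so the loop degenerates to fully synchronous RL; with `sync_interval=3` only every third step syncs, and the explorer runs ahead in between.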
What's Changed
- Add Policy Loss Functions by @pan-x-c in #62
- Refactor advantage computation, and delete RayPPOTrainer.fit by @yanxi-chen in #61
- Add unittest && bug fix by @chenyushuo in #65
- Add KL/Entropy Fn by @pan-x-c in #64
- Refactor advantage computation (cont.) by @yanxi-chen in #68
- Refactor train step by @chenyushuo in #69
- Fix BasicEntropyLossFn by @shiweijiezero in #77
- Fix Conflicts with main by @pan-x-c in #75
- Add Sample Strategy by @pan-x-c in #78
- Add doc for SFT by @hiyuchang in #81
- Merge verl 0.4.0 by @chenyushuo in #79
- [Feature] Add MIX algorithm by @hiyuchang in #83
- Refactor on `select_keys` by @chenyushuo in #84
- Add guideline for adding new algorithm by @pan-x-c in #85
- Update config manager by @chenyushuo in #86
- Update docs by @hiyuchang in #89
- Refactor `state_dict_meta` init by @chenyushuo in #90
- Unify async/sync RL by @pan-x-c in #91
- Support one-step ahead async RL by @pan-x-c in #93
- Refactor data module and support task pipeline in data processor by @HYLcool in #92
- Bug Fix in `namespace` by @chenyushuo in #96
- Unify ray actor creation method by @pan-x-c in #97
- Add ray timeline for profiling by @pan-x-c in #98
- Bug fix in `explorer.eval` by @chenyushuo in #99
- Async RL support multiple explorers by @pan-x-c in #100
- Refactor explorer loop by @chenyushuo in #101
- Record model version during model weight sync by @pan-x-c in #102
- Fix bugs in multi-node environments by @pan-x-c in #103
- Fix Incomplete Last Batch by @pan-x-c in #104
- Support Experience Pipeline by @HYLcool in #105
- Bug Fix && Add more logger info by @chenyushuo in #106
- Bug fix in alfworld by @chenyushuo in #107
- Add timeout to buffer reader by @pan-x-c in #108
- Add FAQ in docs by @hiyuchang in #109
- Optimize model weight sync process group by @pan-x-c in #112
- Add total steps to StorageConfig by @pan-x-c in #111
- Remove useless configs by @HYLcool in #113
- Add new scheduler with step granularity by @pan-x-c in #110
- Split group of tasks to multiple runners by @pan-x-c in #116
- Add `AsyncPriorityQueue` by @chenyushuo in #115
- Refactor RewardFn by @hiyuchang in #118
- Add `unique_id` for each experience by @hiyuchang in #120
- Add a script to plot multi-run experiment results by @lingzhq in #122
- Update main readme with arxiv v2 by @yanxi-chen in #121
- Release Trinity-RFT v0.2.0 by @pan-x-c in #124
Full Changelog: v0.1.1...v0.2.0