v0.3.0

pan-x-c released this 09 Sep 10:14

· 42 commits to main since this release

7d2323f

Overview

Framework Development

Buffer Module

Use Operator Interface to replace the original AddStrategy. Operator can perform various transformations on experience data in a pipeline manner. [Breaking Change]
Add TaskPipeline and ExperiencePipeline for task and experience data preprocessing.
Support calling Data-Juicer services in both TaskPipeline and ExperiencePipeline, and resolve some dependency conflicts.
Refactor SQL/FILE storage. SQL can store SFT/DPO/Rollout/Experience data. SQL and FILE support parsing multi-turn SFT data with tools. [Breaking Change]

Trainer Module

Support FSDP2 backend
Support Megatron backend
Support Qwen2.5 VL muti-modal models [Experimental]

Explorer Module

Support Qwen2.5 VL multi-modal models [Experimental]
Workflow supports running in async mode.
ModelWrapper provides openai.AsyncOpenAI interface.

Utils Module

Enhance logger and support printing logs of different actors to different files under the checkpoint dir
Enhance wandb and mlflow monitor

New Algorithms

New Workflows

General Multi-turn Email Search

Others

Support uv
Refactor README and documents
Fix many bugs

What's Changed

Support FSDP2 by @chenyushuo in #204
Update veRL to 0.5.0 by @pan-x-c in #195
Enhance Buffer Data Processing Pipeline by @pan-x-c in #175
Remove AddStrategy by @pan-x-c in #211
Add sft tools example and tests by @garyzhang99 in #210
Cleaning up the old-version data processor by @HYLcool in #213
Update main readme for v0.2.1 by @yanxi-chen in #216
Add data_process figure by @hiyuchang in #218
Enhance Logger by @pan-x-c in #217
Bug fix for experience display in wandb by @chenyushuo in #224
Fix links in readme by @yanxi-chen in #225
Fix alfworld prompt typo by @garyzhang99 in #222
Enhance SFT/DPO reader by @pan-x-c in #226
AsymRE by @yaochaorui in #187
[example] An email search workflow by @hiyuchang in #230
Bug fix for FSDP offload by @chenyushuo in #233
Refactor Storage by @pan-x-c in #227
sPPO by @yaochaorui in #232
Specify attempt in error message in run_with_retry by @vadimkantorov in #241
Add issue template by @pan-x-c in #246
[Example] GRPO on GSM8K with RULER reward by @hiyuchang in #239
Add TOPR and CISPO algorithm by @garyzhang99 in #185
Support Multi turn SFT with tools by @pan-x-c in #245
Support Megatron by @chenyushuo in #219
Support Multi-Modal LLM by @hiyuchang in #234
Add log_table function for mlflow by @hiyuchang in #249
Update the example of human in the loop by @HYLcool in #247
Keep SQL Experience Buffer behavior consistent with previous versions by @pan-x-c in #248
Update MoE training in example by @chenyushuo in #251
SFT tools schema fix by @garyzhang99 in #252
Merge verl-related config into default config by @hiyuchang in #256
Update config manager for 0.3.0 by @chenyushuo in #257

Full Changelog: v0.2.1...v0.3.0

Contributors

vadimkantorov, HYLcool, and 6 other contributors

Assets 2