v0.3.0
Overview
Framework Development
Buffer Module
- Use Operator Interface to replace the original AddStrategy. Operator can perform various transformations on experience data in a pipeline manner. [Breaking Change]
- Add TaskPipeline and ExperiencePipeline for task and experience data preprocessing.
- Support calling Data-Juicer services in both TaskPipeline and ExperiencePipeline, and resolve some dependency conflicts.
- Refactor SQL/FILE storage. SQL can store SFT/DPO/Rollout/Experience data. SQL and FILE support parsing multi-turn SFT data with tools. [Breaking Change]
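As a rough illustration of the pipeline idea behind the Operator interface, the sketch below chains two operators over a batch of experiences. Every name in it (Experience, Operator.process, DropShortResponses, ScaleReward, run_pipeline) is a hypothetical stand-in chosen for this example, not the framework's actual API; the real signatures are in the project documentation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Experience:
    # Hypothetical minimal experience record (assumption for illustration).
    response: str
    reward: float


class Operator(ABC):
    # Hypothetical interface: each operator maps a batch to a new batch.
    @abstractmethod
    def process(self, exps: List[Experience]) -> List[Experience]:
        ...


class DropShortResponses(Operator):
    """Filter out experiences whose response is shorter than min_chars."""

    def __init__(self, min_chars: int) -> None:
        self.min_chars = min_chars

    def process(self, exps: List[Experience]) -> List[Experience]:
        return [e for e in exps if len(e.response) >= self.min_chars]


class ScaleReward(Operator):
    """Multiply every reward by a constant factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def process(self, exps: List[Experience]) -> List[Experience]:
        return [replace(e, reward=e.reward * self.factor) for e in exps]


def run_pipeline(operators: List[Operator], exps: List[Experience]) -> List[Experience]:
    # Apply operators in order; each one sees the previous one's output.
    for op in operators:
        exps = op.process(exps)
    return exps


batch = [Experience("ok", 1.0), Experience("a longer answer", 0.5)]
result = run_pipeline([DropShortResponses(min_chars=5), ScaleReward(factor=2.0)], batch)
```

Because each operator only takes and returns a list of experiences, operators can be reordered or swapped without touching the others, which is the main appeal of a pipeline-style design.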
Trainer Module
- Support FSDP2 backend
- Support Megatron backend
- Support Qwen2.5 VL multi-modal models [Experimental]
Explorer Module
- Support Qwen2.5 VL multi-modal models [Experimental]
- Workflow supports running in async mode. ModelWrapper provides openai.AsyncOpenAI interface.
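To show what running workflows in async mode buys, the sketch below executes several rollouts concurrently with asyncio. FakeAsyncModel, run_workflow, and run_all are hypothetical names invented for this example; a real workflow would instead await calls on the openai.AsyncOpenAI-style client that ModelWrapper provides.

```python
import asyncio
from typing import List


class FakeAsyncModel:
    """Hypothetical stand-in for an async model client (not a real API)."""

    async def chat(self, prompt: str) -> str:
        await asyncio.sleep(0.01)  # simulate network / inference latency
        return f"echo: {prompt}"


async def run_workflow(model: FakeAsyncModel, task: str) -> str:
    # While this workflow awaits the model, other workflows can make progress.
    return await model.chat(task)


async def run_all(tasks: List[str]) -> List[str]:
    model = FakeAsyncModel()
    # asyncio.gather runs the coroutines concurrently and keeps input order.
    return await asyncio.gather(*(run_workflow(model, t) for t in tasks))


responses = asyncio.run(run_all(["task-1", "task-2", "task-3"]))
```

With a blocking client the three calls would run back to back; here their waiting time overlaps, so total latency stays close to that of a single call.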
Utils Module
- Enhance logger and support printing logs of different actors to different files under the checkpoint dir
- Enhance wandb and mlflow monitor
New Algorithms
New Workflows
- General Multi-turn Email Search
Others
- Support uv
- Refactor README and documents
- Fix many bugs
What's Changed
- Support FSDP2 by @chenyushuo in #204
- Update veRL to 0.5.0 by @pan-x-c in #195
- Enhance Buffer Data Processing Pipeline by @pan-x-c in #175
- Remove AddStrategy by @pan-x-c in #211
- Add sft tools example and tests by @garyzhang99 in #210
- Cleaning up the old-version data processor by @HYLcool in #213
- Update main readme for v0.2.1 by @yanxi-chen in #216
- Add data_process figure by @hiyuchang in #218
- Enhance Logger by @pan-x-c in #217
- Bug fix for experience display in wandb by @chenyushuo in #224
- Fix links in readme by @yanxi-chen in #225
- Fix alfworld prompt typo by @garyzhang99 in #222
- Enhance SFT/DPO reader by @pan-x-c in #226
- AsymRE by @yaochaorui in #187
- [example] An email search workflow by @hiyuchang in #230
- Bug fix for FSDP offload by @chenyushuo in #233
- Refactor Storage by @pan-x-c in #227
- sPPO by @yaochaorui in #232
- Specify attempt in error message in run_with_retry by @vadimkantorov in #241
- Add issue template by @pan-x-c in #246
- [Example] GRPO on GSM8K with RULER reward by @hiyuchang in #239
- Add TOPR and CISPO algorithm by @garyzhang99 in #185
- Support Multi turn SFT with tools by @pan-x-c in #245
- Support Megatron by @chenyushuo in #219
- Support Multi-Modal LLM by @hiyuchang in #234
- Add log_table function for mlflow by @hiyuchang in #249
- Update the example of human in the loop by @HYLcool in #247
- Keep SQL Experience Buffer behavior consistent with previous versions by @pan-x-c in #248
- Update MoE training in example by @chenyushuo in #251
- SFT tools schema fix by @garyzhang99 in #252
- Merge verl-related config into default config by @hiyuchang in #256
- Update config manager for 0.3.0 by @chenyushuo in #257
Full Changelog: v0.2.1...v0.3.0