What's Changed
- Bump v0.0.6.dev0 by @sfc-gh-mwyatt in #213
- add a link to a new blog post by @sfc-gh-sbekman in #215
- Update CODEOWNERS by @sfc-gh-sbekman in #216
- Updated paper and project list by @sfc-gh-jrasley in #218
- Update SwiftKV Llama and Qwen to support transformers 4.53 by @sfc-gh-jrasley in #221
- [ALST] sync with transformers>=4.53 masking utils changes by @sfc-gh-sbekman in #223
- Artifact download script by @sfc-gh-jrasley in #226
- Fix package files not being included on install by @sfc-gh-mwyatt in #227
- switch to raw yaml loading for artifact download script by @sfc-gh-jrasley in #228
- Fix for custom user script relative import by @sfc-gh-mwyatt in #222
- Refactor for supported SFT datasets by @sfc-gh-mwyatt in #220
- add FA3 support by @sfc-gh-sbekman in #232
- Add evaluation method by @sfc-gh-mwyatt in #186
- Fix for long error stack traces by @sfc-gh-mwyatt in #233
- Better datasets map and filter performance by @sfc-gh-mwyatt in #234
- Refactor dataloader creation for improved maintainability (aka. don't forget about persistent_workers) by @sfc-gh-prenc in #235
- CONTRIBUTING.md: add first-good-issue link by @sfc-gh-sbekman in #236
- FA3 support: fix tflops by @sfc-gh-sbekman in #238
- report the correct device in pynvml by @sfc-gh-sbekman in #239
- Extract eval log iter condition by @sfc-gh-prenc in #237
- set max_length to max_position_embeddings by @therealnaveenkamal in #240
- Fix: Make isort see `wandb` as third party by @sfc-gh-mwyatt in #243
- ALST: add 1x H200 recipes by @sfc-gh-sbekman in #245
- [ALST] override attn mask for sdpa by @sfc-gh-sbekman in #242
- Debug/Dev Feature: Repeat small datasets to `max_length` by @sfc-gh-mwyatt in #241
- typo by @sfc-gh-sbekman in #247
- ALST: FA3 and new Liger-kernel for int64 support by @sfc-gh-sbekman in #249
- Switch swiftkv llama-70b base model from 3.1 to 3.3 by @sfc-gh-jrasley in #229
- Update arctic-txt2sql README.md by @sfc-gh-bzhai in #225
- integrate TiledFusedLogitsLoss by @sfc-gh-sbekman in #244
- Update SwiftKV sequence parallel with updated TiledFusedLogitsLoss by @sfc-gh-aqiao in #248
- [swiftkv] transformers 4.54 has deepseek_v2 now by @sfc-gh-jrasley in #251
- checkpoint resume support by @sfc-gh-jrasley in #252
- new deepspeed release by @sfc-gh-sbekman in #253
- Bug fix: reordering of SFT dataset chats by @sfc-gh-mwyatt in #264
- [eval] replace `torch.inference_mode` with `torch.no_grad` by @sfc-gh-sbekman in #265
- [SP] make eval work by @sfc-gh-sbekman in #259
- bump v0.6.0 by @sfc-gh-mwyatt in #267
New Contributors
- @sfc-gh-prenc made their first contribution in #235
- @therealnaveenkamal made their first contribution in #240
Full Changelog: v0.0.5...v0.6.0