Releases · linkedin/Liger-Kernel
v0.5.10: Qwen3 MoE support, Sparsemax kernel, bug fixes
What's Changed
- fix zip bug by @KareemMusleh in #702
- [dpo] set default average_log_prob to False by @cyr0930 in #693
- Rank build status lower by @momochen in #707
- Add support for Qwen3 MoE models by @chiwanpark in #706
- Fix qwen3_moe flaky convergence test by @vaibhavjindal in #710
- Fix empty Medusa head tensors by @chiwanpark in #698
- Sparsemax by @AndreSlavescu in #687 (reference sketch after this list)
- fix: remove docstring imports in transformer patches by @NanoCode012 in #712
- Increase tests timeout to 45 mins by @vaibhavjindal in #718
- fix modal tests by @shivam15s in #719
- Visualizer Update by @AndreSlavescu in #717
- Sparsemax Documentation by @AndreSlavescu in #716
- Element-wise DyT: faster than the original LigerDyT by @mdy666 in #673
- GRPO loss kernel written fully in Triton, reducing memory usage by 46 GB by @mdy666 in #672
- Make FLCE compatible with FSDP and PEFT by @astefanutti in #674
- Fix incorrect module patching when using LoRA with modules_to_save by @BenasdTW in #632
- [XPU] Changed how XPU discovery works during `setup.py` by @Egor-Krivov in #720
- Fix to publish docs on pushes to main branch by @shimizust in #722
- Release 0.5.10 by @shimizust in #725
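For orientation on the new Sparsemax kernel (#687): below is a minimal PyTorch reference of the sparsemax function (Martins & Astudillo, 2016) that the Triton kernel fuses, not Liger's implementation itself.

```python
import torch

def sparsemax_reference(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Sparsemax (Martins & Astudillo, 2016): Euclidean projection of the
    logits onto the probability simplex, yielding sparse probabilities."""
    z, _ = torch.sort(x, dim=dim, descending=True)
    cumsum = z.cumsum(dim=dim)
    k = torch.arange(1, x.size(dim) + 1, device=x.device, dtype=x.dtype)
    shape = [1] * x.dim()
    shape[dim] = -1
    k = k.view(shape)                      # broadcast k along `dim`
    support = (1 + k * z) > cumsum         # elements inside the support
    k_z = support.sum(dim=dim, keepdim=True).to(x.dtype)
    tau = (cumsum.gather(dim, k_z.long() - 1) - 1) / k_z
    return torch.clamp(x - tau, min=0.0)

probs = sparsemax_reference(torch.tensor([2.0, 1.0, 0.1]))  # tensor([1., 0., 0.])
```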
New Contributors
- @KareemMusleh made their first contribution in #702
- @cyr0930 made their first contribution in #693
- @NanoCode012 made their first contribution in #712
- @mdy666 made their first contribution in #673
- @astefanutti made their first contribution in #674
- @Egor-Krivov made their first contribution in #720
Full Changelog: v0.5.9...v0.5.10
v0.5.9: Adds XPU Setup, GLM-4 & Qwen3 Model Support, Key Bugfixes
What's Changed
- update setup.py for installation on xpu by @faaany in #668
- update XPU CI yaml file to use docker container by @faaany in #669
- Add average_log_prob as an init param for LigerFusedLinearDPOLoss by @vaibhavjindal in #676 (usage sketch after this list)
- add shift label change by @shivam15s in #683
- remove tests that can pass on XPU by @faaany in #686
- Update mkdocs.yml by @shivam15s in #691
- Fix LigerCrossEntropy reduction='none' by @Tcc0403 in #680
- Support GLM-4 models by @intervitens in #685
- Import glm4_lce_forward locally in function by @vaibhavjindal in #695
- Qwen3 model support by @vaibhavjindal in #692
- Use logits_to_keep logic for training runs by @vaibhavjindal in #696
- increase gemma3 multimodal convergence test loss atol by @shivam15s in #697
- Update pyproject.toml by @shivam15s in #700
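On #676: a minimal construction sketch, assuming the chunked-loss import path from the project README; the exact constructor signature may differ across versions.

```python
# Sketch; import path per the project README, flag per PRs #676/#693.
from liger_kernel.chunked_loss import LigerFusedLinearDPOLoss

# average_log_prob chooses between averaging and summing per-token
# log-probs when forming sequence log-probs. Note v0.5.10 later flips
# the default to False (PR #693), so pin it explicitly.
dpo_loss = LigerFusedLinearDPOLoss(beta=0.1, average_log_prob=True)
```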
New Contributors
- @intervitens made their first contribution in #685
Full Changelog: v0.5.8...v0.5.9
v0.5.8: Backward-Compatible Fix
What's Changed
- backward compatible initialization by @shivam15s in #666
- Update pyproject.toml by @shivam15s in #667
Full Changelog: v0.5.7...v0.5.8
v0.5.7: Gemma3 Support, XPU Tuning Enhancements, GRPO Improvements, and API Compatibility Fixes
What's Changed
- Gemma3 (Text and Multimodal) by @eljandoubi in #621 (loading sketch after this list)
- Make FLCE compatible with latest `XXXForCausalLM.forward()` APIs by @Tcc0403 in #596
- do bias addition in tests in float32 to make testing code similar to torch compile by @shivam15s in #655
- [CI] fix siglip dummy config by @yundai424 in #658
- add XPU tuning to JSD by @rmukhopa in #649
- add XPU tuning to Rmsnorm and Layernorm by @Tarakarevu1 in #653
- Fix imports without transformers by @vaibhavjindal in #659
- Use TYPE_CHECKING to fix static-only imports in IDEs etc by @vaibhavjindal in #660
- [kl_div] Modified block and warp sizes for improved performance by @jgtong in #654
- [GRPO] add support for different loss types by @kashif in #662
- Remove unexpected kwargs passing to flce by @Tcc0403 in #651
- reduce number of tests for grpo by @shivam15s in #663
- Update pyproject.toml by @shivam15s in #665
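A hedged loading sketch for the Gemma3 text support (#621), using the AutoLigerKernelForCausalLM convenience wrapper documented in the README; the checkpoint name is illustrative, and whether Gemma3 is routed through the new patch depends on the installed version.

```python
# Sketch; wrapper name per the project README, checkpoint name illustrative.
from liger_kernel.transformers import AutoLigerKernelForCausalLM

# Applies the Liger kernels for supported architectures automatically.
model = AutoLigerKernelForCausalLM.from_pretrained("google/gemma-3-1b-it")
```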
New Contributors
- @rmukhopa made their first contribution in #649
- @Tarakarevu1 made their first contribution in #653
- @jgtong made their first contribution in #654
Full Changelog: v0.5.6...v0.5.7
v0.5.6: Enhancements, Fixes, and Expanded Support (Paligemma, DyT, XPU, Llava, GRPO, and More!)
What's Changed
- [JSD] JSD fixes by @kashif in #609
- Paligemma support by @eljandoubi in #608
- Fix hidden size by @eljandoubi in #612
- Add loss_utils for rewriting lce_forward methods by @Tcc0403 in #614
- Update Star History URL by @ryankert01 in #616
- Update README.md by @shivam15s in #617
- The language model of PaliGemma 1 is Gemma 1 by @eljandoubi in #613
- Update README to reflect recent changes by @helloworld1 in #619
- Support Dynamic Tanh (DyT) by @Tcc0403 in #618 (reference module after this list)
- Fix incorrect module name when monkey_patch applied to instantiated model by @vaibhavjindal in #629
- [chunked loss] align teacher and student logit shape by @yundai424 in #634
- Fix incorrect condition comment in log_target calculation by @p81sunshine in #633
- Add huggingface llava by @jp1924 in #524
- fix Llava test-bwd failure by @jp1924 in #639
- Fix GRPO to conform with TRL: Fix loss, make tests accurate, correct metrics computation by @shivam15s and @mRSun15 in #628
- add xpu tuning to CE by @mgrabban in #645
- add xpu tuning to FLJSD by @mgrabban in #647
- Change tests to use ROCm 6.3 and adjust tolerances to make Liger run on AMD by @shivam15s in #646
- Update pyproject.toml by @shivam15s in #648
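For #618: Dynamic Tanh (DyT) is a normalization-free replacement for LayerNorm, computing y = weight * tanh(alpha * x) + bias. A plain PyTorch reference module of what the fused LigerDyT kernel computes (not the kernel itself):

```python
import torch
import torch.nn as nn

class DyTReference(nn.Module):
    """Dynamic Tanh (DyT): y = weight * tanh(alpha * x) + bias,
    a normalization-free stand-in for LayerNorm."""
    def __init__(self, hidden_size: int, init_alpha: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), init_alpha))
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.bias = nn.Parameter(torch.zeros(hidden_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * torch.tanh(self.alpha * x) + self.bias

out = DyTReference(4096)(torch.randn(2, 16, 4096))
```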
New Contributors
- @eljandoubi made their first contribution in #608
- @p81sunshine made their first contribution in #633
Full Changelog: v0.5.5...v0.5.6
v0.5.5: Chunk size fixes for JSD; KTO speed fixes; better metrics tests
What's Changed
- Infer correct device for AMD HIP device by @helloworld1 in #587
- add out of bounds check to cross entropy by @shivam15s in #588
- Monkeypatch for Qwen2.5-VL by @BenasdTW in #552
- KTO changes to return aux outputs by @vaibhavjindal in #589
- [KTO] Only return summed metrics by @vaibhavjindal in #591
- increase chunk size for distillation and add bias to jsd by @shivam15s in #590
- [CI] Add ROCm 6.3 CI by @tjtanaa in #506
- Fix KTO speed issue by @vaibhavjindal in #592
- Compare means of aggregated outputs in KTO tests by @vaibhavjindal in #595
- Fix means of logps and rewards by @vaibhavjindal in #597
- Add chunk_size param to chunked losses by @RichhLi in #599 (usage sketch after this list)
- Fix DPO/ORPO typo in readme by @tyler-romero in #602
- version bump by @shivam15s in #605
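On #599: a construction sketch, assuming the keyword is named chunk_size as the PR title suggests and the import path matches the README.

```python
# Sketch; keyword name taken from the PR title, import path from the README.
from liger_kernel.chunked_loss import LigerFusedLinearORPOLoss

# Smaller chunks lower peak memory (fewer logits rows materialized at once)
# at the cost of more kernel launches; larger chunks do the reverse.
orpo_loss = LigerFusedLinearORPOLoss(beta=0.1, chunk_size=1024)
```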
Full Changelog: v0.5.4...v0.5.5
v0.5.4: Granite 3.0 & 3.1, OLMo2, GRPO, TVD loss, and minor fixes
What's Changed
- add GitHub CI for Intel GPU by @faaany in #536
- Add Intel GPU CI to README.md by @hebiao064 in #562
- test split to 16, 32 by @jp1924 in #564
- Clean up workaround introduced in PR #564 by @austin362667 in #566
- Update README.md by @momochen in #567
- GRPO loss by @kashif in #553
- Update Readme with ROCM installation instruction by @zcnrex in #570
- fix failing qwen2vl and mllama tests by @shivam15s in #571
- KTO: Minor fix and documentation update by @vaibhavjindal in #574
- Add TVD Loss Kernel by @saurabhkoshatwar in #324 (reference formula after this list)
- Add KTO Benchmark Data into README by @hebiao064 in #575
- Support Granite 3.0 and 3.1 models by @JamesKunstle in #558
- Improve Hugging Face SFT Script by @ParagEkbote in #539
- Add unit tests for shared prefix masked attention with `torch.FlexAttention` by @austin362667 in #504
- update project readme to include Granite support by @JamesKunstle in #576
- Revert "Improve Hugging Face SFT Script (#539)" and Fix TVD Test for Intel #580 by @shivam15s in #578
- Fix Rope Test by @hebiao064 in #577
- Fix layer norm kernels by @lancerts in #582
- Add OLMO2 model support by @yundai424 in #581
- bump version to 0.5.4 by @yundai424 in #585
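For the TVD loss kernel (#324): the quantity it computes, shown as a plain PyTorch reference rather than the Triton kernel.

```python
import torch

def tvd_reference(p_logits: torch.Tensor, q_logits: torch.Tensor) -> torch.Tensor:
    """Total variation distance between two categorical distributions:
    TVD(P, Q) = 0.5 * sum_i |p_i - q_i|, averaged over the batch here."""
    p = torch.softmax(p_logits, dim=-1)
    q = torch.softmax(q_logits, dim=-1)
    return 0.5 * (p - q).abs().sum(dim=-1).mean()

loss = tvd_reference(torch.randn(4, 100), torch.randn(4, 100))
```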
New Contributors
- @jp1924 made their first contribution in #564
- @zcnrex made their first contribution in #570
- @vaibhavjindal made their first contribution in #574
- @saurabhkoshatwar made their first contribution in #324
- @JamesKunstle made their first contribution in #558
Full Changelog: v0.5.3...v0.5.4
v0.5.3: Minor fixes for post-training losses and support for KTO Loss
What's Changed
- Add ref_input parameter to support separate inputs for reference model by @xingyaoww in #467
- Revert "Add ref_input parameter to support separate inputs for reference model" by @ByronHsu in #469
- Add dynamic dependency management for CUDA and ROCm by @hebiao064 in #460
- [CI] runtime pip install using uv by @ByronHsu in #471
- modify ref_input in chunked_loss base class and fix tests by @shivam15s in #470
- Add more post training in readme by @ByronHsu in #472
- align post training loss at the center by @ByronHsu in #473
- [Transformer] fix ORPO loss for MOE models by @kashif in #479
- fix: correct typos in docstrings by @shivam15s in #482
- fix chosen_nll_loss in chunked losses by @kashif in #486
- Revert "fix chosen_nll_loss in chunked losses (#486)" by @shivam15s in #489
- fix dpo tests: reduce tolerance and change default compute_nll_loss false by @shivam15s in #490
- CPO & SimPO add label_smoothing by @Mecoli1219 in #493
- Fix Preference Loss and Refactor for Readability by @austin362667 in #484
- annotate tl constexpr values by @winglian in #497
- Fix Rope Compatibility with Cos/Sin Position Embedding for Batch Size > 1 by @wizyoung in #477
- Move the checkstyle to Ruff by @shivam15s in #483
- Fix/liger fused linear cross entropy function does not support reduction=none by @ryankert01 in #496
- Fix Dtype Mismatch in torch.addmm within ops/fused_linear_cross_entropy.py in AMP training. by @DandinPower in #502
- Add weight support for LigerCrossEntropy by @Tcc0403 in #420 (usage sketch after this list)
- Refactor Temperature Scaling in Distillation Loss by @austin362667 in #444
- Fix All `chunked_loss` Benchmark Scripts by @austin362667 in #438
- Set z_loss_1d=None when return_z_loss=False in cross_entropy_loss to avoid tl.store fail when triton_interpret=1 (for tl.device_print etc.) by @wa008 in #508
- Add `aux_outputs` for CPO and SimPO by @Mecoli1219 in #492
- Add `average_log_prob` arg for CPO by @Mecoli1219 in #510
- Refactor CrossEntropy and FusedLinearCrossEntropy by @Tcc0403 in #511
- [ORPO] add nll_target for orpo nll loss by @kashif in #503
- Format Benchmark Scripts with Ruff by @austin362667 in #516
- [Tiny] Add QVQ to readme by @tyler-romero in #522
- Add argument `return_z_loss` to flce by @Tcc0403 in #530
- Remove extra print by @apaz-cli in #531
- Fix HF `transformers` Breaking Changes by @austin362667 in #526
- Handle cache_position for transformers 4.47.0 and later (#528) by @BenasdTW in #529
- Create Docs for Liger-Kernel by @ParagEkbote in #485
- Add Mkdocs related dependencies to setup.py by @hebiao064 in #534
- Add KTO Loss by @hebiao064 in #475
- [tests] use a valid hexadecimal string instead of a placeholder by @faaany in #535
- [tests] skip failed tests for xpu by @faaany in #498
- Format files by @austin362667 in #541
- Fix Broken Links by @ParagEkbote in #547
- [Fix] Fix the type hint of `test_utils::concatenated_forward` by @hongpeng-guo in #549
- Add JSD Loss for Distillation by @austin362667 in #425
- [DPO] add reference log-prob outputs in DPO by @kashif in #521
- Fix DPO unit test fail and refactor by @Tcc0403 in #554
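On #420: a usage sketch, assuming LigerCrossEntropyLoss mirrors torch.nn.CrossEntropyLoss's `weight` semantics as the PR title suggests; the Triton kernels require a GPU.

```python
import torch
# Sketch; class name per the project README, `weight` semantics per PR #420.
from liger_kernel.transformers import LigerCrossEntropyLoss

num_classes = 8
ce = LigerCrossEntropyLoss(weight=torch.rand(num_classes, device="cuda"))
logits = torch.randn(4, num_classes, device="cuda", requires_grad=True)
target = torch.randint(0, num_classes, (4,), device="cuda")
loss = ce(logits, target)
loss.backward()
```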
New Contributors
- @xingyaoww made their first contribution in #467
- @kashif made their first contribution in #479
- @Mecoli1219 made their first contribution in #493
- @winglian made their first contribution in #497
- @DandinPower made their first contribution in #502
- @wa008 made their first contribution in #508
- @apaz-cli made their first contribution in #531
- @BenasdTW made their first contribution in #529
- @ParagEkbote made their first contribution in #485
Full Changelog: v0.5.2...v0.5.3
v0.5.2: Fix Qwen2VL mrope for transformers>=4.47
What's Changed
- Disable Qwen2 VL test for with logits conv test by @ByronHsu in #463
- Fix Qwen2VL mrope for transformers 4.47.0 by @li-plus in #464 (patch sketch after this list)
- Revert Workaround of Disabling QWEN2_VL in Convergence Tests by @austin362667 in #466
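A hedged sketch of picking up the fixed patch, assuming the function follows the project's apply_liger_kernel_to_* convention; apply the patch before constructing the model.

```python
# Sketch; patch name follows liger_kernel's monkey-patch convention.
from liger_kernel.transformers import apply_liger_kernel_to_qwen2_vl
from transformers import Qwen2VLForConditionalGeneration

apply_liger_kernel_to_qwen2_vl()  # patch first, so the fixed mrope is used
model = Qwen2VLForConditionalGeneration.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
```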
Full Changelog: v0.5.1...v0.5.2
v0.5.1: Patch Fix for Import Error
Full Changelog: v0.5.0...v0.5.1