v0.2.0
What's Changed
- [Attn] Delete V reduction & Enable 256 headdim tests by @yzhangcs in #273
- [RWKV7] Add more elementwise kernels by @zhiyuan1i in #271
- [CI] Remove cache and disable full test on Arc GPU by @zhiyuan1i in #274
- [Fox] Add model/layer/kernel impls w/ varlen support by @yzhangcs in #275
- [FoX] Simplify some tests and enhance tiling by @zhiyuan1i in #277
- [Test] Remove some warnings and correct condition checks by @zhiyuan1i in #278
- [CI] auto-cancel workflows on PR merge via concurrency group by @zhiyuan1i in #280
- [Test] use `tl.float16` instead of `tl.bfloat16` by @zhiyuan1i in #281
- [OP] replace `tl.exp`, `tl.log`, `tl.log2` with fast ops when `FLA_USE_FAST_OPS=1` by @zhiyuan1i in #276
- [FoX] Rename `fox` to `forgetting_attn` by @yzhangcs in #282
- [DeltaNet] WY repr speedup by @yzhangcs in #279
- [README] Add `--no-use-pep517` flag for faster installation by @zhiyuan1i in #286
- [FoX] Skip test `D>128` on RTX4090 by @zhiyuan1i in #287
- [FoX] Test different forget gate initialization ranges by @zhixuan-lin in #291
- [FoX] Fix class inheritance for ForgettingTransformerForCausalLM by @zhixuan-lin in #293
- [CI] use latest stable `triton` by @zhiyuan1i in #294
- [Triton] use `tl.gather` to enhance performance by @zhiyuan1i in #270
- [WY representation] Faster lower triangle inverse by @sustcsonglin in #289
- [GroupNorm] Add argument `is_rms_norm` to GroupNorm by @zhixuan-lin in #295
- [GroupNorm] Return correct residual in reference implementation by @zhixuan-lin in #297
- [CI] Don't show `Triton` autotune logs in CI by @zhiyuan1i in #298
- [FoX] Use GroupNorm for QK-norm implementation in FoX by @zhixuan-lin in #299
- [Utils] Update H100 and A100 configs by @zhiyuan1i in #306
- Pass shifted labels and add a warning to RWKV-7 initialization. by @Triang-jyed-driung in #304
- [Misc.] Update imports for `GatedDeltaProduct` by @yzhangcs in #309
- [FAQ] Rewrite the nightly installation instructions by @zhiyuan1i in #305
- Add unit tests for model forward and variable-length checks by @yzhangcs in #310
- [Test] Improve path handling and test file detection by @zhiyuan1i in #311
- [ShortConv] Adjust input shape according to `cu_seqlens` by @yzhangcs in #316
- [Tests] Add unit tests for generation with padding by @yzhangcs in #312
- [Testing] Update testing.py by @zhiyuan1i in #320
- [DeltaNet] optimize `chunk_delta_h` by @sustcsonglin in #315
- [CI] Only cancel in-progress CI for pull requests by @zhiyuan1i in #321
- [Test] Skip some tests on arcA770 by @zhiyuan1i in #322
- [API] Update `head_first` parameter default to `False` by @yzhangcs in #324
- [Rotary] Remove max_seqlen parameter and adjust related logic by @yzhangcs in #326
- [DeltaProduct] Remove unnecessary config parameter. by @JulienSiems in #325
- fix the training problem of GatedDeltaProduct by @ridgerchu in #327
- [Linear Attn] Fix head_first tests by @yzhangcs in #330
- [Deprecated] Remove `head_first` option in gla variants by @yzhangcs in #337
- [Test] Ensure most tests on Triton 3.2.0 and add `4096` seq_length in tests [skip test] by @zhiyuan1i in #300
- [FoX] Merge code to FlashAttention | support batch inference by @sustcsonglin in #333
- [DeltaNet] Delete `head_first` option for all by @yzhangcs in #338
- [WIP] Remove head_first option by @yzhangcs in #339
- [RWKV7] add `input_precision` param [skip test] by @zhiyuan1i in #335
- [Testing] Add recursive dependency finding for test discovery by @zhiyuan1i in #341
- [WIP] Delete `head_first` option for cumsum by @yzhangcs in #342
- [WIP] Delete head_first tests for DeltaNet/GLA by @yzhangcs in #344
- [Attn] Remove `head_first` & rename `offsets` to `cu_seqlens` by @yzhangcs in #345
- [RWKV7] Drop some kernels to enhance speed by @zhiyuan1i in #346
- Remove the `head_first` arg from several token mixing layer fns. by @yzhangcs in #347
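
Several entries above mention `cu_seqlens`, the name that replaces `offsets` for variable-length inputs. The sketch below illustrates the usual meaning of that name, assuming the library follows the common FlashAttention-style convention: sequences are concatenated into one packed batch, and `cu_seqlens` holds the cumulative boundaries, starting at 0. The helpers `build_cu_seqlens` and `split_packed` are hypothetical names used only for illustration and are not part of the library; in practice the boundaries would typically be an integer tensor rather than a Python list.

```python
from itertools import accumulate


def build_cu_seqlens(lengths):
    """Return cumulative boundaries [0, l0, l0 + l1, ...] for packed sequences.

    Hypothetical helper: illustrates the assumed convention only.
    """
    if any(n <= 0 for n in lengths):
        raise ValueError("sequence lengths must be positive")
    return [0, *accumulate(lengths)]


def split_packed(tokens, cu_seqlens):
    """Recover the individual sequences from a packed (concatenated) list."""
    return [tokens[start:end] for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:])]


lengths = [3, 5, 2]
cu_seqlens = build_cu_seqlens(lengths)  # one more entry than there are sequences
packed = list(range(sum(lengths)))      # stand-in for the packed token dimension
seqs = split_packed(packed, cu_seqlens)
```

Under this convention, sequence `i` occupies the half-open range `cu_seqlens[i]:cu_seqlens[i + 1]` of the packed dimension, so no padding tokens are needed.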
New Contributors
- @sustcsonglin made their first contribution in #289
Full Changelog: v0.1.2...v0.2.0