v0.2.0

@yzhangcs released this 11 Apr 20:31
· 342 commits to main since this release
6bfd5e6

What's Changed

  • [Attn] Delete V reduction & Enable 256 headdim tests by @yzhangcs in #273
  • [RWKV7] Add more elementwise kernels by @zhiyuan1i in #271
  • [CI] Remove cache and disable full test on Arc GPU by @zhiyuan1i in #274
  • [Fox] Add model/layer/kernel impls w/ varlen support by @yzhangcs in #275
  • [FoX] Simplify some tests and enhance tiling by @zhiyuan1i in #277
  • [Test] Remove some warnings and correct condition checks by @zhiyuan1i in #278
  • [CI] auto-cancel workflows on PR merge via concurrency group by @zhiyuan1i in #280
  • [Test] use tl.float16 instead of tl.bfloat16 by @zhiyuan1i in #281
  • [OP] replace tl.exp, tl.log, tl.log2 with fast ops when FLA_USE_FAST_OPS=1 by @zhiyuan1i in #276
  • [FoX] Rename fox to forgetting_attn by @yzhangcs in #282
  • [DeltaNet] WY repr speedup by @yzhangcs in #279
  • [README] Add --no-use-pep517 flag for faster installation by @zhiyuan1i in #286
  • [FoX] Skip test D>128 on RTX4090 by @zhiyuan1i in #287
  • [FoX] Test different forget gate initialization ranges by @zhixuan-lin in #291
  • [FoX] Fix class inheritance for ForgettingTransformerForCausalLM by @zhixuan-lin in #293
  • [CI] use latest stable triton by @zhiyuan1i in #294
  • [Triton] use tl.gather to enhance performance by @zhiyuan1i in #270
  • [WY representation] Faster lower triangle inverse by @sustcsonglin in #289
  • [GroupNorm] Add argument is_rms_norm to GroupNorm by @zhixuan-lin in #295
  • [GroupNorm] Return correct residual in reference implementation by @zhixuan-lin in #297
  • [CI] Don't show Triton autotune logs in CI by @zhiyuan1i in #298
  • [FoX] Use GroupNorm for QK-norm implementation in FoX by @zhixuan-lin in #299
  • [Utils] Update H100 and A100 configs by @zhiyuan1i in #306
  • Pass shifted labels and add a warning to RWKV-7 initialization. by @Triang-jyed-driung in #304
  • [Misc.] Update imports for GatedDeltaProduct by @yzhangcs in #309
  • [FAQ] Rewrite the nightly installation instructions by @zhiyuan1i in #305
  • Add unit tests for model forward and variable-length checks by @yzhangcs in #310
  • [Test] Improve path handling and test file detection by @zhiyuan1i in #311
  • [ShortConv] Adjust input shape according to cu_seqlens by @yzhangcs in #316
  • [Tests] Add unit tests for generation with padding by @yzhangcs in #312
  • [Testing] Update testing.py by @zhiyuan1i in #320
  • [DeltaNet] optimize chunk_delta_h by @sustcsonglin in #315
  • [CI] Only cancel in-progress CI for pull requests by @zhiyuan1i in #321
  • [Test] Skip some tests on arcA770 by @zhiyuan1i in #322
  • [API] Update head_first parameter default to False by @yzhangcs in #324
  • [Rotary] Remove max_seqlen parameter and adjust related logic by @yzhangcs in #326
  • [DeltaProduct] Remove unnecessary config parameter. by @JulienSiems in #325
  • fix the training problem of GatedDeltaProduct by @ridgerchu in #327
  • [Linear Attn] Fix head_first tests by @yzhangcs in #330
  • [Deprecated] Remove head_first option in gla variants by @yzhangcs in #337
  • [Test] Ensure most tests on Triton 3.2.0 and add 4096 seq_length in tests [skip test] by @zhiyuan1i in #300
  • [FoX] Merge code to FlashAttention | support batch inference by @sustcsonglin in #333
  • [DeltaNet] Delete head_first option for all by @yzhangcs in #338
  • [WIP] Remove head_first option by @yzhangcs in #339
  • [RWKV7] add input_precision param [skip test] by @zhiyuan1i in #335
  • [Testing] Add recursive dependency finding for test discovery by @zhiyuan1i in #341
  • [WIP] Delete head_first option for cumsum by @yzhangcs in #342
  • [WIP] Delete head_first tests for DeltaNet/GLA by @yzhangcs in #344
  • [Attn] Remove head_first & rename offsets to cu_seqlens by @yzhangcs in #345
  • [RWKV7] Drop some kernels to enhance speed by @zhiyuan1i in #346
  • Remove the head_first arg from several token mixing layer fns. by @yzhangcs in #347
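
Several entries above concern variable-length (varlen) inputs and the `cu_seqlens` argument (#275, #316, #345). As a rough illustration of the convention that name usually refers to, the sketch below builds cumulative sequence boundaries for a batch of sequences packed end to end, then uses them to split the packed data again. It is plain Python for readability. The helper names `build_cu_seqlens` and `split_packed` are hypothetical and are not part of this library, whose kernels operate on tensors rather than lists.

```python
from itertools import accumulate


def build_cu_seqlens(seq_lens):
    """Return cumulative boundaries [0, l1, l1 + l2, ...] for packed sequences.

    Hypothetical helper for illustration only; not a library API.
    """
    if any(n <= 0 for n in seq_lens):
        raise ValueError("sequence lengths must be positive")
    return [0, *accumulate(seq_lens)]


def split_packed(tokens, cu_seqlens):
    """Recover the individual sequences from a packed token list."""
    return [tokens[start:end] for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:])]


# Three sequences of different lengths packed into one flat list.
seq_lens = [3, 5, 2]
cu_seqlens = build_cu_seqlens(seq_lens)
packed = list(range(sum(seq_lens)))
parts = split_packed(packed, cu_seqlens)

print(cu_seqlens)
print([len(p) for p in parts])
```

Sequence `i` occupies the half-open range `cu_seqlens[i]:cu_seqlens[i + 1]`, so the list always has one more entry than there are sequences and its last entry equals the total token count.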

Full Changelog: v0.1.2...v0.2.0