v0.0.10

@oulgen oulgen released this 17 Jul 18:14
· 21 commits to main since this release
7d01817

What's Changed

  • [Benchmark] Add initial TritonBench integration and vector_add benchmark example by @yf225 in #247
  • Add static_range by @joydddd in #235
  • Cleanup/improve docstrings by @jansel in #250
  • [Benchmark] Add embedding benchmark by @yf225 in #248
  • [Benchmark] Add vector_exp benchmark by @yf225 in #249
  • Add rms_norm example and test by @yf225 in #252
  • [Benchmark] Add rms_norm benchmark by @yf225 in #253
  • Strip extra newlines from *.expected files by @jansel in #255
  • Fix issue with BLOCK_SIZE0.to(torch.int32) by @jansel in #254
  • Add hl.wait & AllGather Matmul example (via hl_ext helper) by @joydddd in #189
  • Add sum example and test by @yf225 in #256
  • [Benchmark] Add sum to TritonBench integration by @yf225 in #257
  • Rename benchmark folder by @yf225 in #258
  • Add hl.signal by @joydddd in #233
  • Add hl.wait for simultaneous waiting on multiple gmem barriers by @joydddd in #243
  • Swap to using pyright by @oulgen in #259
  • Fix pyright errors in type_propagation.py by @yf225 in #266
  • [BE] Add spellchecker by @oulgen in #265
  • Remove pyre-ignore/pyre-fixme calls by @jansel in #274
  • Improve typing for helion.kernel by @jansel in #270
  • Add jagged_mean example by @yf225 in #263
  • [Benchmark] Add jagged_mean tritonbench integration by @yf225 in #264
  • Add fp8_gemm example and test by @yf225 in #267
  • [Benchmark] Add fp8_gemm to TritonBench integration by @yf225 in #268
  • Fix some pyright errors by @jansel in #276
  • Remove unused exception types by @jansel in #271
  • Fix docstring see also lists by @jansel in #272
  • [benchmarks] Change tritonbench api by @xuzhao9 in #260
  • Initial version of documentation by @jansel in #273
  • Deploy docs to github pages by @jansel in #277
  • Fix lint error on main by @jansel in #281
  • Add a link to the documentation by @jansel in #282
  • [Benchmark] Fix tritonbench integration due to upstream changes by @yf225 in #278
  • [Benchmark] Allow using 'python benchmarks/run.py' to run all kernels by @yf225 in #280
  • Add implicit broadcasting tests by @jansel in #285
  • Add additional tl.range choices to persistent loop by @jansel in #287
  • Update autotuning example in docs by @jansel in #288
  • Add host side dead code elimination by @oulgen in #289
  • [Benchmark] Add attention tritonbench integration by @yf225 in #284
  • Add helion.exc.CannotModifyHostVariableOnDevice and helion.exc.CannotReadDeviceVariableOnHost by @jansel in #290
  • Fix unstable CI by @jansel in #299
  • Make to_triton_code config arg optional by @jansel in #291
  • Add helion.exc.DeviceTensorSubscriptAssignmentNotAllowed by @jansel in #292
  • Remove default configs from examples by @jansel in #295
  • Fix bug with tensor descriptor and small block size by @jansel in #296
  • Relax typing for CombineFunction by @jansel in #297
  • Add examples/segment_reduction.py by @jansel in #300
  • Add error for using a host tensor directly by @jansel in #306
  • Improve Tensor.item() handling by @jansel in #307
  • Fix type_info null errors by @oulgen in #294
  • Improve DCE by marking math functions as pure by @oulgen in #312
  • [Benchmark] Add softmax tritonbench integration by @yf225 in #286
  • Make imports relative by @jansel in #310
  • Generalize l2_grouping to support 3+ dimensions by @jansel in #313
  • Remove make_precompiler generated wrapper by @jansel in #314
  • Enforce ANN/PGH lints by @jansel in #315
  • Support dynamic fill value to hl.full by @jansel in #316
  • Use tensor device reference in persistent kernels by @jansel in #317
  • Add tl._experimental_make_tensor_descriptor support by @oulgen in #322
  • Fix variable scoping in nested loops for multi-pass kernels by @yf225 in #324
  • Add HELION_DEV_LOW_VRAM env var for low GPU memory machines by @yf225 in #325
  • Add cross_entropy example and unit test by @yf225 in #320
  • [Benchmark] Add cross_entropy to tritonbench integration by @yf225 in #321
  • Add literal index into tuple by @joydddd in #327
  • Improve naming for generated helper functions by @jansel in #323
  • Add hl.inline_asm_elementwise by @jansel in #328
  • Implement static tuple unrolling and hl.static_range by @jansel in #329
  • Add fp8_attention example and unit test by @yf225 in #318
  • [Benchmark] Add fp8_attention to tritonbench integration by @yf225 in #319

New Contributors

Full Changelog: v0.0.9...v0.0.10