What's Changed
- [Benchmark] Add initial TritonBench integration and vector_add benchmark example by @yf225 in #247
- Add static_range by @joydddd in #235
- Cleanup/improve docstrings by @jansel in #250
- [Benchmark] Add embedding benchmark by @yf225 in #248
- [Benchmark] Add vector_exp benchmark by @yf225 in #249
- Add rms_norm example and test by @yf225 in #252
- [Benchmark] Add rms_norm benchmark by @yf225 in #253
- Strip extra newlines from *.expected files by @jansel in #255
- Fix issue with BLOCK_SIZE0.to(torch.int32) by @jansel in #254
- Add hl.wait & AllGather Matmul example (via hl_ext helper) by @joydddd in #189
- Add sum example and test by @yf225 in #256
- [Benchmark] Add sum to TritonBench integration by @yf225 in #257
- Rename benchmark folder by @yf225 in #258
- Add hl.signal by @joydddd in #233
- Add hl.wait for simultaneously waiting on multiple gmem barriers by @joydddd in #243
- Swap to using pyright by @oulgen in #259
- Fix pyright errors in type_propagation.py by @yf225 in #266
- [BE] Add spellchecker by @oulgen in #265
- Remove pyre-ignore/pyre-fixme calls by @jansel in #274
- Improve typing for helion.kernel by @jansel in #270
- Add jagged_mean example by @yf225 in #263
- [Benchmark] Add jagged_mean tritonbench integration by @yf225 in #264
- Add fp8_gemm example and test by @yf225 in #267
- [Benchmark] Add fp8_gemm to TritonBench integration by @yf225 in #268
- Fix some pyright errors by @jansel in #276
- Remove unused exception types by @jansel in #271
- Fix docstring see also lists by @jansel in #272
- [benchmarks] Change tritonbench api by @xuzhao9 in #260
- Initial version of documentation by @jansel in #273
- Deploy docs to GitHub Pages by @jansel in #277
- Fix lint error on main by @jansel in #281
- Add a link to the documentation by @jansel in #282
- [Benchmark] Fix tritonbench integration due to upstream changes by @yf225 in #278
- [Benchmark] Allow using 'python benchmarks/run.py' to run all kernels by @yf225 in #280
- Add implicit broadcasting tests by @jansel in #285
- Add additional tl.range choices to persistent loop by @jansel in #287
- Update autotuning example in docs by @jansel in #288
- Add host side dead code elimination by @oulgen in #289
- [Benchmark] Add attention tritonbench integration by @yf225 in #284
- Add helion.exc.CannotModifyHostVariableOnDevice and helion.exc.CannotReadDeviceVariableOnHost by @jansel in #290
- Fix unstable CI by @jansel in #299
- Make to_triton_code config arg optional by @jansel in #291
- Add helion.exc.DeviceTensorSubscriptAssignmentNotAllowed by @jansel in #292
- Remove default configs from examples by @jansel in #295
- Fix bug with tensor descriptor and small block size by @jansel in #296
- Relax typing for CombineFunction by @jansel in #297
- Add examples/segment_reduction.py by @jansel in #300
- Add error for using a host tensor directly by @jansel in #306
- Improve Tensor.item() handling by @jansel in #307
- Fix type_info null errors by @oulgen in #294
- Improve DCE by marking math functions as pure by @oulgen in #312
- [Benchmark] Add softmax tritonbench integration by @yf225 in #286
- Make imports relative by @jansel in #310
- Generalize l2_grouping to support 3+ dimensions by @jansel in #313
- Remove make_precompiler generated wrapper by @jansel in #314
- Enforce ANN/PGH lints by @jansel in #315
- Support dynamic fill value to hl.full by @jansel in #316
- Use tensor device reference in persistent kernels by @jansel in #317
- Add tl._experimental_make_tensor_descriptor support by @oulgen in #322
- Fix variable scoping in nested loops for multi-pass kernels by @yf225 in #324
- Add HELION_DEV_LOW_VRAM env var for low GPU memory machines by @yf225 in #325
- Add cross_entropy example and unit test by @yf225 in #320
- [Benchmark] Add cross_entropy to tritonbench integration by @yf225 in #321
- Add literal index into tuple by @joydddd in #327
- Improve naming for generated helper functions by @jansel in #323
- Add hl.inline_asm_elementwise by @jansel in #328
- Implement static tuple unrolling and hl.static_range by @jansel in #329
- Add fp8_attention example and unit test by @yf225 in #318
- [Benchmark] Add fp8_attention to tritonbench integration by @yf225 in #319
New Contributors
Full Changelog: v0.0.9...v0.0.10