v0.0.8
What's Changed
- Improve loop end bound optimization for nested tiling by @jansel in #192
- Set default dot_precision to TRITON_F32_DEFAULT by @jansel in #197
- Use _disable_flatten_get_tile helper in tile_id by @jansel in #200
- Throw type errors immediately by @jansel in #202
- Fix typo in LiteralType.merge by @jansel in #201
- Add support for global statements in type propagation by @jansel in #203
- Remove ErrorReporting class and simplify warning handling by @jansel in #204
- Add InvalidDeviceForLoop exception type by @jansel in #205
- Fix bug with renamed variable flowing into phi() node by @jansel in #206
- Move hl.grid tests to their own file by @jansel in #208
- Remove NDGridTileStrategy by @jansel in #209
- Simplify codegen for hl.grid by @jansel in #210
- Add support for hl.grid(begin, end, step) by @jansel in #211
- Support range() loops (alias for hl.grid) by @jansel in #212
- Move yz_grid disabling logic to ConfigSpec by @jansel in #213
- Relax chebyshev kernel test tolerance by @jansel in #214
- [RFC] Add static loop unrolling by @oulgen in #216
- Add support for torch.arange by @jansel in #215
- Fix a performance issue with Helion-emitted Flash Attention by @manman-ren in #181
- Fix issue with phi nodes and aliasing by @jansel in #220
- Fix duplicate argument handling in inductor lowering by @jansel in #222
- Make x[i] return a scalar when i is a scalar by @joydddd in #223
- Fix config flatten spec for tile.id by @joydddd in #224
- Fix failing tests on main by @jansel in #231
- Refactor examples to use run_example helper by @jansel in #225
- Add tl.range loop_unroll_factor to autotuner by @jansel in #226
- Add tl.range num_stages to autotuner by @jansel in #227
- Add tl.range disallow_acc_multi_buffer to autotuner by @jansel in #228
- Add tl.range flatten to autotuner by @jansel in #229
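
As a rough illustration of the `hl.grid(begin, end, step)` and `range()` entries above, the pure-Python sketch below models the iteration space such a loop would cover. It assumes the semantics mirror Python's built-in `range` with a positive step, which these notes suggest but do not spell out. The helper names `grid_size` and `grid_indices` are hypothetical and are not part of Helion.

```python
def grid_size(begin: int, end: int, step: int = 1) -> int:
    """Number of iterations in a (begin, end, step) loop.

    Hypothetical helper: assumes range()-like semantics with a positive step.
    """
    if step <= 0:
        raise ValueError("this sketch only handles positive steps")
    # Ceiling division of (end - begin) by step, clamped at zero for empty ranges.
    return max(0, -(-(end - begin) // step))


def grid_indices(begin: int, end: int, step: int = 1) -> list[int]:
    """Indices visited by the loop, one per iteration."""
    return [begin + i * step for i in range(grid_size(begin, end, step))]


if __name__ == "__main__":
    # Under the stated assumption, the result matches Python's range().
    print(grid_indices(2, 11, 4))
    print(grid_indices(2, 11, 4) == list(range(2, 11, 4)))
```

Consult the linked PRs (#211, #212) for the actual behavior, including how non-constant bounds are handled.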
New Contributors
- @manman-ren made their first contribution in #181
Full Changelog: v0.0.7...v0.0.8