
v0.3.0

vbharadwaj-bk released this 22 Jun 23:42
2dd1684

v0.3.0 (2025-06-22)

This release includes bug fixes and new opaque operations that compose with torch.compile on PyTorch 2.4 through 2.7. These opaque operations will be unnecessary on PyTorch 2.8 and later.

Added:

  1. Opaque variants of the major operations, declared as PyTorch custom ops (custom_op). Because they are opaque, these functions cannot be traced through, and they fail under TorchScript (JIT) and AOTInductor (AOTI). They are shims that let the operations compose with torch.compile before PyTorch 2.8.
  2. Support for torch.save / torch.load. When torch.compile is not used, saved objects are portable across GPU architectures.
  3. Support for .to(), which moves TensorProduct and TensorProductConv between devices or changes their datatypes.

Fixed:

  1. Gracefully records an error if the C++ extension cannot be linked against libpython.so.
  2. Resolves a Kahan summation bug and various other bugs on HIP at the -O3 compiler optimization level.
  3. No longer spawns multiple contexts on GPU 0 when multiple devices are used.
  4. Zero-initializes gradient buffers to prevent garbage values from accumulating during the backward pass.
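Fix 2 refers to Kahan (compensated) summation. As background, here is a minimal pure-Python sketch of the textbook algorithm; it is not the HIP kernel code from this release. The compensation term `comp` recovers low-order bits lost when a small value is added to a large running total. One well-known hazard, offered as general context and not as the confirmed cause of the HIP bug, is that value-unsafe optimizations such as floating-point reassociation (for example under fast-math flags) can simplify `(t - total) - y` to zero and silently disable the compensation.

```python
def kahan_sum(values):
    """Compensated (Kahan) summation of an iterable of floats."""
    total = 0.0
    comp = 0.0  # running compensation: negated low-order bits lost so far
    for v in values:
        y = v - comp            # apply the correction carried from the previous step
        t = total + y           # low-order bits of y may be rounded away here
        comp = (t - total) - y  # algebraically 0; numerically, minus the lost part
        total = t
    return total


def naive_sum(values):
    """Plain left-to-right summation with no compensation."""
    total = 0.0
    for v in values:
        total += v
    return total


vals = [1.0] + [1e-16] * 10
# Each 1e-16 is below half an ulp of 1.0, so naive addition rounds it away.
print(naive_sum(vals))  # prints 1.0
# Kahan summation accumulates the lost parts and lands near 1.0 + 1e-15.
print(kahan_sum(vals))
```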