v3.9-0.3

Latest
mehdi-goli released this 30 Jun 21:12
· 4 commits to sycl-develop since this release
467a2bb

What's Changed

Cutlass 3.9.2 SYCL backend Version 0.3 (2025-06-30)

  • Add support for GEMM FP8 (E5M2 and E4M3)
  • Add example for GEMM FP8 with support for channel-wise and group-wise quantization
  • Add support for Grouped GEMM FP8
  • Improve performance for FP8 to FP16 conversion
  • Add support for epilogue data conversion
  • Add support for FP16 GEMM with FP16 accumulator
  • Add support for BF16 GEMM with BF16 accumulator
  • Add support for mixed dtype GEMM with tensor-wise, channel-wise and group-wise quantization
  • Add example of mixed dtype BF16 + INT8 using channel-wise and group-wise quantization
  • Add example of mixed dtype FP16 + INT8 using tensor-wise quantization
  • Add example of mixed dtype FP16 + INT4 using channel-wise and group-wise quantization
  • Add support for zero-point quantization in INT4 and INT8 data types
  • Add support for Flash Attention prefill FP8 with and without KV cache
  • Add support for Flash Attention decode FP8 with and without KV cache
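
Several items above refer to tensor-wise, channel-wise and group-wise quantization with optional zero-points. The sketch below illustrates the usual meaning of those terms in plain Python; it is not the library's API or memory layout. The function name `dequantize`, the K x N weight orientation and the (K // group_size) x N scale layout are assumptions made for illustration only.

```python
def dequantize(q, scales, zeros, group_size):
    """Dequantize a K x N integer weight matrix (hypothetical helper).

    q          : K x N nested list of quantized integers (K is the reduction dim)
    scales     : (K // group_size) x N nested list of scale factors
    zeros      : (K // group_size) x N nested list of zero-points
    group_size : number of consecutive rows along K that share one scale/zero-point
    """
    K = len(q)
    N = len(q[0])
    assert K % group_size == 0, "K must be a multiple of group_size"
    out = []
    for k in range(K):
        g = k // group_size  # index of the group this row belongs to
        out.append([(q[k][n] - zeros[g][n]) * scales[g][n] for n in range(N)])
    return out


q = [[1, 2], [3, 4], [5, 6], [7, 8]]  # K = 4, N = 2

# Tensor-wise: one scale (and zero-point) shared by the whole tensor.
tensor_wise = dequantize(q, [[0.5, 0.5]], [[0, 0]], group_size=4)

# Channel-wise: one scale and zero-point per output column, shared across all of K.
channel_wise = dequantize(q, [[0.5, 0.25]], [[1, 2]], group_size=4)

# Group-wise: one scale and zero-point per column for every group of 2 rows along K.
group_wise = dequantize(
    q,
    [[0.5, 0.25], [2.0, 1.0]],
    [[1, 2], [5, 6]],
    group_size=2,
)
```

Under this convention, tensor-wise and channel-wise are the two limiting cases of group-wise quantization where the group spans all of K; smaller groups track local weight ranges more closely at the cost of storing more scales and zero-points.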

Full Changelog: v3.9-0.2...v3.9-0.3