Releases: intel/cutlass-sycl
v3.9-0.3
What's Changed
Cutlass 3.9.2 SYCL backend Version 0.3 (2025-06-30)
- Add support for GEMM FP8 (E5M2 and E4M3)
- Add example for GEMM FP8 with support for channel-wise and group-wise quantization
- Add support for Grouped GEMM FP8
- Improve performance for FP8 to FP16 conversion
- Add support for epilogue data conversion
- Add support for FP16 GEMM with FP16 accumulator
- Add support for BF16 GEMM with BF16 accumulator
- Add support for mixed dtype GEMM with support for tensor-wise, channel-wise and group-wise quantization
- Add example of mixed dtype BF16 + INT8 using channel-wise and group-wise quantization
- Add example of mixed dtype FP16 + INT8 using tensor-wise quantization
- Add example of mixed dtype FP16 + INT4 using channel-wise and group-wise quantization
- Add support for zero-point quantization in INT4 and INT8 data types
- Add support for Flash Attention prefill FP8 with and without KV cache
- Add support for Flash Attention decode FP8 with and without KV cache
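The channel-wise and group-wise quantization modes listed above, together with zero-point support, can be illustrated with a small reference dequantization routine. This is a minimal host-side sketch in plain C++, not the library's API: `QuantParams`, `dequantize`, the row-major layout, and the convention that groups run along the K dimension are all assumptions made for illustration. Setting `group_size` equal to `k` gives channel-wise behaviour (one scale per column); a smaller `group_size` gives group-wise behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical parameter bundle for a quantized K x N weight matrix.
// Names and layout are illustrative assumptions, not CUTLASS types.
struct QuantParams {
  std::size_t k = 0;               // rows (reduction dimension)
  std::size_t n = 0;               // columns (output channels)
  std::size_t group_size = 0;      // rows sharing one scale; k means channel-wise
  std::vector<float> scales;       // (k / group_size) x n, row-major
  std::vector<std::int8_t> zeros;  // same shape as scales; empty means symmetric
};

// Dequantize a row-major K x N int8 matrix: w = (q - zero) * scale.
inline std::vector<float> dequantize(const std::vector<std::int8_t>& q,
                                     const QuantParams& p) {
  assert(p.group_size > 0 && p.k % p.group_size == 0);
  const std::size_t groups = p.k / p.group_size;
  assert(q.size() == p.k * p.n);
  assert(p.scales.size() == groups * p.n);
  assert(p.zeros.empty() || p.zeros.size() == p.scales.size());

  std::vector<float> out(q.size());
  for (std::size_t row = 0; row < p.k; ++row) {
    const std::size_t g = row / p.group_size;  // group index along K
    for (std::size_t col = 0; col < p.n; ++col) {
      const std::size_t s = g * p.n + col;     // scale/zero-point index
      const int zero = p.zeros.empty() ? 0 : p.zeros[s];
      out[row * p.n + col] =
          static_cast<float>(q[row * p.n + col] - zero) * p.scales[s];
    }
  }
  return out;
}
```

In a mixed dtype GEMM this conversion would typically be fused into the kernel's main loop rather than materialized as a separate float matrix; the sketch only shows the arithmetic and the indexing of scales and zero points.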
Full Changelog: v3.9-0.2...v3.9-0.3
Cutlass 3.9.2 SYCL backend Version 0.2 (2025-05-30)
Based on CUTLASS 3.9.2 - May 2025 release
Platforms
- Support for Intel GPU Data Center Max (1100 and 1550)
- Support for Intel Arc B580 ("Battlemage")
Features
- GEMM/StreamK/SplitK with support for FP16 data type
- Flash attention prefill with Paged KV cache with support for FP16 data type
- Performance improvements for flash attention prefill and decode
Full Changelog: v3.9-0.1...v3.9-0.2
Cutlass 3.9 SYCL backend Version 0.1
Based on CUTLASS 3.9.0 - March 2025 release
Platforms
- Support for Intel GPU Data Center Max (1100 and 1550)
- Support for Intel Arc B580 ("Battlemage")
Features
- GEMM/StreamK/SplitK with support for bfloat16 data type
- Flash attention prefill and decode with KV cache with support for bfloat16 data type
- Support for epilogue operations:
  - Element-wise, row-wise and column-wise bias
  - ReLU, SiLU, GELU activation functions
  - Softmax
- Mixed precision GEMM (bfloat16/int8, half/int4) with dequantization support
- Dual GEMM & Grouped GEMM
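As a rough illustration of what the bias and activation epilogue operations listed above compute, the following host-side C++ sketch applies a broadcast bias and an activation to an M x N accumulator tile. It is a reference calculation under stated assumptions, not the library's epilogue API: the `Activation` and `Bias` enums, the `epilogue` function, and the row-major layout are hypothetical names chosen for this example. `Bias::PerRow` here means one bias value per output row and `Bias::PerColumn` one value per output column; how these map onto the "row-wise" and "column-wise" terms in the release notes is not specified there.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical enums for this sketch only.
enum class Activation { Identity, ReLU, SiLU, GELU };
enum class Bias { None, PerRow, PerColumn, PerElement };

inline float apply_activation(Activation act, float x) {
  switch (act) {
    case Activation::ReLU: return x > 0.0f ? x : 0.0f;
    case Activation::SiLU: return x / (1.0f + std::exp(-x));
    // Exact (erf-based) GELU: 0.5 * x * (1 + erf(x / sqrt(2))).
    case Activation::GELU: return 0.5f * x * (1.0f + std::erf(x * 0.70710678f));
    case Activation::Identity: return x;
  }
  return x;
}

// In-place epilogue on a row-major M x N accumulator: acc = act(acc + bias).
inline void epilogue(std::vector<float>& acc, std::size_t m, std::size_t n,
                     const std::vector<float>& bias, Bias mode, Activation act) {
  assert(acc.size() == m * n);
  assert(mode != Bias::PerRow || bias.size() == m);
  assert(mode != Bias::PerColumn || bias.size() == n);
  assert(mode != Bias::PerElement || bias.size() == m * n);

  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      float b = 0.0f;
      switch (mode) {
        case Bias::PerRow: b = bias[i]; break;
        case Bias::PerColumn: b = bias[j]; break;
        case Bias::PerElement: b = bias[i * n + j]; break;
        case Bias::None: break;
      }
      acc[i * n + j] = apply_activation(act, acc[i * n + j] + b);
    }
  }
}
```

Fusing these steps into the GEMM epilogue avoids writing the raw accumulator to memory and reading it back for a separate bias or activation pass, which is the usual motivation for epilogue fusion.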
Full Changelog: https://github.com/codeplaysoftware/cutlass-sycl/commits/v3.9-0.1