Releases · intel/cutlass-sycl · GitHub

30 Jun 21:12

mehdi-goli

v3.9-0.3 Latest

What's Changed

Cutlass 3.9.2 SYCL backend Version 0.3 (2025-06-30)

Add support for GEMM FP8 (E5M2 and E4M3)
Add example for GEMM FP8 with support for channel-wise and group-wise quantization
Add support for Grouped GEMM FP8
Improve performance for FP8 to FP16 conversion
Add support for epilogue data conversion
Add support for FP16 GEMM with FP16 accumulator
Add support for BF16 GEMM with BF16 accumulator
Add support for mixed dtype GEMM with support for tensor-wise, channel-wise and group-wise quantization
Add example of mixed dtype BF16 + INT8 using channel-wise and group-wise quantization
Add example of mixed dtype FP16 + INT8 using tensor-wise quantization
Add example of mixed dtype FP16 + INT4 using channel-wise and group-wise quantization
Add support for zero-point quantization in INT4 and INT8 data types
Add support for Flash Attention prefill FP8 with and without KV cache
Add support for Flash Attention decode FP8 with and without KV cache
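
The quantization modes named in the list above differ mainly in how many weights share one scale (and, with zero-point quantization, one zero point). The following is a minimal host-side sketch of group-wise dequantization, not the CUTLASS SYCL API: the function name `dequantize_groupwise`, the row-major K x N layout, and the per-group scale/zero-point layout are all assumptions made for illustration. Each value is recovered as `(q - zero_point) * scale`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helper, NOT part of the CUTLASS SYCL API.
// Dequantizes a K x N row-major int8 weight matrix in which every
// `group_size` consecutive rows of a column share one scale and one
// zero point. `scales` and `zero_points` are (K / group_size) x N, row-major.
std::vector<float> dequantize_groupwise(const std::vector<std::int8_t>& q,
                                        const std::vector<float>& scales,
                                        const std::vector<std::int8_t>& zero_points,
                                        std::size_t K, std::size_t N,
                                        std::size_t group_size) {
  assert(group_size > 0 && K % group_size == 0);
  const std::size_t groups = K / group_size;
  assert(q.size() == K * N);
  assert(scales.size() == groups * N && zero_points.size() == groups * N);

  std::vector<float> w(K * N);
  for (std::size_t k = 0; k < K; ++k) {
    const std::size_t g = k / group_size;  // group that row k belongs to
    for (std::size_t n = 0; n < N; ++n) {
      const float scale = scales[g * N + n];
      const int zp = zero_points[g * N + n];
      // Subtract the zero point in integer arithmetic, then apply the scale.
      w[k * N + n] = static_cast<float>(static_cast<int>(q[k * N + n]) - zp) * scale;
    }
  }
  return w;
}
```

In this sketch, setting `group_size` equal to `K` gives one scale per output column, which corresponds to channel-wise quantization under the layout assumed here; tensor-wise quantization would use a single scale for the whole matrix.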

Full Changelog: v3.9-0.2...v3.9-0.3

30 May 23:43

mehdi-goli

Cutlass 3.9.2 SYCL backend Version 0.2

Cutlass 3.9.2 SYCL backend Version 0.2 (2025-05-30)
Based on CUTLASS 3.9.2 - May 2025 release

Platforms

Support for Intel Data Center GPU Max (1100 and 1550)
Support for Intel Arc B580 ("Battlemage")

Features

GEMM/StreamK/SplitK with support for FP16 data type
Flash attention prefill with Paged KV cache with support for FP16 data type
Performance improvements for flash attention prefill and decode
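
Split-K, named in the feature list above, divides the K (reduction) dimension of a GEMM into slices, computes a partial product per slice, and sums the partials. The following is a minimal sequential sketch of that idea in plain C++, not the library's kernel: the function name `gemm_split_k`, the row-major layouts, and the use of `float` instead of FP16 are assumptions for illustration. On a GPU the slices would run in parallel; here they run one after another.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference, NOT the CUTLASS SYCL implementation.
// Computes C = A * B with A (M x K) and B (K x N), both row-major,
// by splitting the K dimension into `splits` slices and reducing
// the per-slice partial results.
std::vector<float> gemm_split_k(const std::vector<float>& A,
                                const std::vector<float>& B,
                                std::size_t M, std::size_t N, std::size_t K,
                                std::size_t splits) {
  assert(splits > 0);
  assert(A.size() == M * K && B.size() == K * N);

  const std::size_t chunk = (K + splits - 1) / splits;  // ceil(K / splits)
  std::vector<float> C(M * N, 0.0f);

  for (std::size_t s = 0; s < splits; ++s) {
    const std::size_t k_begin = std::min(s * chunk, K);
    const std::size_t k_end = std::min(k_begin + chunk, K);

    // Partial product over this slice of K only.
    std::vector<float> partial(M * N, 0.0f);
    for (std::size_t m = 0; m < M; ++m)
      for (std::size_t n = 0; n < N; ++n)
        for (std::size_t k = k_begin; k < k_end; ++k)
          partial[m * N + n] += A[m * K + k] * B[k * N + n];

    // Reduction step: accumulate the slice's contribution into C.
    for (std::size_t i = 0; i < C.size(); ++i) C[i] += partial[i];
  }
  return C;
}
```

Splitting K mainly helps when M and N are small relative to K, since the output tiles alone would otherwise give the device too little parallel work.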

Full Changelog: v3.9-0.1...v3.9-0.2

30 Apr 01:12

mehdi-goli

Cutlass 3.9 SYCL backend Version 0.1

Based on CUTLASS 3.9.0 - March 2025 release

Platforms

Support for Intel Data Center GPU Max (1100 and 1550)
Support for Intel Arc B580 ("Battlemage")

Features

GEMM/StreamK/SplitK with support for bfloat16 data type
Flash attention prefill and decode with KV cache with support for bfloat16 data type
Support for epilogue operations:
- Element-wise, row-wise and column-wise bias
- ReLU, SiLU, GELU activation functions
- Softmax
Mixed precision GEMM (bfloat16/int8, half/int4) with dequantization support
Dual GEMM & Grouped GEMM
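
The epilogue operations listed above are applied to the GEMM accumulator before the result is written out. The following is a minimal sketch of one such combination, a bias followed by ReLU, in plain C++. It is not the CUTLASS SYCL epilogue API: the function name `epilogue_bias_relu` is hypothetical, and it assumes a row-major M x N accumulator with one bias value per output row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference, NOT the CUTLASS SYCL epilogue API.
// Adds one bias value per output row to an M x N row-major accumulator,
// then applies ReLU element-wise.
std::vector<float> epilogue_bias_relu(const std::vector<float>& acc,
                                      const std::vector<float>& bias_per_row,
                                      std::size_t M, std::size_t N) {
  assert(acc.size() == M * N && bias_per_row.size() == M);

  std::vector<float> out(M * N);
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      const float biased = acc[m * N + n] + bias_per_row[m];
      out[m * N + n] = std::max(0.0f, biased);  // ReLU
    }
  }
  return out;
}
```

Fusing these steps into the epilogue avoids a second pass over the output matrix, which is the usual motivation for epilogue fusion in GEMM libraries.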

Full Changelog: https://github.com/codeplaysoftware/cutlass-sycl/commits/v3.9-0.1
