---
contributor: max
date: '2025-05-07T12:11:00'
title: 'Joint Matrix: A Unified SYCL Extension for Matrix Hardware Programming'
external_url: 'https://www.youtube.com/watch?v=BBESHzVVDYo'
type: presentation
tags:
  - sycl
  - oneapi
---

Joint Matrix: A Unified SYCL Extension for Matrix Hardware Programming

Joint matrix is a new SYCL extension for matrix hardware programming. It unifies
targets such as Intel Advanced Matrix Extensions (Intel AMX), Intel Xe Matrix Extensions
(Intel XMX), NVIDIA* Tensor Cores, and AMD* Matrix Cores. ML frameworks such as
TensorFlow and libraries such as the oneAPI Deep Neural Network Library (oneDNN) can
make heavy use of matrix hardware acceleration, and they serve many users and
applications that need high performance from such hardware. However, for users who want
to build their own neural network applications, these libraries and frameworks can be
too high-level, because they do not allow custom optimizations, and too heavyweight,
because of their large size. Moreover, new operations are frequently introduced in
machine learning, and frameworks and libraries often cannot provide timely, performant
implementations of them. In such cases, users need APIs for writing custom,
workload-specific optimizations, and this is where joint matrix can help. Joint matrix
offers a lower level of abstraction than these frameworks and libraries, which lets it
provide performance, productivity, and fusion capabilities while also offering
portability: the same code can target different matrix hardware.

In this talk, we present (1) the main APIs introduced by the SYCL joint matrix extension,
(2) tuning techniques for fully utilizing Intel hardware with SYCL joint matrix, and (3)
the application and validation of this language feature and these tuning techniques on
the GEMM benchmark, including the ability to fuse kernels such as GEMM and GELU.
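
The GEMM and GELU fusion mentioned in point (3) can be illustrated with a small, plain C++ model of what a joint matrix GEMM kernel computes. This is a sketch under stated assumptions, not the SYCL API: `Tile`, `tile_load`, `tile_mad`, `tile_store` and `gemm_gelu_tiled` are hypothetical names that only loosely mirror the extension's `joint_matrix_load`, `joint_matrix_mad`, `joint_matrix_apply` and `joint_matrix_store`. In the real extension a tile is shared by the work-items of a sub-group and the supported tile shapes depend on the device; here one thread owns each tile and the shapes are arbitrary template arguments. The tanh form of GELU is also an assumption. The point being shown is that GELU is applied to the accumulator tile before it is stored, so the raw GEMM output never makes a round trip through memory.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// GELU, tanh approximation (an assumption; a kernel may use the erf form instead).
inline float gelu(float x) {
    const float kSqrt2OverPi = 0.7978845608f;  // sqrt(2 / pi)
    return 0.5f * x * (1.0f + std::tanh(kSqrt2OverPi * (x + 0.044715f * x * x * x)));
}

// Hypothetical stand-in for a joint matrix tile. In SYCL the tile is distributed
// across a sub-group; here a single thread owns all R x C elements.
template <int R, int C>
struct Tile {
    float v[R][C] = {};  // zero-initialized, like filling an accumulator with 0
};

// Copy the R x C block at (row, col) of a row-major matrix with leading dimension ld.
template <int R, int C>
void tile_load(Tile<R, C>& t, const float* src, std::size_t ld, std::size_t row, std::size_t col) {
    for (int i = 0; i < R; ++i)
        for (int j = 0; j < C; ++j)
            t.v[i][j] = src[(row + i) * ld + col + j];
}

// Write an R x C tile back to (row, col) of a row-major matrix with leading dimension ld.
template <int R, int C>
void tile_store(const Tile<R, C>& t, float* dst, std::size_t ld, std::size_t row, std::size_t col) {
    for (int i = 0; i < R; ++i)
        for (int j = 0; j < C; ++j)
            dst[(row + i) * ld + col + j] = t.v[i][j];
}

// Multiply-accumulate on tiles: c += a * b.
template <int TM, int TN, int TK>
void tile_mad(Tile<TM, TN>& c, const Tile<TM, TK>& a, const Tile<TK, TN>& b) {
    for (int i = 0; i < TM; ++i)
        for (int k = 0; k < TK; ++k)
            for (int j = 0; j < TN; ++j)
                c.v[i][j] += a.v[i][k] * b.v[k][j];
}

// Fused D = gelu(A * B), with A (M x K), B (K x N) and D (M x N) all row-major.
// This sketch has no edge handling, so M, N and K must be multiples of the tile sizes.
template <int TM, int TN, int TK>
void gemm_gelu_tiled(const std::vector<float>& A, const std::vector<float>& B,
                     std::vector<float>& D, std::size_t M, std::size_t N, std::size_t K) {
    assert(M % TM == 0 && N % TN == 0 && K % TK == 0);
    assert(A.size() == M * K && B.size() == K * N && D.size() == M * N);
    for (std::size_t m = 0; m < M; m += TM)
        for (std::size_t n = 0; n < N; n += TN) {
            Tile<TM, TN> acc;  // accumulator for one output tile
            for (std::size_t k = 0; k < K; k += TK) {
                Tile<TM, TK> a;
                Tile<TK, TN> b;
                tile_load(a, A.data(), K, m, k);
                tile_load(b, B.data(), N, k, n);
                tile_mad(acc, a, b);
            }
            // Fused epilogue: apply GELU element-wise while the result is still in the
            // tile, instead of storing the GEMM output and running a separate GELU kernel.
            for (auto& row : acc.v)
                for (float& x : row) x = gelu(x);
            tile_store(acc, D.data(), N, m, n);
        }
}
```

Because the tile shape is a template parameter, moving from `gemm_gelu_tiled<8, 8, 8>` to `gemm_gelu_tiled<16, 16, 16>` changes only the instantiation and not the kernel body, which is a rough analogue of the "one code, different matrix hardware" goal described above.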