Commit 4d318f1

Create joint-matrix-a-unified-sycl-extension-for-matrix-hardware-programming.md
1 parent 32605e7 commit 4d318f1

1 file changed: +33 -0 lines changed

@@ -0,0 +1,33 @@
---
contributor: max
date: '2025-05-07T12:11:00'
title: 'Joint Matrix: A Unified SYCL Extension for Matrix Hardware Programming'
external_url: 'https://www.youtube.com/watch?v=BBESHzVVDYo'
type: presentation
tags:
- sycl
- oneapi
---

Joint Matrix: A Unified SYCL Extension for Matrix Hardware Programming

Joint matrix is a new SYCL extension for matrix hardware programming. It unifies
targets such as Intel Advanced Matrix Extensions (Intel AMX), Intel Xe Matrix Extensions
(Intel XMX), NVIDIA* Tensor Cores, and AMD* Matrix Cores. In general, ML frameworks
like TensorFlow and libraries like oneAPI Deep Neural Network Library (oneDNN) make
heavy use of matrix hardware acceleration and are the right answer for many users
and applications that want high performance from such hardware. However,
for users who want to build their own neural network applications, these libraries
and frameworks are too high-level, because users cannot apply custom optimizations,
and too heavyweight, because the libraries are large. Moreover, the machine learning
domain often introduces new operations for which frameworks and libraries
do not provide timely and performant solutions. Such cases need APIs for writing
custom, workload-specific optimizations, and this is where joint matrix can help.
Joint matrix has a lower level of abstraction than these frameworks and libraries,
which lets it provide performance, productivity, and fusion capabilities while
still offering portability: a single source can target different matrix hardware.

In this talk, we present (1) the main APIs introduced as part of the SYCL joint matrix extension,
(2) tuning techniques to fully utilize Intel hardware using SYCL joint matrix, and (3)
the application and validation of this language feature and these tuning techniques using the
GEMM benchmark and the ability to fuse kernels such as GEMM and GELU.
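
To make the idea of fusing GEMM and GELU concrete, here is a minimal sketch in plain, host-side C++. It does not use the SYCL joint matrix extension or any matrix hardware, and it is not code from the talk. The names `gelu` and `gemm_gelu_fused`, the tile sizes, the row-major layout, and the choice of the tanh approximation of GELU are all assumptions made for illustration. It only shows the structure that fusion relies on: each output tile is accumulated locally, the activation is applied to that accumulator, and only the activated result is written to memory.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Tanh approximation of GELU (an assumed variant; the talk may use another).
inline float gelu(float x) {
    const float k = 0.7978845608f;  // sqrt(2 / pi)
    return 0.5f * x * (1.0f + std::tanh(k * (x + 0.044715f * x * x * x)));
}

// Hypothetical helper: C = GELU(A * B) with row-major A (M x K), B (K x N), C (M x N).
// The output is processed in TM x TN tiles. GELU is applied to each tile's
// accumulator before the store, so the plain GEMM result never reaches memory.
template <std::size_t TM = 8, std::size_t TN = 8>
void gemm_gelu_fused(const std::vector<float>& A, const std::vector<float>& B,
                     std::vector<float>& C,
                     std::size_t M, std::size_t N, std::size_t K) {
    for (std::size_t i0 = 0; i0 < M; i0 += TM) {
        for (std::size_t j0 = 0; j0 < N; j0 += TN) {
            // Clamp the tile at the matrix edges.
            const std::size_t im = std::min(TM, M - i0);
            const std::size_t jn = std::min(TN, N - j0);

            float acc[TM][TN] = {};  // tile accumulator, zero-initialized

            // Multiply-accumulate over the shared dimension.
            for (std::size_t k = 0; k < K; ++k) {
                for (std::size_t i = 0; i < im; ++i) {
                    const float a = A[(i0 + i) * K + k];
                    for (std::size_t j = 0; j < jn; ++j) {
                        acc[i][j] += a * B[k * N + (j0 + j)];
                    }
                }
            }

            // Fused epilogue: activate the accumulator, then store the tile.
            for (std::size_t i = 0; i < im; ++i) {
                for (std::size_t j = 0; j < jn; ++j) {
                    C[(i0 + i) * N + (j0 + j)] = gelu(acc[i][j]);
                }
            }
        }
    }
}
```

An unfused version would store the full GEMM result and then read it back in a second pass to apply GELU. Keeping the activation in the tile epilogue avoids that extra round trip through memory.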
