
Commit 2c1ed90

Merge remote-tracking branch 'drm-misc/drm-misc-next-fixes' into drm-misc-fixes

Merge the few remaining patches stuck in drm-misc-next-fixes.

Signed-off-by: Maxime Ripard <mripard@kernel.org>

2 parents 41a2d82 + ecee4d0

File tree

3,889 files changed: +76159 −31391 lines changed


.mailmap

Lines changed: 5 additions & 1 deletion
@@ -121,6 +121,8 @@ Ben Widawsky <bwidawsk@kernel.org> <benjamin.widawsky@intel.com>
 Benjamin Poirier <benjamin.poirier@gmail.com> <bpoirier@suse.de>
 Benjamin Tissoires <bentiss@kernel.org> <benjamin.tissoires@gmail.com>
 Benjamin Tissoires <bentiss@kernel.org> <benjamin.tissoires@redhat.com>
+Bingwu Zhang <xtex@aosc.io> <xtexchooser@duck.com>
+Bingwu Zhang <xtex@aosc.io> <xtex@xtexx.eu.org>
 Bjorn Andersson <andersson@kernel.org> <bjorn@kryo.se>
 Bjorn Andersson <andersson@kernel.org> <bjorn.andersson@linaro.org>
 Bjorn Andersson <andersson@kernel.org> <bjorn.andersson@sonymobile.com>
@@ -200,6 +202,7 @@ Elliot Berman <quic_eberman@quicinc.com> <eberman@codeaurora.org>
 Enric Balletbo i Serra <eballetbo@kernel.org> <enric.balletbo@collabora.com>
 Enric Balletbo i Serra <eballetbo@kernel.org> <eballetbo@iseebcn.com>
 Erik Kaneda <erik.kaneda@intel.com> <erik.schmauss@intel.com>
+Ethan Carter Edwards <ethan@ethancedwards.com> Ethan Edwards <ethancarteredwards@gmail.com>
 Eugen Hristev <eugen.hristev@linaro.org> <eugen.hristev@microchip.com>
 Eugen Hristev <eugen.hristev@linaro.org> <eugen.hristev@collabora.com>
 Evgeniy Polyakov <johnpol@2ka.mipt.ru>
@@ -435,7 +438,7 @@ Martin Kepplinger <martink@posteo.de> <martin.kepplinger@ginzinger.com>
 Martin Kepplinger <martink@posteo.de> <martin.kepplinger@puri.sm>
 Martin Kepplinger <martink@posteo.de> <martin.kepplinger@theobroma-systems.com>
 Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> <martyna.szapar-mudlaw@intel.com>
-Mathieu Othacehe <m.othacehe@gmail.com> <othacehe@gnu.org>
+Mathieu Othacehe <othacehe@gnu.org> <m.othacehe@gmail.com>
 Mat Martineau <martineau@kernel.org> <mathew.j.martineau@linux.intel.com>
 Mat Martineau <martineau@kernel.org> <mathewm@codeaurora.org>
 Matthew Wilcox <willy@infradead.org> <matthew.r.wilcox@intel.com>
@@ -735,6 +738,7 @@ Wolfram Sang <wsa@kernel.org> <w.sang@pengutronix.de>
 Wolfram Sang <wsa@kernel.org> <wsa@the-dreams.de>
 Yakir Yang <kuankuan.y@gmail.com> <ykk@rock-chips.com>
 Yanteng Si <si.yanteng@linux.dev> <siyanteng@loongson.cn>
+Ying Huang <huang.ying.caritas@gmail.com> <ying.huang@intel.com>
 Yusuke Goda <goda.yusuke@renesas.com>
 Zack Rusin <zack.rusin@broadcom.com> <zackr@vmware.com>
 Zhu Yanjun <zyjzyj2000@gmail.com> <yanjunz@nvidia.com>
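
For readers unfamiliar with the format: each ``.mailmap`` line maps a historical commit identity (the second ``<email>``) to a canonical name and address (the first). A quick way to check how git resolves an entry is ``git check-mailmap``; the snippet below is a throwaway-repo sketch using one of the entries added above (it assumes ``git`` is installed):

```shell
# Create a scratch repo so git can find a .mailmap at its top level.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
printf 'Bingwu Zhang <xtex@aosc.io> <xtexchooser@duck.com>\n' > .mailmap
# check-mailmap rewrites the historical identity to the canonical one:
git check-mailmap 'Bingwu Zhang <xtexchooser@duck.com>'
# prints: Bingwu Zhang <xtex@aosc.io>
```

Tools like ``git shortlog`` and ``git log --use-mailmap`` apply the same rewriting automatically.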

CREDITS

Lines changed: 12 additions & 0 deletions
@@ -20,6 +20,10 @@ N: Thomas Abraham
 E: thomas.ab@samsung.com
 D: Samsung pin controller driver
 
+N: Jose Abreu
+E: jose.abreu@synopsys.com
+D: Synopsys DesignWare XPCS MDIO/PCS driver.
+
 N: Dragos Acostachioaie
 E: dragos@iname.com
 W: http://www.arbornet.org/~dragos
@@ -1428,6 +1432,10 @@ S: 8124 Constitution Apt. 7
 S: Sterling Heights, Michigan 48313
 S: USA
 
+N: Andy Gospodarek
+E: andy@greyhouse.net
+D: Maintenance and contributions to the network interface bonding driver.
+
 N: Wolfgang Grandegger
 E: wg@grandegger.com
 D: Controller Area Network (device drivers)
@@ -1812,6 +1820,10 @@ D: Author/maintainer of most DRM drivers (especially ATI, MGA)
 D: Core DRM templates, general DRM and 3D-related hacking
 S: No fixed address
 
+N: Woojung Huh
+E: woojung.huh@microchip.com
+D: Microchip LAN78XX USB Ethernet driver
+
 N: Kenn Humborg
 E: kenn@wombat.ie
 D: Mods to loop device to support sparse backing files

Documentation/ABI/testing/sysfs-class-watchdog

Lines changed: 1 addition & 1 deletion
@@ -76,7 +76,7 @@ Description:
 		timeout when the pretimeout interrupt is delivered. Pretimeout
 		is an optional feature.
 
-What:		/sys/class/watchdog/watchdogn/pretimeout_avaialable_governors
+What:		/sys/class/watchdog/watchdogn/pretimeout_available_governors
 Date:		February 2017
 Contact:	Wim Van Sebroeck <wim@iguana.be>
 Description:
Documentation/accel/amdxdna/amdnpu.rst

Lines changed: 281 additions & 0 deletions
.. SPDX-License-Identifier: GPL-2.0-only

.. include:: <isonum.txt>

=========
AMD NPU
=========

:Copyright: |copy| 2024 Advanced Micro Devices, Inc.
:Author: Sonal Santan <sonal.santan@amd.com>

Overview
========

The AMD NPU (Neural Processing Unit) is a multi-user AI inference
accelerator integrated into AMD client APUs. The NPU enables efficient
execution of Machine Learning applications like CNNs and LLMs, and is
based on the `AMD XDNA Architecture`_. The NPU is managed by the
**amdxdna** driver.


Hardware Description
====================

The AMD NPU consists of the following hardware components:

AMD XDNA Array
--------------

The AMD XDNA Array comprises a 2D array of compute and memory tiles built
with `AMD AI Engine Technology`_. Each column has 4 rows of compute tiles
and 1 row of memory tiles. Each compute tile contains a VLIW processor
with its own dedicated program and data memory. The memory tile acts as L2
memory. The 2D array can be partitioned at a column boundary, creating a
spatially isolated partition which can be bound to a workload context.

Each column also has dedicated DMA engines to move data between host DDR
and the memory tile.

AMD Phoenix and AMD Hawk Point client NPUs have a 4x5 topology, i.e., 4
rows of compute tiles arranged into 5 columns. The AMD Strix Point client
APU has a 4x8 topology, i.e., 4 rows of compute tiles arranged into 8
columns.

Shared L2 Memory
----------------

The single row of memory tiles creates a pool of software-managed on-chip
L2 memory. DMA engines are used to move data between host DDR and the
memory tiles. AMD Phoenix and AMD Hawk Point NPUs have a total of 2560 KB
of L2 memory. The AMD Strix Point NPU has a total of 4096 KB of L2 memory.

Microcontroller
---------------

A microcontroller runs the NPU firmware, which is responsible for command
processing, XDNA Array partition setup, XDNA Array configuration, workload
context management and workload orchestration.

The NPU firmware uses a dedicated instance of an isolated non-privileged
context called ERT to service each workload context. ERT is also used to
execute user-provided ``ctrlcode`` associated with the workload context.

The NPU firmware uses a single isolated privileged context called MERT to
service management commands from the amdxdna driver.

Mailboxes
---------

The microcontroller and the amdxdna driver use a privileged channel for
management tasks like context setup, telemetry, queries, error handling,
user channel setup, etc. As mentioned above, privileged channel requests
are serviced by MERT. The privileged channel is bound to a single mailbox.

The microcontroller and the amdxdna driver use a dedicated user channel
per workload context. The user channel is primarily used for submitting
work to the NPU. As mentioned above, user channel requests are serviced by
an instance of ERT. Each user channel is bound to its own dedicated
mailbox.

PCIe EP
-------

The NPU is visible to the x86 host CPU as a PCIe device with multiple BARs
and some MSI-X interrupt vectors. The NPU uses a dedicated high-bandwidth
SoC-level fabric for reading from and writing into host memory. Each
instance of ERT gets its own dedicated MSI-X interrupt. MERT gets a single
MSI-X interrupt.

The number of PCIe BARs varies depending on the specific device. Based on
their functions, PCIe BARs can generally be categorized into the following
types.

* PSP BAR: exposes the AMD PSP (Platform Security Processor) function
* SMU BAR: exposes the AMD SMU (System Management Unit) function
* SRAM BAR: exposes ring buffers for the mailboxes
* Mailbox BAR: exposes the mailbox control registers (head, tail and ISR
  registers, etc.)
* Public Register BAR: exposes public registers

On specific devices, the above-mentioned BAR types might be combined into
a single physical PCIe BAR, or a module might require two physical PCIe
BARs to be fully functional. For example,

* On the AMD Phoenix device, the PSP, SMU and Public Register BARs are on
  PCIe BAR index 0.
* On the AMD Strix Point device, the Mailbox and Public Register BARs are
  on PCIe BAR index 0. The PSP has some registers in PCIe BAR index 0
  (Public Register BAR) and PCIe BAR index 4 (PSP BAR).

Process Isolation Hardware
--------------------------

As explained above, the XDNA Array can be dynamically divided into
isolated spatial partitions, each of which may have one or more columns. A
spatial partition is set up by the microcontroller programming the column
isolation registers. Each spatial partition is associated with a PASID,
which is also programmed by the microcontroller. Hence multiple spatial
partitions in the NPU can make concurrent host accesses protected by
PASID.

The NPU firmware itself uses microcontroller-MMU-enforced isolated
contexts for servicing user and privileged channel requests.


Mixed Spatial and Temporal Scheduling
=====================================

The AMD XDNA architecture supports mixed spatial and temporal (time
sharing) scheduling of the 2D array. This means that spatial partitions
may be set up and torn down dynamically to accommodate various workloads.
A *spatial* partition may be *exclusively* bound to one workload context,
while another partition may be *temporally* bound to more than one
workload context. The microcontroller updates the PASID for a temporally
shared partition to match the context that is bound to the partition at
any given moment.

Resource Solver
---------------

The Resource Solver component of the amdxdna driver manages the allocation
of the 2D array among the various workloads. Every workload describes in
its metadata the number of columns required to run its NPU binary. The
Resource Solver uses hints passed by the workload and its own heuristics
to decide the 2D array (re)partition strategy and the mapping of workloads
for spatial and temporal sharing of columns. The firmware enforces the
context-to-column(s) resource binding decisions made by the Resource
Solver.

AMD Phoenix and AMD Hawk Point client NPUs can support 6 concurrent
workload contexts. AMD Strix Point can support 16 concurrent workload
contexts.


Application Binaries
====================

An NPU application workload is comprised of two separate binaries, both
generated by the NPU compiler.

1. The AMD XDNA Array overlay, which is used to configure an NPU spatial
   partition. The overlay contains instructions for setting up the stream
   switch configuration and an ELF for the compute tiles. The overlay is
   loaded on the spatial partition bound to the workload by the associated
   ERT instance. Refer to the
   `Versal Adaptive SoC AIE-ML Architecture Manual (AM020)`_ for more
   details.

2. ``ctrlcode``, used for orchestrating the overlay loaded on the spatial
   partition. ``ctrlcode`` is executed by the ERT running in protected
   mode on the microcontroller in the context of the workload.
   ``ctrlcode`` is made up of a sequence of opcodes named
   ``XAie_TxnOpcode``. Refer to the `AI Engine Run Time`_ for more
   details.


Special Host Buffers
====================

Per-context Instruction Buffer
------------------------------

Every workload context uses a host-resident 64 MB buffer which is memory
mapped into the ERT instance created to service the workload. The
``ctrlcode`` used by the workload is copied into this special memory. This
buffer is protected by PASID like all other input/output buffers used by
that workload. The instruction buffer is also mapped into the user space
of the workload.

Global Privileged Buffer
------------------------

In addition, the driver allocates a single buffer for maintenance tasks
like recording errors from MERT. This global buffer uses the global IOMMU
domain and is only accessible by MERT.


High-level Use Flow
===================

These are the steps to run a workload on the AMD NPU:

1. Compile the workload into an overlay and a ``ctrlcode`` binary.
2. Userspace opens a context in the driver and provides the overlay.
3. The driver checks with the Resource Solver for provisioning a set of
   columns for the workload.
4. The driver then asks MERT to create a context on the device with the
   desired columns.
5. MERT then creates an instance of ERT. MERT also maps the Instruction
   Buffer into ERT memory.
6. Userspace then copies the ``ctrlcode`` to the Instruction Buffer.
7. Userspace then creates a command buffer with pointers to the input,
   output and instruction buffers; it submits the command buffer to the
   driver and goes to sleep waiting for completion.
8. The driver sends the command over the mailbox to ERT.
9. ERT *executes* the ``ctrlcode`` in the Instruction Buffer.
10. Execution of the ``ctrlcode`` kicks off DMAs to and from the host DDR
    while the AMD XDNA Array is running.
11. When ERT reaches the end of the ``ctrlcode``, it raises an MSI-X
    interrupt to send a completion signal to the driver, which then wakes
    up the waiting workload.


Boot Flow
=========

The amdxdna driver uses the PSP to securely load the signed NPU firmware
and kick off the boot of the NPU microcontroller. The driver then waits
for the alive signal in a special location on BAR 0. The NPU is switched
off during SoC suspend and turned on after resume, at which point the NPU
firmware is reloaded and the handshake is performed again.


Userspace components
====================

Compiler
--------

Peano is an LLVM-based open-source compiler for the AMD XDNA Array compute
tile, available at:
https://github.com/Xilinx/llvm-aie

The open-source IREE compiler supports graph compilation of ML models for
the AMD NPU and uses Peano underneath. It is available at:
https://github.com/nod-ai/iree-amd-aie

Usermode Driver (UMD)
---------------------

The open-source XRT runtime stack interfaces with the amdxdna kernel
driver. XRT can be found at:
https://github.com/Xilinx/XRT

The open-source XRT shim for the NPU can be found at:
https://github.com/amd/xdna-driver


DMA Operation
=============

DMA operation instructions are encoded in the ``ctrlcode`` as the
``XAIE_IO_BLOCKWRITE`` opcode. When ERT executes ``XAIE_IO_BLOCKWRITE``,
DMA operations between host DDR and L2 memory are carried out.


Error Handling
==============

When MERT detects an error in the AMD XDNA Array, it pauses execution for
the affected workload context and sends an asynchronous message to the
driver over the privileged channel. The driver then sends MERT a buffer
pointer to capture the register states for the partition bound to the
faulting workload context. The driver then decodes the error by reading
the contents of that buffer.


Telemetry
=========

MERT can report various kinds of telemetry information, for example:

* L1 interrupt counter
* DMA counter
* Deep Sleep counter


References
==========

- `AMD XDNA Architecture <https://www.amd.com/en/technologies/xdna.html>`_
- `AMD AI Engine Technology <https://www.xilinx.com/products/technology/ai-engine.html>`_
- `Peano <https://github.com/Xilinx/llvm-aie>`_
- `Versal Adaptive SoC AIE-ML Architecture Manual (AM020) <https://docs.amd.com/r/en-US/am020-versal-aie-ml>`_
- `AI Engine Run Time <https://github.com/Xilinx/aie-rt/tree/release/main_aig>`_

Documentation/accel/amdxdna/index.rst

Lines changed: 11 additions & 0 deletions
.. SPDX-License-Identifier: GPL-2.0-only

=====================================
accel/amdxdna NPU driver
=====================================

The accel/amdxdna driver supports the AMD NPU (Neural Processing Unit).

.. toctree::

   amdnpu

Documentation/accel/index.rst

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ Compute Accelerators
    :maxdepth: 1
 
    introduction
+   amdxdna/index
    qaic/index
 
 .. only:: subproject and html
