Commit 7b91683

Merge tag 'drm-misc-next-2025-02-20' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.15:

UAPI Changes:

device-wedged events:
- Lets drivers notify userspace of hung-up devices via uevent

Cross-subsystem Changes:

media:
- cec: tda998x: Import driver from DRM

Core Changes:

- Cleanups

atomic-helper:
- async-flip: Support on arbitrary planes
- writeback: Fix use-after-free error
- Document atomic-state history
- Plenty of cleanups to callback parameter names

doc:
- Test for kernel-doc errors

format-helper:
- Support ARGB8888-to-ARGB4444 pixel-format conversion

panel-orientation-quirks:
- Add quirks for AYANEO 2S, AYA NEO Flip DS and KB, AYA NEO Slide, GPD Win 2,
  OneXPlayer Mini (Intel)

sched:
- Add parameter struct for init

Driver Changes:

amdgpu:
- Support device-wedged event
- Support async pageflips on overlay planes

amdxdna:
- Refactoring

ast:
- Refactor cursor handling

bridge:
- Pass full atomic state to various callbacks
- analogix-dp: Cleanups
- cdns-mhdp8546: Fix clock enable/disable
- nwl-dsi: Set bridge type
- panel: Cleanups
- ti-sn65dsi83: Add error recovery; Set bridge type

i2c:
- tda998x: Drop unused platform_data; Split driver into separate media and
  bridge drivers
- Remove the obsolete directory

i915:
- Support device-wedged event

nouveau:
- Fixes

panel:
- visionox-r66451: Use multi-style MIPI-DSI functions

v3d:
- Handle clock

vkms:
- Fix use-after-free error

xe:
- Support device-wedged event

xlnx:
- Use mutex guards
- Cleanups

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20250220085321.GA184551@linux.fritz.box
2 parents 0ed1356 + e82e1a0 commit 7b91683

122 files changed, 1757 additions and 1270 deletions


Documentation/devicetree/bindings/display/bridge/ti,sn65dsi83.yaml

Lines changed: 3 additions & 0 deletions
@@ -35,6 +35,9 @@ properties:
   vcc-supply:
     description: A 1.8V power supply (see regulator/regulator.yaml).
 
+  interrupts:
+    maxItems: 1
+
   ports:
     $ref: /schemas/graph.yaml#/properties/ports
 

Documentation/gpu/drm-uapi.rst

Lines changed: 113 additions & 3 deletions
@@ -371,9 +371,119 @@ Reporting causes of resets
 
 Apart from propagating the reset through the stack so apps can recover, it's
 really useful for driver developers to learn more about what caused the reset in
-the first place. DRM devices should make use of devcoredump to store relevant
-information about the reset, so this information can be added to user bug
-reports.
+the first place. For this, drivers can make use of devcoredump to store relevant
+information about the reset and send device wedged event with ``none`` recovery
+method (as explained in "Device Wedging" chapter) to notify userspace, so this
+information can be collected and added to user bug reports.
+
+Device Wedging
+==============
+
+Drivers can optionally make use of device wedged event (implemented as
+drm_dev_wedged_event() in DRM subsystem), which notifies userspace of 'wedged'
+(hanged/unusable) state of the DRM device through a uevent. This is useful
+especially in cases where the device is no longer operating as expected and has
+become unrecoverable from driver context. Purpose of this implementation is to
+provide drivers a generic way to recover the device with the help of userspace
+intervention, without taking any drastic measures (like resetting or
+re-enumerating the full bus, on which the underlying physical device is sitting)
+in the driver.
+
+A 'wedged' device is basically a device that is declared dead by the driver
+after exhausting all possible attempts to recover it from driver context. The
+uevent is the notification that is sent to userspace along with a hint about
+what could possibly be attempted to recover the device from userspace and bring
+it back to usable state. Different drivers may have different ideas of a
+'wedged' device depending on hardware implementation of the underlying physical
+device, and hence the vendor agnostic nature of the event. It is up to the
+drivers to decide when they see the need for device recovery and how they want
+to recover from the available methods.
+
+Driver prerequisites
+--------------------
+
+The driver, before opting for recovery, needs to make sure that the 'wedged'
+device doesn't harm the system as a whole by taking care of the prerequisites.
+Necessary actions must include disabling DMA to system memory as well as any
+communication channels with other devices. Further, the driver must ensure
+that all dma_fences are signalled and any device state that the core kernel
+might depend on is cleaned up. All existing mmaps should be invalidated and
+page faults should be redirected to a dummy page. Once the event is sent, the
+device must be kept in 'wedged' state until the recovery is performed. New
+accesses to the device (IOCTLs) should be rejected, preferably with an error
+code that resembles the type of failure the device has encountered. This will
+signify the reason for wedging, which can be reported to the application if
+needed.
+
+Recovery
+--------
+
+Current implementation defines three recovery methods, out of which, drivers
+can use any one, multiple or none. Method(s) of choice will be sent in the
+uevent environment as ``WEDGED=<method1>[,..,<methodN>]`` in order of less to
+more side-effects. If driver is unsure about recovery or method is unknown
+(like soft/hard system reboot, firmware flashing, physical device replacement
+or any other procedure which can't be attempted on the fly), ``WEDGED=unknown``
+will be sent instead.
+
+Userspace consumers can parse this event and attempt recovery as per the
+following expectations.
+
+    =============== ========================================
+    Recovery method Consumer expectations
+    =============== ========================================
+    none            optional telemetry collection
+    rebind          unbind + bind driver
+    bus-reset       unbind + bus reset/re-enumeration + bind
+    unknown         consumer policy
+    =============== ========================================
+
+The only exception to this is ``WEDGED=none``, which signifies that the device
+was temporarily 'wedged' at some point but was recovered from driver context
+using device specific methods like reset. No explicit recovery is expected from
+the consumer in this case, but it can still take additional steps like gathering
+telemetry information (devcoredump, syslog). This is useful because the first
+hang is usually the most critical one which can result in consequential hangs or
+complete wedging.
+
+Consumer prerequisites
+----------------------
+
+It is the responsibility of the consumer to make sure that the device or its
+resources are not in use by any process before attempting recovery. With IOCTLs
+erroring out, all device memory should be unmapped and file descriptors should
+be closed to prevent leaks or undefined behaviour. The idea here is to clear the
+device of all user context beforehand and set the stage for a clean recovery.
+
+Example
+-------
+
+Udev rule::
+
+    SUBSYSTEM=="drm", ENV{WEDGED}=="rebind", DEVPATH=="*/drm/card[0-9]",
+    RUN+="/path/to/rebind.sh $env{DEVPATH}"
+
+Recovery script::
+
+    #!/bin/sh
+
+    DEVPATH=$(readlink -f /sys/$1/device)
+    DEVICE=$(basename $DEVPATH)
+    DRIVER=$(readlink -f $DEVPATH/driver)
+
+    echo -n $DEVICE > $DRIVER/unbind
+    echo -n $DEVICE > $DRIVER/bind
+
+Customization
+-------------
+
+Although basic recovery is possible with a simple script, consumers can define
+custom policies around recovery. For example, if the driver supports multiple
+recovery methods, consumers can opt for the suitable one depending on scenarios
+like repeat offences or vendor specific failures. Consumers can also choose to
+have the device available for debugging or telemetry collection and base their
+recovery decision on the findings. This is useful especially when the driver is
+unsure about recovery or method is unknown.
 
 .. _drm_driver_ioctl:
 
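Editor's note: the documentation added above says the uevent carries ``WEDGED=<method1>[,..,<methodN>]`` ordered from fewer to more side effects, and that ``none`` and ``unknown`` are special cases. As a rough illustration of how a userspace consumer might interpret that value, here is a minimal Python sketch. The function names and the `supported` parameter are hypothetical and are not part of this commit, the kernel, or udev; the sketch only parses the string and picks a method, it does not perform any unbind, bind or bus reset.

```python
# Hypothetical consumer-side helpers (illustrative names, not a kernel or udev API).


def parse_wedged(value):
    """Split a WEDGED value such as "rebind,bus-reset" into a list of methods.

    Order is preserved, because the documentation states that methods are
    listed from fewer to more side effects.
    """
    return [m.strip() for m in value.split(",") if m.strip()]


def choose_recovery(value, supported=("rebind",)):
    """Pick the first advertised method that this consumer supports.

    Returns "none" when the driver already recovered the device (only
    optional telemetry collection is expected), and None when the decision
    is left to consumer policy ("unknown", or nothing supported).
    """
    methods = parse_wedged(value)
    if methods == ["none"]:
        return "none"
    for method in methods:
        if method in supported:
            return method
    return None
```

A consumer that only knows how to rebind would get `"rebind"` for `WEDGED=rebind,bus-reset`, while `WEDGED=unknown` yields `None` so that a site-specific policy can decide what to do.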

Kbuild

Lines changed: 1 addition & 0 deletions
@@ -97,3 +97,4 @@ obj-$(CONFIG_SAMPLES) += samples/
 obj-$(CONFIG_NET) += net/
 obj-y += virt/
 obj-y += $(ARCH_DRIVERS)
+obj-$(CONFIG_DRM_HEADER_TEST) += include/

MAINTAINERS

Lines changed: 3 additions & 2 deletions
@@ -8007,6 +8007,8 @@ F: include/drm/drm_privacy_screen*
 DRM TTM SUBSYSTEM
 M: Christian Koenig <christian.koenig@amd.com>
 M: Huang Rui <ray.huang@amd.com>
+R: Matthew Auld <matthew.auld@intel.com>
+R: Matthew Brost <matthew.brost@intel.com>
 L: dri-devel@lists.freedesktop.org
 S: Maintained
 T: git https://gitlab.freedesktop.org/drm/misc/kernel.git
@@ -17120,8 +17122,7 @@ M: Russell King <linux@armlinux.org.uk>
 S: Maintained
 T: git git://git.armlinux.org.uk/~rmk/linux-arm.git drm-tda998x-devel
 T: git git://git.armlinux.org.uk/~rmk/linux-arm.git drm-tda998x-fixes
-F: drivers/gpu/drm/i2c/tda998x_drv.c
-F: include/drm/i2c/tda998x.h
+F: drivers/gpu/drm/bridge/tda998x_drv.c
 F: include/dt-bindings/display/tda998x.h
 K: "nxp,tda998x"
 

drivers/accel/amdxdna/aie2_ctx.c

Lines changed: 25 additions & 16 deletions
@@ -34,6 +34,8 @@ static void aie2_job_release(struct kref *ref)
 
 	job = container_of(ref, struct amdxdna_sched_job, refcnt);
 	amdxdna_sched_job_cleanup(job);
+	atomic64_inc(&job->hwctx->job_free_cnt);
+	wake_up(&job->hwctx->priv->job_free_wq);
 	if (job->out_fence)
 		dma_fence_put(job->out_fence);
 	kfree(job);
@@ -134,7 +136,8 @@ static void aie2_hwctx_wait_for_idle(struct amdxdna_hwctx *hwctx)
 	if (!fence)
 		return;
 
-	dma_fence_wait(fence, false);
+	/* Wait up to 2 seconds for fw to finish all pending requests */
+	dma_fence_wait_timeout(fence, false, msecs_to_jiffies(2000));
 	dma_fence_put(fence);
 }
 
@@ -516,6 +519,14 @@ int aie2_hwctx_init(struct amdxdna_hwctx *hwctx)
 {
 	struct amdxdna_client *client = hwctx->client;
 	struct amdxdna_dev *xdna = client->xdna;
+	const struct drm_sched_init_args args = {
+		.ops = &sched_ops,
+		.num_rqs = DRM_SCHED_PRIORITY_COUNT,
+		.credit_limit = HWCTX_MAX_CMDS,
+		.timeout = msecs_to_jiffies(HWCTX_MAX_TIMEOUT),
+		.name = hwctx->name,
+		.dev = xdna->ddev.dev,
+	};
 	struct drm_gpu_scheduler *sched;
 	struct amdxdna_hwctx_priv *priv;
 	struct amdxdna_gem_obj *heap;
@@ -573,9 +584,7 @@ int aie2_hwctx_init(struct amdxdna_hwctx *hwctx)
 	might_lock(&priv->io_lock);
 	fs_reclaim_release(GFP_KERNEL);
 
-	ret = drm_sched_init(sched, &sched_ops, NULL, DRM_SCHED_PRIORITY_COUNT,
-			     HWCTX_MAX_CMDS, 0, msecs_to_jiffies(HWCTX_MAX_TIMEOUT),
-			     NULL, NULL, hwctx->name, xdna->ddev.dev);
+	ret = drm_sched_init(sched, &args);
 	if (ret) {
 		XDNA_ERR(xdna, "Failed to init DRM scheduler. ret %d", ret);
 		goto free_cmd_bufs;
@@ -616,6 +625,7 @@ int aie2_hwctx_init(struct amdxdna_hwctx *hwctx)
 	hwctx->status = HWCTX_STAT_INIT;
 	ndev = xdna->dev_handle;
 	ndev->hwctx_num++;
+	init_waitqueue_head(&priv->job_free_wq);
 
 	XDNA_DBG(xdna, "hwctx %s init completed", hwctx->name);
 
@@ -652,25 +662,23 @@ void aie2_hwctx_fini(struct amdxdna_hwctx *hwctx)
 	xdna = hwctx->client->xdna;
 	ndev = xdna->dev_handle;
 	ndev->hwctx_num--;
-	drm_sched_wqueue_stop(&hwctx->priv->sched);
 
-	/* Now, scheduler will not send command to device. */
+	XDNA_DBG(xdna, "%s sequence number %lld", hwctx->name, hwctx->priv->seq);
+	drm_sched_entity_destroy(&hwctx->priv->entity);
+
+	aie2_hwctx_wait_for_idle(hwctx);
+
+	/* Request fw to destroy hwctx and cancel the rest pending requests */
 	aie2_release_resource(hwctx);
 
-	/*
-	 * All submitted commands are aborted.
-	 * Restart scheduler queues to cleanup jobs. The amdxdna_sched_job_run()
-	 * will return NODEV if it is called.
-	 */
-	drm_sched_wqueue_start(&hwctx->priv->sched);
+	/* Wait for all submitted jobs to be completed or canceled */
+	wait_event(hwctx->priv->job_free_wq,
+		   atomic64_read(&hwctx->job_submit_cnt) ==
+		   atomic64_read(&hwctx->job_free_cnt));
 
-	aie2_hwctx_wait_for_idle(hwctx);
-	drm_sched_entity_destroy(&hwctx->priv->entity);
 	drm_sched_fini(&hwctx->priv->sched);
 	aie2_ctx_syncobj_destroy(hwctx);
 
-	XDNA_DBG(xdna, "%s sequence number %lld", hwctx->name, hwctx->priv->seq);
-
 	for (idx = 0; idx < ARRAY_SIZE(hwctx->priv->cmd_buf); idx++)
 		drm_gem_object_put(to_gobj(hwctx->priv->cmd_buf[idx]));
 	amdxdna_gem_unpin(hwctx->priv->heap);
@@ -879,6 +887,7 @@ int aie2_cmd_submit(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job,
 	drm_gem_unlock_reservations(job->bos, job->bo_cnt, &acquire_ctx);
 
 	aie2_job_put(job);
+	atomic64_inc(&hwctx->job_submit_cnt);
 
 	return 0;
 

drivers/accel/amdxdna/amdxdna_ctx.c

Lines changed: 2 additions & 0 deletions
@@ -220,6 +220,8 @@ int amdxdna_drm_create_hwctx_ioctl(struct drm_device *dev, void *data, struct dr
 	args->syncobj_handle = hwctx->syncobj_hdl;
 	mutex_unlock(&xdna->dev_lock);
 
+	atomic64_set(&hwctx->job_submit_cnt, 0);
+	atomic64_set(&hwctx->job_free_cnt, 0);
 	XDNA_DBG(xdna, "PID %d create HW context %d, ret %d", client->pid, args->handle, ret);
 	drm_dev_exit(idx);
 	return 0;

drivers/accel/amdxdna/amdxdna_ctx.h

Lines changed: 3 additions & 0 deletions
@@ -87,6 +87,9 @@ struct amdxdna_hwctx {
 	struct amdxdna_qos_info qos;
 	struct amdxdna_hwctx_param_config_cu *cus;
 	u32 syncobj_hdl;
+
+	atomic64_t job_submit_cnt;
+	atomic64_t job_free_cnt ____cacheline_aligned_in_smp;
 };
 
 #define drm_job_to_xdna_job(j) \

drivers/gpu/drm/Kconfig

Lines changed: 11 additions & 2 deletions
@@ -326,8 +326,6 @@ config DRM_SCHED
 	tristate
 	depends on DRM
 
-source "drivers/gpu/drm/i2c/Kconfig"
-
 source "drivers/gpu/drm/arm/Kconfig"
 
 source "drivers/gpu/drm/radeon/Kconfig"
@@ -494,6 +492,17 @@ config DRM_WERROR
 
 	  If in doubt, say N.
 
+config DRM_HEADER_TEST
+	bool "Ensure DRM headers are self-contained and pass kernel-doc"
+	depends on DRM && EXPERT
+	default n
+	help
+	  Ensure the DRM subsystem headers both under drivers/gpu/drm and
+	  include/drm compile, are self-contained, have header guards, and have
+	  no kernel-doc warnings.
+
+	  If in doubt, say N.
+
 endif
 
 # Separate option because drm_panel_orientation_quirks.c is shared with fbdev

drivers/gpu/drm/Makefile

Lines changed: 18 additions & 1 deletion
@@ -197,7 +197,6 @@ obj-$(CONFIG_DRM_INGENIC) += ingenic/
 obj-$(CONFIG_DRM_LOGICVC) += logicvc/
 obj-$(CONFIG_DRM_MEDIATEK) += mediatek/
 obj-$(CONFIG_DRM_MESON) += meson/
-obj-y += i2c/
 obj-y += panel/
 obj-y += bridge/
 obj-$(CONFIG_DRM_FSL_DCU) += fsl-dcu/
@@ -222,3 +221,21 @@ obj-y += solomon/
 obj-$(CONFIG_DRM_SPRD) += sprd/
 obj-$(CONFIG_DRM_LOONGSON) += loongson/
 obj-$(CONFIG_DRM_POWERVR) += imagination/
+
+# Ensure drm headers are self-contained and pass kernel-doc
+hdrtest-files := \
+	$(shell cd $(src) && find . -maxdepth 1 -name 'drm_*.h') \
+	$(shell cd $(src) && find display lib -name '*.h')
+
+always-$(CONFIG_DRM_HEADER_TEST) += \
+	$(patsubst %.h,%.hdrtest, $(hdrtest-files))
+
+# Include the header twice to detect missing include guard.
+quiet_cmd_hdrtest = HDRTEST $(patsubst %.hdrtest,%.h,$@)
+      cmd_hdrtest = \
+		$(CC) $(c_flags) -fsyntax-only -x c /dev/null -include $< -include $<; \
+		$(srctree)/scripts/kernel-doc -none $(if $(CONFIG_WERROR)$(CONFIG_DRM_WERROR),-Werror) $<; \
+		touch $@
+
+$(obj)/%.hdrtest: $(src)/%.h FORCE
+	$(call if_changed_dep,hdrtest)

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

Lines changed: 16 additions & 6 deletions
@@ -2823,6 +2823,12 @@ static int amdgpu_device_fw_loading(struct amdgpu_device *adev)
 
 static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
 {
+	struct drm_sched_init_args args = {
+		.ops = &amdgpu_sched_ops,
+		.num_rqs = DRM_SCHED_PRIORITY_COUNT,
+		.timeout_wq = adev->reset_domain->wq,
+		.dev = adev->dev,
+	};
 	long timeout;
 	int r, i;
 
@@ -2848,12 +2854,12 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
 			break;
 		}
 
-		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, NULL,
-				   DRM_SCHED_PRIORITY_COUNT,
-				   ring->num_hw_submission, 0,
-				   timeout, adev->reset_domain->wq,
-				   ring->sched_score, ring->name,
-				   adev->dev);
+		args.timeout = timeout;
+		args.credit_limit = ring->num_hw_submission;
+		args.score = ring->sched_score;
+		args.name = ring->name;
+
+		r = drm_sched_init(&ring->sched, &args);
 		if (r) {
 			DRM_ERROR("Failed to create scheduler on ring %s.\n",
 				  ring->name);
@@ -6116,6 +6122,10 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 		dev_info(adev->dev, "GPU reset end with ret = %d\n", r);
 
 	atomic_set(&adev->reset_domain->reset_res, r);
+
+	if (!r)
+		drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE);
+
 	return r;
 }
 
