Commit 9c2f033

Merge tag 'drm-fixes-2024-02-03' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
 "Regular weekly fixes, mostly amdgpu and xe. One nouveau fix is a
  better fix for the deadlock and also helps with a sync race we were
  seeing.

  dma-buf:
   - heaps CMA page accounting fix

  virtio-gpu:
   - fix segment size

  xe:
   - A crash fix
   - A fix for an assert due to missing mem_access ref
   - Only allow a single user-fence per exec / bind.
   - Some sparse warning fixes
   - Two fixes for compilation failures on various odd combinations of
     gcc / arch pointed out on LKML.
   - Fix a fragile partial allocation pointed out on LKML.
   - A sysfs ABI documentation warning fix

  amdgpu:
   - Fix reboot issue seen on some 7000 series dGPUs
   - Fix client init order for KFD
   - Misc display fixes
   - USB-C fix
   - DCN 3.5 fixes
   - Fix issues with GPU scheduler and GPU reset
   - GPU firmware loading fix
   - Misc fixes
   - GC 11.5 fix
   - VCN 4.0.5 fix
   - IH overflow fix

  amdkfd:
   - SVM fixes
   - Trap handler fix
   - Fix device permission lookup
   - Properly reserve BO before validating it

  nouveau:
   - fence/irq lock deadlock fix (second attempt)
   - gsp command size fix"

* tag 'drm-fixes-2024-02-03' of git://anongit.freedesktop.org/drm/drm: (35 commits)
  nouveau: offload fence uevents work to workqueue
  nouveau/gsp: use correct size for registry rpc.
  drm/amdgpu/pm: Use inline function for IP version check
  drm/hwmon: Fix abi doc warnings
  drm/xe: Make all GuC ABI shift values unsigned
  drm/xe/vm: Subclass userptr vmas
  drm/xe: Use LRC prefix rather than CTX prefix in lrc desc defines
  drm/xe: Don't use __user error pointers
  drm/xe: Annotate mcr_[un]lock()
  drm/xe: Only allow 1 ufence per exec / bind IOCTL
  drm/xe: Grab mem_access when disabling C6 on skip_guc_pc platforms
  drm/xe: Fix crash in trace_dma_fence_init()
  drm/amdgpu: Reset IH OVERFLOW_CLEAR bit
  drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspend
  drm/amdgpu: drm/amdgpu: remove golden setting for gfx 11.5.0
  drm/amdkfd: reserve the BO before validating it
  drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()'
  drm/amd/display: Fix buffer overflow in 'get_host_router_total_dp_tunnel_bw()'
  drm/amd/display: Add NULL check for kzalloc in 'amdgpu_dm_atomic_commit_tail()'
  drm/amd: Don't init MEC2 firmware when it fails to load
  ...
2 parents eab5c86 + 39126ab commit 9c2f033

72 files changed, 475 insertions(+), 403 deletions(-)

Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon

Lines changed: 7 additions & 7 deletions
@@ -1,12 +1,12 @@
-What: /sys/devices/.../hwmon/hwmon<i>/in0_input
+What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/in0_input
 Date: February 2023
 KernelVersion: 6.2
 Contact: intel-gfx@lists.freedesktop.org
 Description: RO. Current Voltage in millivolt.
 
 Only supported for particular Intel i915 graphics platforms.
 
-What: /sys/devices/.../hwmon/hwmon<i>/power1_max
+What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/power1_max
 Date: February 2023
 KernelVersion: 6.2
 Contact: intel-gfx@lists.freedesktop.org
@@ -20,15 +20,15 @@ Description: RW. Card reactive sustained (PL1/Tau) power limit in microwatts.
 
 Only supported for particular Intel i915 graphics platforms.
 
-What: /sys/devices/.../hwmon/hwmon<i>/power1_rated_max
+What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/power1_rated_max
 Date: February 2023
 KernelVersion: 6.2
 Contact: intel-gfx@lists.freedesktop.org
 Description: RO. Card default power limit (default TDP setting).
 
 Only supported for particular Intel i915 graphics platforms.
 
-What: /sys/devices/.../hwmon/hwmon<i>/power1_max_interval
+What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/power1_max_interval
 Date: February 2023
 KernelVersion: 6.2
 Contact: intel-gfx@lists.freedesktop.org
@@ -37,7 +37,7 @@ Description: RW. Sustained power limit interval (Tau in PL1/Tau) in
 
 Only supported for particular Intel i915 graphics platforms.
 
-What: /sys/devices/.../hwmon/hwmon<i>/power1_crit
+What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/power1_crit
 Date: February 2023
 KernelVersion: 6.2
 Contact: intel-gfx@lists.freedesktop.org
@@ -50,7 +50,7 @@ Description: RW. Card reactive critical (I1) power limit in microwatts.
 
 Only supported for particular Intel i915 graphics platforms.
 
-What: /sys/devices/.../hwmon/hwmon<i>/curr1_crit
+What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/curr1_crit
 Date: February 2023
 KernelVersion: 6.2
 Contact: intel-gfx@lists.freedesktop.org
@@ -63,7 +63,7 @@ Description: RW. Card reactive critical (I1) power limit in milliamperes.
 
 Only supported for particular Intel i915 graphics platforms.
 
-What: /sys/devices/.../hwmon/hwmon<i>/energy1_input
+What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/energy1_input
 Date: February 2023
 KernelVersion: 6.2
 Contact: intel-gfx@lists.freedesktop.org

Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon

Lines changed: 7 additions & 7 deletions
@@ -1,4 +1,4 @@
-What: /sys/devices/.../hwmon/hwmon<i>/power1_max
+What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/power1_max
 Date: September 2023
 KernelVersion: 6.5
 Contact: intel-xe@lists.freedesktop.org
@@ -12,15 +12,15 @@ Description: RW. Card reactive sustained (PL1) power limit in microwatts.
 
 Only supported for particular Intel xe graphics platforms.
 
-What: /sys/devices/.../hwmon/hwmon<i>/power1_rated_max
+What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/power1_rated_max
 Date: September 2023
 KernelVersion: 6.5
 Contact: intel-xe@lists.freedesktop.org
 Description: RO. Card default power limit (default TDP setting).
 
 Only supported for particular Intel xe graphics platforms.
 
-What: /sys/devices/.../hwmon/hwmon<i>/power1_crit
+What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/power1_crit
 Date: September 2023
 KernelVersion: 6.5
 Contact: intel-xe@lists.freedesktop.org
@@ -33,7 +33,7 @@ Description: RW. Card reactive critical (I1) power limit in microwatts.
 
 Only supported for particular Intel xe graphics platforms.
 
-What: /sys/devices/.../hwmon/hwmon<i>/curr1_crit
+What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/curr1_crit
 Date: September 2023
 KernelVersion: 6.5
 Contact: intel-xe@lists.freedesktop.org
@@ -44,23 +44,23 @@ Description: RW. Card reactive critical (I1) power limit in milliamperes.
 the operating frequency if the power averaged over a window
 exceeds this limit.
 
-What: /sys/devices/.../hwmon/hwmon<i>/in0_input
+What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/in0_input
 Date: September 2023
 KernelVersion: 6.5
 Contact: intel-xe@lists.freedesktop.org
 Description: RO. Current Voltage in millivolt.
 
 Only supported for particular Intel xe graphics platforms.
 
-What: /sys/devices/.../hwmon/hwmon<i>/energy1_input
+What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/energy1_input
 Date: September 2023
 KernelVersion: 6.5
 Contact: intel-xe@lists.freedesktop.org
 Description: RO. Energy input of device in microjoules.
 
 Only supported for particular Intel xe graphics platforms.
 
-What: /sys/devices/.../hwmon/hwmon<i>/power1_max_interval
+What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/power1_max_interval
 Date: October 2023
 KernelVersion: 6.6
 Contact: intel-xe@lists.freedesktop.org

drivers/dma-buf/heaps/cma_heap.c

Lines changed: 3 additions & 4 deletions
@@ -168,10 +168,7 @@ static vm_fault_t cma_heap_vm_fault(struct vm_fault *vmf)
 	if (vmf->pgoff > buffer->pagecount)
 		return VM_FAULT_SIGBUS;
 
-	vmf->page = buffer->pages[vmf->pgoff];
-	get_page(vmf->page);
-
-	return 0;
+	return vmf_insert_pfn(vma, vmf->address, page_to_pfn(buffer->pages[vmf->pgoff]));
 }
 
 static const struct vm_operations_struct dma_heap_vm_ops = {
@@ -185,6 +182,8 @@ static int cma_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
 	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
 		return -EINVAL;
 
+	vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
+
 	vma->vm_ops = &dma_heap_vm_ops;
 	vma->vm_private_data = buffer;
 

drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c

Lines changed: 21 additions & 11 deletions
@@ -141,11 +141,31 @@ static void amdgpu_amdkfd_reset_work(struct work_struct *work)
 static const struct drm_client_funcs kfd_client_funcs = {
 	.unregister = drm_client_release,
 };
+
+int amdgpu_amdkfd_drm_client_create(struct amdgpu_device *adev)
+{
+	int ret;
+
+	if (!adev->kfd.init_complete)
+		return 0;
+
+	ret = drm_client_init(&adev->ddev, &adev->kfd.client, "kfd",
+			      &kfd_client_funcs);
+	if (ret) {
+		dev_err(adev->dev, "Failed to init DRM client: %d\n",
+			ret);
+		return ret;
+	}
+
+	drm_client_register(&adev->kfd.client);
+
+	return 0;
+}
+
 void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
 {
 	int i;
 	int last_valid_bit;
-	int ret;
 
 	amdgpu_amdkfd_gpuvm_init_mem_limits();
 
@@ -164,12 +184,6 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
 			.enable_mes = adev->enable_mes,
 		};
 
-		ret = drm_client_init(&adev->ddev, &adev->kfd.client, "kfd", &kfd_client_funcs);
-		if (ret) {
-			dev_err(adev->dev, "Failed to init DRM client: %d\n", ret);
-			return;
-		}
-
 		/* this is going to have a few of the MSBs set that we need to
 		 * clear
 		 */
@@ -208,10 +222,6 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
 
 		adev->kfd.init_complete = kgd2kfd_device_init(adev->kfd.dev,
 							&gpu_resources);
-		if (adev->kfd.init_complete)
-			drm_client_register(&adev->kfd.client);
-		else
-			drm_client_release(&adev->kfd.client);
 
 		amdgpu_amdkfd_total_mem_size += adev->gmc.real_vram_size;
 
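
The hunks above move DRM client setup out of amdgpu_amdkfd_device_init() into a separate amdgpu_amdkfd_drm_client_create() that is a no-op unless KFD init completed. The following is a minimal userspace sketch of that control flow only. Everything prefixed `mock_` is a hypothetical stand-in invented for illustration and is not part of the driver; the `forced_error` parameter exists solely to exercise the error path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the relevant bits of adev->kfd. */
struct mock_kfd {
	bool init_complete;     /* set by the (mock) KFD device init */
	bool client_registered; /* set once the client is registered */
};

/* Hypothetical stand-in for drm_client_init(): 0 or a negative error. */
static int mock_client_init(struct mock_kfd *kfd, int forced_error)
{
	(void)kfd;
	return forced_error;
}

/*
 * Same shape as amdgpu_amdkfd_drm_client_create() in the diff:
 * skip silently if KFD init did not complete, otherwise init the
 * client and register it only when init succeeded.
 */
static int mock_kfd_client_create(struct mock_kfd *kfd, int forced_error)
{
	int ret;

	if (!kfd->init_complete)
		return 0;

	ret = mock_client_init(kfd, forced_error);
	if (ret) {
		fprintf(stderr, "Failed to init DRM client: %d\n", ret);
		return ret;
	}

	kfd->client_registered = true;
	return 0;
}
```

With this split the caller can treat a non-zero return as a probe failure, which the old `void` init path could not report.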

drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h

Lines changed: 3 additions & 1 deletion
@@ -182,6 +182,8 @@ int amdgpu_queue_mask_bit_to_set_resource_bit(struct amdgpu_device *adev,
 struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
 				struct mm_struct *mm,
 				struct svm_range_bo *svm_bo);
+
+int amdgpu_amdkfd_drm_client_create(struct amdgpu_device *adev);
 #if defined(CONFIG_DEBUG_FS)
 int kfd_debugfs_kfd_mem_limits(struct seq_file *m, void *data);
 #endif
@@ -301,7 +303,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(struct amdgpu_device *adev,
 					  struct kgd_mem *mem, void *drm_priv);
 int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
 		struct amdgpu_device *adev, struct kgd_mem *mem, void *drm_priv);
-void amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv);
+int amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv);
 int amdgpu_amdkfd_gpuvm_sync_memory(
 		struct amdgpu_device *adev, struct kgd_mem *mem, bool intr);
 int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_mem *mem,

drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c

Lines changed: 1 addition & 1 deletion
@@ -290,7 +290,7 @@ static int suspend_resume_compute_scheduler(struct amdgpu_device *adev, bool sus
 	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
 		struct amdgpu_ring *ring = &adev->gfx.compute_ring[i];
 
-		if (!(ring && drm_sched_wqueue_ready(&ring->sched)))
+		if (!amdgpu_ring_sched_ready(ring))
 			continue;
 
 		/* stop secheduler and drain ring. */

drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c

Lines changed: 17 additions & 3 deletions
@@ -2085,21 +2085,35 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 	return ret;
 }
 
-void amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv)
+int amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv)
 {
 	struct kfd_mem_attachment *entry;
 	struct amdgpu_vm *vm;
+	int ret;
 
 	vm = drm_priv_to_vm(drm_priv);
 
 	mutex_lock(&mem->lock);
 
+	ret = amdgpu_bo_reserve(mem->bo, true);
+	if (ret)
+		goto out;
+
 	list_for_each_entry(entry, &mem->attachments, list) {
-		if (entry->bo_va->base.vm == vm)
-			kfd_mem_dmaunmap_attachment(mem, entry);
+		if (entry->bo_va->base.vm != vm)
+			continue;
+		if (entry->bo_va->base.bo->tbo.ttm &&
+		    !entry->bo_va->base.bo->tbo.ttm->sg)
+			continue;
+
+		kfd_mem_dmaunmap_attachment(mem, entry);
 	}
 
+	amdgpu_bo_unreserve(mem->bo);
+out:
 	mutex_unlock(&mem->lock);
+
+	return ret;
 }
 
 int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
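
The rewritten amdgpu_amdkfd_gpuvm_dmaunmap_mem() takes the mutex, then tries to reserve the BO, and on failure jumps to a label that releases only the mutex. A rough userspace sketch of that unwind order follows. The `mock_` types and helpers are hypothetical stand-ins, the counters replace real locks, and `forced_error` is an illustration-only knob to simulate a failed reservation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kgd_mem; counters replace real locks. */
struct mock_mem {
	int lock_depth;   /* stand-in for mem->lock being held */
	bool reserved;    /* stand-in for the BO reservation */
	int attachments;  /* number of attachments to walk */
	int unmapped;     /* how many attachments were unmapped */
};

/* Hypothetical stand-in for amdgpu_bo_reserve(): 0 or a negative error. */
static int mock_bo_reserve(struct mock_mem *mem, int forced_error)
{
	if (forced_error)
		return forced_error;
	mem->reserved = true;
	return 0;
}

/* Same lock / reserve / goto-out structure as the function in the diff. */
static int mock_dmaunmap_mem(struct mock_mem *mem, int forced_error)
{
	int ret;
	int i;

	mem->lock_depth++;		/* mutex_lock(&mem->lock) */

	ret = mock_bo_reserve(mem, forced_error);
	if (ret)
		goto out;		/* nothing reserved, only drop the lock */

	for (i = 0; i < mem->attachments; i++)
		mem->unmapped++;	/* kfd_mem_dmaunmap_attachment() */

	mem->reserved = false;		/* amdgpu_bo_unreserve(mem->bo) */
out:
	mem->lock_depth--;		/* mutex_unlock(&mem->lock) */

	return ret;
}
```

Placing the label after the unreserve call means the failure path never releases a reservation it did not take, while both paths still drop the mutex.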

drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c

Lines changed: 4 additions & 4 deletions
@@ -1678,7 +1678,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
 	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
 		struct amdgpu_ring *ring = adev->rings[i];
 
-		if (!ring || !drm_sched_wqueue_ready(&ring->sched))
+		if (!amdgpu_ring_sched_ready(ring))
 			continue;
 		drm_sched_wqueue_stop(&ring->sched);
 	}
@@ -1694,7 +1694,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
 	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
 		struct amdgpu_ring *ring = adev->rings[i];
 
-		if (!ring || !drm_sched_wqueue_ready(&ring->sched))
+		if (!amdgpu_ring_sched_ready(ring))
 			continue;
 		drm_sched_wqueue_start(&ring->sched);
 	}
@@ -1916,8 +1916,8 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
 
 	ring = adev->rings[val];
 
-	if (!ring || !ring->funcs->preempt_ib ||
-	    !drm_sched_wqueue_ready(&ring->sched))
+	if (!amdgpu_ring_sched_ready(ring) ||
+	    !ring->funcs->preempt_ib)
 		return -EINVAL;
 
 	/* the last preemption failed */
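
Every call site above swaps `!ring || !drm_sched_wqueue_ready(&ring->sched)` for a single `!amdgpu_ring_sched_ready(ring)`. The helper's definition is not part of the visible diff, so the sketch below only assumes it folds at least the NULL check and the scheduler readiness check into one predicate; the real helper may test more. All `mock_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the scheduler and ring structures. */
struct mock_sched {
	bool ready;
};

struct mock_ring {
	struct mock_sched sched;
};

/* Hypothetical stand-in for drm_sched_wqueue_ready(). */
static bool mock_sched_wqueue_ready(const struct mock_sched *sched)
{
	return sched->ready;
}

/*
 * Assumed shape of the combined predicate: NULL-safe, so callers no
 * longer need their own "!ring ||" guard before touching ring->sched.
 */
static bool mock_ring_sched_ready(const struct mock_ring *ring)
{
	if (!ring)
		return false;

	return mock_sched_wqueue_ready(&ring->sched);
}

/* Loop in the style of the call sites: skip rings that are not ready. */
static int mock_count_ready_rings(struct mock_ring *const rings[], size_t n)
{
	int count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!mock_ring_sched_ready(rings[i]))
			continue;
		count++;
	}

	return count;
}
```

Note that in amdgpu_debugfs_ib_preempt() the new condition evaluates the helper before `ring->funcs->preempt_ib`, so the NULL check still happens before the ring is dereferenced.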

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

Lines changed: 13 additions & 23 deletions
@@ -4121,23 +4121,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 				}
 			}
 		} else {
-			switch (amdgpu_ip_version(adev, MP1_HWIP, 0)) {
-			case IP_VERSION(13, 0, 0):
-			case IP_VERSION(13, 0, 7):
-			case IP_VERSION(13, 0, 10):
-				r = psp_gpu_reset(adev);
-				break;
-			default:
-				tmp = amdgpu_reset_method;
-				/* It should do a default reset when loading or reloading the driver,
-				 * regardless of the module parameter reset_method.
-				 */
-				amdgpu_reset_method = AMD_RESET_METHOD_NONE;
-				r = amdgpu_asic_reset(adev);
-				amdgpu_reset_method = tmp;
-				break;
-			}
-
+			tmp = amdgpu_reset_method;
+			/* It should do a default reset when loading or reloading the driver,
+			 * regardless of the module parameter reset_method.
+			 */
+			amdgpu_reset_method = AMD_RESET_METHOD_NONE;
+			r = amdgpu_asic_reset(adev);
+			amdgpu_reset_method = tmp;
 			if (r) {
 				dev_err(adev->dev, "asic reset on init failed\n");
 				goto failed;
@@ -5031,7 +5021,7 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev)
 	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 		struct amdgpu_ring *ring = adev->rings[i];
 
-		if (!ring || !drm_sched_wqueue_ready(&ring->sched))
+		if (!amdgpu_ring_sched_ready(ring))
 			continue;
 
 		spin_lock(&ring->sched.job_list_lock);
@@ -5170,7 +5160,7 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
 	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 		struct amdgpu_ring *ring = adev->rings[i];
 
-		if (!ring || !drm_sched_wqueue_ready(&ring->sched))
+		if (!amdgpu_ring_sched_ready(ring))
 			continue;
 
 		/* Clear job fence from fence drv to avoid force_completion
@@ -5637,7 +5627,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 		for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 			struct amdgpu_ring *ring = tmp_adev->rings[i];
 
-			if (!ring || !drm_sched_wqueue_ready(&ring->sched))
+			if (!amdgpu_ring_sched_ready(ring))
 				continue;
 
 			drm_sched_stop(&ring->sched, job ? &job->base : NULL);
@@ -5706,7 +5696,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 		for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 			struct amdgpu_ring *ring = tmp_adev->rings[i];
 
-			if (!ring || !drm_sched_wqueue_ready(&ring->sched))
+			if (!amdgpu_ring_sched_ready(ring))
 				continue;
 
 			drm_sched_start(&ring->sched, true);
@@ -6061,7 +6051,7 @@ pci_ers_result_t amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta
 		for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 			struct amdgpu_ring *ring = adev->rings[i];
 
-			if (!ring || !drm_sched_wqueue_ready(&ring->sched))
+			if (!amdgpu_ring_sched_ready(ring))
 				continue;
 
 			drm_sched_stop(&ring->sched, NULL);
@@ -6189,7 +6179,7 @@ void amdgpu_pci_resume(struct pci_dev *pdev)
 	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 		struct amdgpu_ring *ring = adev->rings[i];
 
-		if (!ring || !drm_sched_wqueue_ready(&ring->sched))
+		if (!amdgpu_ring_sched_ready(ring))
 			continue;
 
 		drm_sched_start(&ring->sched, true);
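
The first hunk in this file drops the per-IP-version `switch` and always uses the save / override / restore sequence around the reset call, so the reset on init ignores the `reset_method` module parameter. A small sketch of that sequence, under stated assumptions: the `mock_` names and the enum values are hypothetical and do not come from the driver headers, and the global stands in for the module parameter.

```c
#include <assert.h>

/* Hypothetical method IDs; the values are illustrative only. */
enum mock_reset_method {
	MOCK_RESET_METHOD_NONE = -1,	/* let the driver pick its default */
	MOCK_RESET_METHOD_USER = 2,	/* some user-selected override */
};

/* Stand-in for the module parameter the user may have set. */
static int mock_reset_method = MOCK_RESET_METHOD_USER;

/* Records which method the reset routine observed, for inspection. */
static int mock_method_seen_by_reset;

/* Hypothetical stand-in for amdgpu_asic_reset(): always succeeds here. */
static int mock_asic_reset(void)
{
	mock_method_seen_by_reset = mock_reset_method;
	return 0;
}

/*
 * Same sequence as the "+" lines in the hunk: save the parameter,
 * force the default for this one call, then restore the saved value.
 */
static int mock_reset_on_init(void)
{
	int tmp;
	int r;

	tmp = mock_reset_method;
	mock_reset_method = MOCK_RESET_METHOD_NONE;
	r = mock_asic_reset();
	mock_reset_method = tmp;

	return r;
}
```

The restore happens unconditionally before the return value is checked, so the user's setting survives even when the reset fails.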

drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c

Lines changed: 4 additions & 0 deletions
@@ -2255,6 +2255,10 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
 	if (ret)
 		goto err_pci;
 
+	ret = amdgpu_amdkfd_drm_client_create(adev);
+	if (ret)
+		goto err_pci;
+
 	/*
 	 * 1. don't init fbdev on hw without DCE
 	 * 2. don't init fbdev if there are no connectors
