Commit 9078a5b
Prike Liang authored and alexdeucher committed
drm/amdkfd: only flush the validate MES contex
The following page fault was observed during the KFD process release.
In this particular error case, the HIP test (./MemcpyPerformance -h)
does not require the queue. As a result, the process_context_addr was
not assigned when the KFD process was released, ultimately leading to
this page fault during the execution of the function
kfd_process_dequeue_from_all_devices().

[345962.294891] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:0 pasid:0)
[345962.295333] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000000000000000 from client 10
[345962.295775] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000B33
[345962.296097] amdgpu 0000:03:00.0: amdgpu:      Faulty UTCL2 client ID: CPC (0x5)
[345962.296394] amdgpu 0000:03:00.0: amdgpu:      MORE_FAULTS: 0x1
[345962.296633] amdgpu 0000:03:00.0: amdgpu:      WALKER_ERROR: 0x1
[345962.296876] amdgpu 0000:03:00.0: amdgpu:      PERMISSION_FAULTS: 0x3
[345962.297135] amdgpu 0000:03:00.0: amdgpu:      MAPPING_ERROR: 0x1
[345962.297377] amdgpu 0000:03:00.0: amdgpu:      RW: 0x0
[345962.297682] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:169 vmid:0 pasid:0)

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
1 parent f88192d commit 9078a5b

1 file changed: +5 -2 lines changed

drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c

Lines changed: 5 additions & 2 deletions
@@ -86,9 +86,12 @@ void kfd_process_dequeue_from_device(struct kfd_process_device *pdd)
 
 	if (pdd->already_dequeued)
 		return;
-
+	/* The MES context flush needs to filter out the case which the
+	 * KFD process is created without setting up the MES context and
+	 * queue for creating a compute queue.
+	 */
 	dev->dqm->ops.process_termination(dev->dqm, &pdd->qpd);
-	if (dev->kfd->shared_resources.enable_mes &&
+	if (dev->kfd->shared_resources.enable_mes && !!pdd->proc_ctx_gpu_addr &&
 	    down_read_trylock(&dev->adev->reset_domain->sem)) {
 		amdgpu_mes_flush_shader_debugger(dev->adev,
 						 pdd->proc_ctx_gpu_addr);
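
The guard added in this hunk can be illustrated outside the kernel with a small user-space sketch. This is not the driver code: `struct mock_pdd`, `mock_flush_shader_debugger()` and `mock_dequeue_from_device()` are hypothetical stand-ins, the reset-domain trylock is omitted, and setting `already_dequeued` at the end is an assumption about the rest of the function, which the diff does not show. It only shows why a zero `proc_ctx_gpu_addr` has to skip the flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kfd_process_device (only the fields used here). */
struct mock_pdd {
	bool already_dequeued;
	uint64_t proc_ctx_gpu_addr; /* 0: no MES process context was ever set up */
};

static int flush_calls; /* counts how often the mock flush ran */

/* Mock flush: address 0 is what faulted in the log, so treat it as a bug here. */
static void mock_flush_shader_debugger(uint64_t proc_ctx_gpu_addr)
{
	assert(proc_ctx_gpu_addr != 0);
	flush_calls++;
}

/* Mirrors the guarded control flow; returns true if a flush was issued. */
static bool mock_dequeue_from_device(struct mock_pdd *pdd, bool enable_mes)
{
	bool flushed = false;

	if (pdd->already_dequeued)
		return false;

	/* Same shape as the patched condition, minus the reset-domain lock. */
	if (enable_mes && pdd->proc_ctx_gpu_addr) {
		mock_flush_shader_debugger(pdd->proc_ctx_gpu_addr);
		flushed = true;
	}

	pdd->already_dequeued = true; /* assumed; not visible in the diff */
	return flushed;
}
```

With MES enabled, a process that never created a queue (address 0) is torn down without a flush, while a process with a populated context address is flushed exactly once.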
