
Commit 35e4079

drm/v3d: Add job to pending list if the reset was skipped
When a CL/CSD job times out, we check if the GPU has made any progress
since the last timeout. If so, instead of resetting the hardware, we skip
the reset and let the timer get rearmed. This gives long-running jobs a
chance to complete.

However, when `timedout_job()` is called, the job in question is removed
from the pending list, which means it won't be automatically freed through
`free_job()`. Consequently, when we skip the reset and keep the job
running, the job won't be freed when it finally completes.

This situation leads to a memory leak, as exposed in [1] and [2].

Similarly to commit 704d3d6 ("drm/etnaviv: don't block scheduler when GPU
is still active"), this patch ensures the job is put back on the pending
list when extending the timeout.

Cc: stable@vger.kernel.org # 6.0
Reported-by: Daivik Bhatia <dtgs1208@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12227 [1]
Closes: raspberrypi/linux#6817 [2]
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://lore.kernel.org/r/20250430210643.57924-1-mcanal@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>
1 parent b662b16 commit 35e4079

1 file changed: +21 −7 lines changed
drivers/gpu/drm/v3d/v3d_sched.c

Lines changed: 21 additions & 7 deletions

@@ -744,11 +744,16 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job)
 		return DRM_GPU_SCHED_STAT_NOMINAL;
 	}
 
-/* If the current address or return address have changed, then the GPU
- * has probably made progress and we should delay the reset.  This
- * could fail if the GPU got in an infinite loop in the CL, but that
- * is pretty unlikely outside of an i-g-t testcase.
- */
+static void
+v3d_sched_skip_reset(struct drm_sched_job *sched_job)
+{
+	struct drm_gpu_scheduler *sched = sched_job->sched;
+
+	spin_lock(&sched->job_list_lock);
+	list_add(&sched_job->list, &sched->pending_list);
+	spin_unlock(&sched->job_list_lock);
+}
+
 static enum drm_gpu_sched_stat
 v3d_cl_job_timedout(struct drm_sched_job *sched_job, enum v3d_queue q,
 		    u32 *timedout_ctca, u32 *timedout_ctra)
@@ -758,9 +763,16 @@ v3d_cl_job_timedout(struct drm_sched_job *sched_job, enum v3d_queue q,
 	u32 ctca = V3D_CORE_READ(0, V3D_CLE_CTNCA(q));
 	u32 ctra = V3D_CORE_READ(0, V3D_CLE_CTNRA(q));
 
+	/* If the current address or return address have changed, then the GPU
+	 * has probably made progress and we should delay the reset.  This
+	 * could fail if the GPU got in an infinite loop in the CL, but that
+	 * is pretty unlikely outside of an i-g-t testcase.
+	 */
 	if (*timedout_ctca != ctca || *timedout_ctra != ctra) {
 		*timedout_ctca = ctca;
 		*timedout_ctra = ctra;
+
+		v3d_sched_skip_reset(sched_job);
 		return DRM_GPU_SCHED_STAT_NOMINAL;
 	}
 
@@ -800,11 +812,13 @@ v3d_csd_job_timedout(struct drm_sched_job *sched_job)
 	struct v3d_dev *v3d = job->base.v3d;
 	u32 batches = V3D_CORE_READ(0, V3D_CSD_CURRENT_CFG4(v3d->ver));
 
-	/* If we've made progress, skip reset and let the timer get
-	 * rearmed.
+	/* If we've made progress, skip reset, add the job to the pending
+	 * list, and let the timer get rearmed.
 	 */
 	if (job->timedout_batches != batches) {
 		job->timedout_batches = batches;
+
+		v3d_sched_skip_reset(sched_job);
 		return DRM_GPU_SCHED_STAT_NOMINAL;
 	}
 