
Commit 6cdcc65

drm/nouveau: sched: avoid job races between entities
If a sched job depends on a dma-fence from a job from the same GPU scheduler instance, but a different scheduler entity, the GPU scheduler only waits for that particular job to be scheduled, rather than for the job to fully complete. This is due to the GPU scheduler assuming that there is a scheduler instance per ring. However, the current implementation, in order to avoid arbitrary amounts of kthreads, has a single scheduler instance while scheduler entities represent rings.

As a workaround, set the DRM_SCHED_FENCE_DONT_PIPELINE flag for all out-fences in order to force the scheduler to wait for full job completion for dependent jobs from different entities and the same scheduler instance.

There is some work in progress [1] to address the issues of firmware schedulers; once it is in-tree, the scheduler topology in Nouveau should be reworked accordingly.

[1] https://lore.kernel.org/dri-devel/20230801205103.627779-1-matthew.brost@intel.com/

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230811010632.2473-1-dakr@redhat.com
1 parent 9c319a0 commit 6cdcc65

drivers/gpu/drm/nouveau/nouveau_sched.c

Lines changed: 22 additions & 0 deletions
@@ -292,6 +292,28 @@ nouveau_job_submit(struct nouveau_job *job)
 	if (job->sync)
 		done_fence = dma_fence_get(job->done_fence);
 
+	/* If a sched job depends on a dma-fence from a job from the same GPU
+	 * scheduler instance, but a different scheduler entity, the GPU
+	 * scheduler does only wait for the particular job to be scheduled,
+	 * rather than for the job to fully complete. This is due to the GPU
+	 * scheduler assuming that there is a scheduler instance per ring.
+	 * However, the current implementation, in order to avoid arbitrary
+	 * amounts of kthreads, has a single scheduler instance while scheduler
+	 * entities represent rings.
+	 *
+	 * As a workaround, set the DRM_SCHED_FENCE_DONT_PIPELINE for all
+	 * out-fences in order to force the scheduler to wait for full job
+	 * completion for dependent jobs from different entities and same
+	 * scheduler instance.
+	 *
+	 * There is some work in progress [1] to address the issues of firmware
+	 * schedulers; once it is in-tree the scheduler topology in Nouveau
+	 * should be re-worked accordingly.
+	 *
+	 * [1] https://lore.kernel.org/dri-devel/20230801205103.627779-1-matthew.brost@intel.com/
+	 */
+	set_bit(DRM_SCHED_FENCE_DONT_PIPELINE, &job->done_fence->flags);
+
 	if (job->ops->armed_submit)
 		job->ops->armed_submit(job);
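For context on why the flag is needed, the sketch below paraphrases the scheduler-side dependency handling the commit message refers to. It is loosely modeled on drm_sched_entity_add_dependency_cb() in drivers/gpu/drm/scheduler/sched_entity.c; the function name sketch_same_sched_dependency() and the exact structure are illustrative assumptions, not the upstream code.

#include <drm/gpu_scheduler.h>

/* Simplified sketch (not verbatim upstream code): how the DRM GPU scheduler
 * treats a dependency fence that comes from the same scheduler instance.
 * Without DRM_SCHED_FENCE_DONT_PIPELINE, the dependency is downgraded to the
 * producing job's "scheduled" fence, so the dependent job may start before
 * the producing job has actually finished.
 */
static void sketch_same_sched_dependency(struct drm_sched_entity *entity)
{
	struct drm_gpu_scheduler *sched = entity->rq->sched;
	struct dma_fence *fence = entity->dependency;
	struct drm_sched_fence *s_fence = to_drm_sched_fence(fence);

	if (s_fence && s_fence->sched == sched &&
	    !test_bit(DRM_SCHED_FENCE_DONT_PIPELINE, &fence->flags)) {
		/* Same scheduler instance and pipelining allowed: wait only
		 * for the dependency to be scheduled, not for it to complete.
		 */
		entity->dependency = dma_fence_get(&s_fence->scheduled);
		dma_fence_put(fence);
	}
	/* With DRM_SCHED_FENCE_DONT_PIPELINE set on the out-fence, as the
	 * hunk above now does in nouveau_job_submit(), the branch is skipped
	 * and the scheduler keeps waiting on the full "finished" fence.
	 */
}

Because Nouveau currently uses a single scheduler instance with per-ring entities, that downgrade would let a job on one entity race ahead of an unfinished dependency on another entity; setting the flag on job->done_fence is what prevents this.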
