Commit a337b64

jkrzyszt-intel authored and tursulin committed
drm/i915: Fix premature release of request's reusable memory
Infinite waits for completion of GPU activity have been observed in CI, mostly inside __i915_active_wait(), triggered by igt@gem_barrier_race or igt@perf@stress-open-close. Root cause analysis, based of ftrace dumps generated with a lot of extra trace_printk() calls added to the code, revealed loops of request dependencies being accidentally built, preventing the requests from being processed, each waiting for completion of another one's activity. After we substitute a new request for a last active one tracked on a timeline, we set up a dependency of our new request to wait on completion of current activity of that previous one. While doing that, we must take care of keeping the old request still in memory until we use its attributes for setting up that await dependency, or we can happen to set up the await dependency on an unrelated request that already reuses the memory previously allocated to the old one, already released. Combined with perf adding consecutive kernel context remote requests to different user context timelines, unresolvable loops of await dependencies can be built, leading do infinite waits. We obtain a pointer to the previous request to wait upon when we substitute it with a pointer to our new request in an active tracker, e.g. in intel_timeline.last_request. In some processing paths we protect that old request from being freed before we use it by getting a reference to it under RCU protection, but in others, e.g. __i915_request_commit() -> __i915_request_add_to_timeline() -> __i915_request_ensure_ordering(), we don't. But anyway, since the requests' memory is SLAB_FAILSAFE_BY_RCU, that RCU protection is not sufficient against reuse of memory. We could protect i915_request's memory from being prematurely reused by calling its release function via call_rcu() and using rcu_read_lock() consequently, as proposed in v1. However, that approach leads to significant (up to 10 times) increase of SLAB utilization by i915_request SLAB cache. Another potential approach is to take a reference to the previous active fence. When updating an active fence tracker, we first lock the new fence, substitute a pointer of the current active fence with the new one, then we lock the substituted fence. With this approach, there is a time window after the substitution and before the lock when the request can be concurrently released by an interrupt handler and its memory reused, then we may happen to lock and return a new, unrelated request. Always get a reference to the current active fence first, before replacing it with a new one. Having it protected from premature release and reuse, lock it and then replace with the new one but only if not yet signalled via a potential concurrent interrupt nor replaced with another one by a potential concurrent thread, otherwise retry, starting from getting a reference to the new current one. Adjust users to not get a reference to the previous active fence themselves and always put the reference got by __i915_active_fence_set() when no longer needed. 
v3: Fix lockdep splat reports and other issues caused by incorrect use of try_cmpxchg() (use (cmpxchg() != prev) instead)
v2: Protect request's memory by getting a reference to it in favor of delegating its release to call_rcu() (Chris)

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8211
Fixes: df9f85d ("drm/i915: Serialise i915_active_fence_set() with itself")
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230720093543.832147-2-janusz.krzysztofik@linux.intel.com
(cherry picked from commit 946e047)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
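For readers skimming the diff below, the heart of the fix in __i915_active_fence_set() is the switch from a bare xchg() to a get-reference-then-cmpxchg() retry loop. The following minimal sketch restates that loop; it is an illustration distilled from the diff, not the verbatim kernel code (locking of the new fence and the lockdep nesting annotations are elided):

	/* Pin the current fence first so its memory cannot be reused. */
	prev = i915_active_fence_get(active);

	/*
	 * Replace it only if it is still the tracked fence; otherwise a
	 * concurrent signal or thread beat us, so drop our reference and
	 * retry against the new occupant of the slot.
	 */
	while (cmpxchg(__active_fence_slot(active), prev, fence) != prev) {
		if (prev)
			dma_fence_put(prev);
		prev = i915_active_fence_get(active);
	}

	/*
	 * The caller now owns a reference to prev and must dma_fence_put()
	 * it once the await dependency has been set up.
	 */

Note also the v3 remark above: try_cmpxchg() would have overwritten prev with the newly observed value on failure, desynchronising it from the reference (and lock) actually held, hence the explicit (cmpxchg() != prev) form.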
1 parent 6a35f22 commit a337b64

File tree

2 files changed, +81 -29 lines changed

drivers/gpu/drm/i915/i915_active.c

Lines changed: 70 additions & 29 deletions
@@ -449,8 +449,11 @@ int i915_active_add_request(struct i915_active *ref, struct i915_request *rq)
 		}
 	} while (unlikely(is_barrier(active)));
 
-	if (!__i915_active_fence_set(active, fence))
+	fence = __i915_active_fence_set(active, fence);
+	if (!fence)
 		__i915_active_acquire(ref);
+	else
+		dma_fence_put(fence);
 
 out:
 	i915_active_release(ref);
@@ -469,13 +472,9 @@ __i915_active_set_fence(struct i915_active *ref,
 		return NULL;
 	}
 
-	rcu_read_lock();
 	prev = __i915_active_fence_set(active, fence);
-	if (prev)
-		prev = dma_fence_get_rcu(prev);
-	else
+	if (!prev)
 		__i915_active_acquire(ref);
-	rcu_read_unlock();
 
 	return prev;
 }
@@ -1019,10 +1018,11 @@ void i915_request_add_active_barriers(struct i915_request *rq)
  *
  * Records the new @fence as the last active fence along its timeline in
  * this active tracker, moving the tracking callbacks from the previous
- * fence onto this one. Returns the previous fence (if not already completed),
- * which the caller must ensure is executed before the new fence. To ensure
- * that the order of fences within the timeline of the i915_active_fence is
- * understood, it should be locked by the caller.
+ * fence onto this one. Gets and returns a reference to the previous fence
+ * (if not already completed), which the caller must put after making sure
+ * that it is executed before the new fence. To ensure that the order of
+ * fences within the timeline of the i915_active_fence is understood, it
+ * should be locked by the caller.
  */
 struct dma_fence *
 __i915_active_fence_set(struct i915_active_fence *active,
@@ -1031,7 +1031,23 @@ __i915_active_fence_set(struct i915_active_fence *active,
 	struct dma_fence *prev;
 	unsigned long flags;
 
-	if (fence == rcu_access_pointer(active->fence))
+	/*
+	 * In case of fences embedded in i915_requests, their memory is
+	 * SLAB_TYPESAFE_BY_RCU, then it can be reused right after release
+	 * by new requests. Then, there is a risk of passing back a pointer
+	 * to a new, completely unrelated fence that reuses the same memory
+	 * while tracked under a different active tracker. Combined with i915
+	 * perf open/close operations that build await dependencies between
+	 * engine kernel context requests and user requests from different
+	 * timelines, this can lead to dependency loops and infinite waits.
+	 *
+	 * As a countermeasure, we try to get a reference to the active->fence
+	 * first, so if we succeed and pass it back to our user then it is not
+	 * released and potentially reused by an unrelated request before the
+	 * user has a chance to set up an await dependency on it.
+	 */
+	prev = i915_active_fence_get(active);
+	if (fence == prev)
 		return fence;
 
 	GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags));
@@ -1040,27 +1056,56 @@ __i915_active_fence_set(struct i915_active_fence *active,
 	 * Consider that we have two threads arriving (A and B), with
 	 * C already resident as the active->fence.
 	 *
-	 * A does the xchg first, and so it sees C or NULL depending
-	 * on the timing of the interrupt handler. If it is NULL, the
-	 * previous fence must have been signaled and we know that
-	 * we are first on the timeline. If it is still present,
-	 * we acquire the lock on that fence and serialise with the interrupt
-	 * handler, in the process removing it from any future interrupt
-	 * callback. A will then wait on C before executing (if present).
-	 *
-	 * As B is second, it sees A as the previous fence and so waits for
-	 * it to complete its transition and takes over the occupancy for
-	 * itself -- remembering that it needs to wait on A before executing.
+	 * Both A and B have got a reference to C or NULL, depending on the
+	 * timing of the interrupt handler. Let's assume that if A has got C
+	 * then it has locked C first (before B).
 	 *
 	 * Note the strong ordering of the timeline also provides consistent
 	 * nesting rules for the fence->lock; the inner lock is always the
 	 * older lock.
 	 */
 	spin_lock_irqsave(fence->lock, flags);
-	prev = xchg(__active_fence_slot(active), fence);
-	if (prev) {
-		GEM_BUG_ON(prev == fence);
+	if (prev)
 		spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
+
+	/*
+	 * A does the cmpxchg first, and so it sees C or NULL, as before, or
+	 * something else, depending on the timing of other threads and/or
+	 * interrupt handler. If not the same as before then A unlocks C if
+	 * applicable and retries, starting from an attempt to get a new
+	 * active->fence. Meanwhile, B follows the same path as A.
+	 * Once A succeeds with cmpxchg, B fails again, retries, gets A from
+	 * active->fence, locks it as soon as A completes, and possibly
+	 * succeeds with cmpxchg.
+	 */
+	while (cmpxchg(__active_fence_slot(active), prev, fence) != prev) {
+		if (prev) {
+			spin_unlock(prev->lock);
+			dma_fence_put(prev);
+		}
+		spin_unlock_irqrestore(fence->lock, flags);
+
+		prev = i915_active_fence_get(active);
+		GEM_BUG_ON(prev == fence);
+
+		spin_lock_irqsave(fence->lock, flags);
+		if (prev)
+			spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
+	}
+
+	/*
+	 * If prev is NULL then the previous fence must have been signaled
+	 * and we know that we are first on the timeline. If it is still
+	 * present then, having the lock on that fence already acquired, we
+	 * serialise with the interrupt handler, in the process of removing it
+	 * from any future interrupt callback. A will then wait on C before
+	 * executing (if present).
+	 *
+	 * As B is second, it sees A as the previous fence and so waits for
+	 * it to complete its transition and takes over the occupancy for
+	 * itself -- remembering that it needs to wait on A before executing.
+	 */
+	if (prev) {
 		__list_del_entry(&active->cb.node);
 		spin_unlock(prev->lock); /* serialise with prev->cb_list */
 	}
@@ -1077,11 +1122,7 @@ int i915_active_fence_set(struct i915_active_fence *active,
 	int err = 0;
 
 	/* Must maintain timeline ordering wrt previous active requests */
-	rcu_read_lock();
 	fence = __i915_active_fence_set(active, &rq->fence);
-	if (fence) /* but the previous fence may not belong to that timeline! */
-		fence = dma_fence_get_rcu(fence);
-	rcu_read_unlock();
 	if (fence) {
 		err = i915_request_await_dma_fence(rq, fence);
 		dma_fence_put(fence);
drivers/gpu/drm/i915/i915_request.c

Lines changed: 11 additions & 0 deletions
@@ -1661,6 +1661,11 @@ __i915_request_ensure_parallel_ordering(struct i915_request *rq,
 
 	request_to_parent(rq)->parallel.last_rq = i915_request_get(rq);
 
+	/*
+	 * Users have to put a reference potentially got by
+	 * __i915_active_fence_set() to the returned request
+	 * when no longer needed
+	 */
 	return to_request(__i915_active_fence_set(&timeline->last_request,
 						  &rq->fence));
 }
@@ -1707,6 +1712,10 @@ __i915_request_ensure_ordering(struct i915_request *rq,
 			0);
 	}
 
+	/*
+	 * Users have to put the reference to prev potentially got
+	 * by __i915_active_fence_set() when no longer needed
+	 */
 	return prev;
 }
 
@@ -1760,6 +1769,8 @@ __i915_request_add_to_timeline(struct i915_request *rq)
 		prev = __i915_request_ensure_ordering(rq, timeline);
 	else
 		prev = __i915_request_ensure_parallel_ordering(rq, timeline);
+	if (prev)
+		i915_request_put(prev);
 
 	/*
 	 * Make sure that no request gazumped us - if it was allocated after
