This repository was archived by the owner on Nov 8, 2023. It is now read-only.

Commit 0e45882

jkrzyszt-intel authored and rodrigovivi committed
drm/i915/vma: Fix UAF on destroy against retire race
Object debugging tools were sporadically reporting illegal attempts to
free a still active i915 VMA object when parking a GT believed to be idle.

[161.359441] ODEBUG: free active (active state 0) object: ffff88811643b958 object type: i915_active hint: __i915_vma_active+0x0/0x50 [i915]
[161.360082] WARNING: CPU: 5 PID: 276 at lib/debugobjects.c:514 debug_print_object+0x80/0xb0
...
[161.360304] CPU: 5 PID: 276 Comm: kworker/5:2 Not tainted 6.5.0-rc1-CI_DRM_13375-g003f860e5577+ #1
[161.360314] Hardware name: Intel Corporation Rocket Lake Client Platform/RocketLake S UDIMM 6L RVP, BIOS RKLSFWI1.R00.3173.A03.2204210138 04/21/2022
[161.360322] Workqueue: i915-unordered __intel_wakeref_put_work [i915]
[161.360592] RIP: 0010:debug_print_object+0x80/0xb0
...
[161.361347] debug_object_free+0xeb/0x110
[161.361362] i915_active_fini+0x14/0x130 [i915]
[161.361866] release_references+0xfe/0x1f0 [i915]
[161.362543] i915_vma_parked+0x1db/0x380 [i915]
[161.363129] __gt_park+0x121/0x230 [i915]
[161.363515] ____intel_wakeref_put_last+0x1f/0x70 [i915]

That has been tracked down to be happening when another thread is
deactivating the VMA inside __active_retire() helper, after the VMA's
active counter has been already decremented to 0, but before deactivation
of the VMA's object is reported to the object debugging tool.

We could prevent from that race by serializing i915_active_fini() with
__active_retire() via ref->tree_lock, but that wouldn't stop the VMA from
being used, e.g. from __i915_vma_retire() called at the end of
__active_retire(), after that VMA has been already freed by a concurrent
i915_vma_destroy() on return from the i915_active_fini(). Then, we should
rather fix the issue at the VMA level, not in i915_active.

Since __i915_vma_parked() is called from __gt_park() on last put of the
GT's wakeref, the issue could be addressed by holding the GT wakeref long
enough for __active_retire() to complete before that wakeref is released
and the GT parked.

I believe the issue was introduced by commit d939397 ("drm/i915: Remove
the vma refcount") which moved a call to i915_active_fini() from a dropped
i915_vma_release(), called on last put of the removed VMA kref, to
i915_vma_parked() processing path called on last put of a GT wakeref.
However, its visibility to the object debugging tool was suppressed by a
bug in i915_active that was fixed two weeks later with commit e92eb24
("drm/i915/active: Fix missing debug object activation").

A VMA associated with a request doesn't acquire a GT wakeref by itself.
Instead, it depends on a wakeref held directly by the request's active
intel_context for a GT associated with its VM, and indirectly on that
intel_context's engine wakeref if the engine belongs to the same GT as the
VMA's VM. Those wakerefs are released asynchronously to VMA deactivation.

Fix the issue by getting a wakeref for the VMA's GT when activating it,
and putting that wakeref only after the VMA is deactivated. However,
exclude global GTT from that processing path, otherwise the GPU never goes
idle. Since __i915_vma_retire() may be called from atomic contexts, use
async variant of wakeref put. Also, to avoid circular locking dependency,
take care of acquiring the wakeref before VM mutex when both are needed.

v7: Add inline comments with justifications for:
    - using untracked variants of intel_gt_pm_get/put() (Nirmoy),
    - using async variant of _put(),
    - not getting the wakeref in case of a global GTT,
    - always getting the first wakeref outside vm->mutex.
v6: Since __i915_vma_active/retire() callbacks are not serialized, storing
    a wakeref tracking handle inside struct i915_vma is not safe, and
    there is no other good place for that. Use untracked variants of
    intel_gt_pm_get/put_async().
v5: Replace "tile" with "GT" across commit description (Rodrigo),
  - avoid mentioning multi-GT case in commit description (Rodrigo),
  - explain why we need to take a temporary wakeref unconditionally inside
    i915_vma_pin_ww() (Rodrigo).
v4: Refresh on top of commit 5e4e06e ("drm/i915: Track gt pm wakerefs")
    (Andi),
  - for more easy backporting, split out removal of former insufficient
    workarounds and move them to separate patches (Nirmoy).
  - clean up commit message and description a bit.
v3: Identify root cause more precisely, and a commit to blame,
  - identify and drop former workarounds,
  - update commit message and description.
v2: Get the wakeref before VM mutex to avoid circular locking dependency,
  - drop questionable Fixes: tag.

Fixes: d939397 ("drm/i915: Remove the vma refcount")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/8875
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: stable@vger.kernel.org # v5.19+
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240305143747.335367-6-janusz.krzysztofik@linux.intel.com
(cherry picked from commit f3c71b2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
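
The fix described in the commit message can be pictured with a small userspace model. The sketch below is not i915 code: `toy_gt`, `toy_vma` and every `toy_*` function are hypothetical stand-ins, the wakeref is a plain counter, and the put is synchronous, whereas the driver uses an async put. It only illustrates why a non-GGTT VMA that holds its own wakeref keeps the GT from parking until the VMA has retired.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only, not the i915 types. */
struct toy_gt {
	int wakerefs;	/* outstanding wakerefs; the GT may park only at 0 */
	int parked;	/* how many times the GT has been parked */
};

struct toy_vma {
	struct toy_gt *gt;
	bool is_ggtt;	/* global GTT VMAs must not keep the GT awake */
	bool active;
};

static void toy_gt_get(struct toy_gt *gt)
{
	gt->wakerefs++;
}

/* Returns true if this put dropped the last wakeref and parked the GT. */
static bool toy_gt_put(struct toy_gt *gt)
{
	assert(gt->wakerefs > 0);
	if (--gt->wakerefs)
		return false;
	gt->parked++;
	return true;
}

/* Same shape as the patched activation callback: take a wakeref. */
static void toy_vma_activate(struct toy_vma *vma)
{
	if (!vma->is_ggtt)
		toy_gt_get(vma->gt);
	vma->active = true;
}

/*
 * Same shape as the patched retire callback: the wakeref is released
 * only once the VMA is no longer active, so parking cannot overlap
 * with an active VMA. Returns true if the GT was parked as a result.
 */
static bool toy_vma_retire(struct toy_vma *vma)
{
	vma->active = false;
	if (!vma->is_ggtt)
		return toy_gt_put(vma->gt);
	return false;
}
```

In this model, a request's wakeref can be dropped before the VMA retires without the GT parking, which is the ordering the commit message identifies as the source of the race.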
1 parent d392e1b commit 0e45882

1 file changed: +43 -7 lines changed

drivers/gpu/drm/i915/i915_vma.c

Lines changed: 43 additions & 7 deletions
@@ -34,6 +34,7 @@
 #include "gt/intel_engine.h"
 #include "gt/intel_engine_heartbeat.h"
 #include "gt/intel_gt.h"
+#include "gt/intel_gt_pm.h"
 #include "gt/intel_gt_requests.h"
 #include "gt/intel_tlb.h"
 
@@ -103,12 +104,42 @@ static inline struct i915_vma *active_to_vma(struct i915_active *ref)
 
 static int __i915_vma_active(struct i915_active *ref)
 {
-	return i915_vma_tryget(active_to_vma(ref)) ? 0 : -ENOENT;
+	struct i915_vma *vma = active_to_vma(ref);
+
+	if (!i915_vma_tryget(vma))
+		return -ENOENT;
+
+	/*
+	 * Exclude global GTT VMA from holding a GT wakeref
+	 * while active, otherwise GPU never goes idle.
+	 */
+	if (!i915_vma_is_ggtt(vma)) {
+		/*
+		 * Since we and our _retire() counterpart can be
+		 * called asynchronously, storing a wakeref tracking
+		 * handle inside struct i915_vma is not safe, and
+		 * there is no other good place for that. Hence,
+		 * use untracked variants of intel_gt_pm_get/put().
+		 */
+		intel_gt_pm_get_untracked(vma->vm->gt);
+	}
+
+	return 0;
 }
 
 static void __i915_vma_retire(struct i915_active *ref)
 {
-	i915_vma_put(active_to_vma(ref));
+	struct i915_vma *vma = active_to_vma(ref);
+
+	if (!i915_vma_is_ggtt(vma)) {
+		/*
+		 * Since we can be called from atomic contexts,
+		 * use an async variant of intel_gt_pm_put().
+		 */
+		intel_gt_pm_put_async_untracked(vma->vm->gt);
+	}
+
+	i915_vma_put(vma);
 }
 
 static struct i915_vma *
@@ -1404,7 +1435,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 	struct i915_vma_work *work = NULL;
 	struct dma_fence *moving = NULL;
 	struct i915_vma_resource *vma_res = NULL;
-	intel_wakeref_t wakeref = 0;
+	intel_wakeref_t wakeref;
 	unsigned int bound;
 	int err;
 
@@ -1424,8 +1455,14 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 	if (err)
 		return err;
 
-	if (flags & PIN_GLOBAL)
-		wakeref = intel_runtime_pm_get(&vma->vm->i915->runtime_pm);
+	/*
+	 * In case of a global GTT, we must hold a runtime-pm wakeref
+	 * while global PTEs are updated. In other cases, we hold
+	 * the rpm reference while the VMA is active. Since runtime
+	 * resume may require allocations, which are forbidden inside
+	 * vm->mutex, get the first rpm wakeref outside of the mutex.
+	 */
+	wakeref = intel_runtime_pm_get(&vma->vm->i915->runtime_pm);
 
 	if (flags & vma->vm->bind_async_flags) {
 		/* lock VM */
@@ -1561,8 +1598,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 	if (work)
 		dma_fence_work_commit_imm(&work->base);
 err_rpm:
-	if (wakeref)
-		intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
+	intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
 
 	if (moving)
 		dma_fence_put(moving);
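
The i915_vma_pin_ww() hunks make the runtime-pm get and put unconditional, so that the first wakeref is always taken outside vm->mutex. A minimal userspace sketch of that ordering follows. All names (`toy_dev`, `toy_rpm_get`, `toy_rpm_put`, `toy_pin`) are hypothetical, the mutex is a plain flag, and the nested reference standing in for the one an active VMA holds is an assumption drawn from the inline comment in the patch, not a trace of the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device state, for illustration only. */
struct toy_dev {
	int rpm_refs;			/* outstanding runtime-pm wakerefs */
	bool vm_locked;			/* stand-in for vm->mutex being held */
	bool resumed_under_lock;	/* set if a first get ran under the lock */
};

static void toy_rpm_get(struct toy_dev *dev)
{
	/* Only the first get resumes the device, which may allocate. */
	if (dev->rpm_refs++ == 0 && dev->vm_locked)
		dev->resumed_under_lock = true;
}

static void toy_rpm_put(struct toy_dev *dev)
{
	assert(dev->rpm_refs > 0);
	dev->rpm_refs--;
}

/* Returns 0 on success, -1 if the simulated bind step fails. */
static int toy_pin(struct toy_dev *dev, bool fail_bind)
{
	int err = 0;

	/* Taken unconditionally, before the lock, so a resume never nests. */
	toy_rpm_get(dev);

	dev->vm_locked = true;
	if (fail_bind) {
		err = -1;
		goto err_unlock;
	}

	/*
	 * Assumed model of activation under the lock: the reference kept
	 * while the VMA is active is never the first one, so it cannot
	 * trigger a resume here.
	 */
	toy_rpm_get(dev);

err_unlock:
	dev->vm_locked = false;
	/* Unconditional put, pairing with the unconditional get above. */
	toy_rpm_put(dev);
	return err;
}
```

Because the get is no longer conditional, the error path needs no `if (wakeref)` check, which matches the simplification at the `err_rpm` label in the diff.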
