
Commit 6addfcf

dwmw2 authored and sean-jc committed
KVM: pfncache: simplify locking and make more self-contained
The locking on the gfn_to_pfn_cache is... interesting. And awful.

There is a rwlock in ->lock which readers take to ensure protection against concurrent changes. But __kvm_gpc_refresh() makes assumptions that certain fields will not change even while it drops the write lock and performs MM operations to revalidate the target PFN and kernel mapping.

Commit 93984f1 ("KVM: Fully serialize gfn=>pfn cache refresh via mutex") partly addressed that, not by fixing it, but by adding a new mutex, ->refresh_lock. This prevented concurrent __kvm_gpc_refresh() calls on a given gfn_to_pfn_cache, but is still only a partial solution.

There is still a theoretical race where __kvm_gpc_refresh() runs in parallel with kvm_gpc_deactivate(). While __kvm_gpc_refresh() has dropped the write lock, kvm_gpc_deactivate() clears the ->active flag and unmaps ->khva. Then __kvm_gpc_refresh() determines that the previous ->pfn and ->khva are still valid, and reinstalls those values into the structure.

This leaves the gfn_to_pfn_cache with the ->valid bit set, but ->active clear. And a ->khva which looks like a reasonable kernel address but is actually unmapped. All it takes is a subsequent reactivation to cause that ->khva to be dereferenced. This would theoretically cause an oops which would look something like this:

[1724749.564994] BUG: unable to handle page fault for address: ffffaa3540ace0e0
[1724749.565039] RIP: 0010:__kvm_xen_has_interrupt+0x8b/0xb0

I say "theoretically" because theoretically, that oops that was seen in production cannot happen. The code which uses the gfn_to_pfn_cache is supposed to have its *own* locking, to further paper over the fact that the gfn_to_pfn_cache's own papering-over (->refresh_lock) of its own rwlock abuse is not sufficient.

For the Xen vcpu_info that external lock is the vcpu->mutex, and for the shared info it's kvm->arch.xen.xen_lock. Those locks ought to protect the gfn_to_pfn_cache against concurrent deactivation vs. refresh in all but the cases where the vcpu or kvm object is being *destroyed*, in which case the subsequent reactivation should never happen. Theoretically.

Nevertheless, this locking abuse is awful and should be fixed, even if no clear explanation can be found for how the oops happened. So expand the use of the ->refresh_lock mutex to ensure serialization of activate/deactivate vs. refresh and make the pfncache locking entirely self-sufficient. This means that a future commit can simplify the locking in the callers, such as the Xen emulation code which has an outstanding problem with recursive locking of kvm->arch.xen.xen_lock, which will no longer be necessary.

The rwlock abuse described above is still not best practice, although it's harmless now that the ->refresh_lock is held for the entire duration while the offending code drops the write lock, does some other stuff, then takes the write lock again and assumes nothing changed. That can also be fixed^W cleaned up in a subsequent commit, but this commit is a simpler basis for the Xen deadlock fix mentioned above.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20240227115648.3104-5-dwmw2@infradead.org
[sean: use guard(mutex) to fix a missed unlock]
Signed-off-by: Sean Christopherson <seanjc@google.com>
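The race window is easier to see in code. Below is a heavily abridged, illustrative model of the pre-patch refresh path; struct pfn_cache_model and model_refresh() are invented for this sketch, and only the lock choreography mirrors the real pfncache.c:

#include <linux/mutex.h>
#include <linux/spinlock.h>

/* Invented stand-in for gfn_to_pfn_cache; only the locking matters. */
struct pfn_cache_model {
	rwlock_t lock;			/* taken by readers of the cache */
	struct mutex refresh_lock;	/* pre-patch: taken only by refresh */
	bool active;
	bool valid;
	void *khva;
};

static void model_refresh(struct pfn_cache_model *gpc)
{
	void *old_khva;

	mutex_lock(&gpc->refresh_lock);

	write_lock_irq(&gpc->lock);
	old_khva = gpc->khva;
	write_unlock_irq(&gpc->lock);

	/*
	 * RACE WINDOW: the write lock is dropped here for MM work.
	 * Pre-patch, kvm_gpc_deactivate() did not take refresh_lock,
	 * so it could clear ->active and unmap ->khva at this point.
	 */

	write_lock_irq(&gpc->lock);
	/* Assumes nothing changed: may reinstall a now-unmapped khva. */
	gpc->khva = old_khva;
	gpc->valid = true;
	write_unlock_irq(&gpc->lock);

	mutex_unlock(&gpc->refresh_lock);
}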
1 parent 66e3cf7 commit 6addfcf

1 file changed (+11 -10 lines)


virt/kvm/pfncache.c

Lines changed: 11 additions & 10 deletions
@@ -256,12 +256,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 	if (page_offset + len > PAGE_SIZE)
 		return -EINVAL;
 
-	/*
-	 * If another task is refreshing the cache, wait for it to complete.
-	 * There is no guarantee that concurrent refreshes will see the same
-	 * gpa, memslots generation, etc..., so they must be fully serialized.
-	 */
-	mutex_lock(&gpc->refresh_lock);
+	lockdep_assert_held(&gpc->refresh_lock);
 
 	write_lock_irq(&gpc->lock);
 
@@ -347,8 +342,6 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 out_unlock:
 	write_unlock_irq(&gpc->lock);
 
-	mutex_unlock(&gpc->refresh_lock);
-
 	if (unmap_old)
 		gpc_unmap(old_pfn, old_khva);
 
@@ -357,13 +350,16 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 
 int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, unsigned long len)
 {
+	unsigned long uhva;
+
+	guard(mutex)(&gpc->refresh_lock);
+
 	/*
 	 * If the GPA is valid then ignore the HVA, as a cache can be GPA-based
 	 * or HVA-based, not both. For GPA-based caches, the HVA will be
 	 * recomputed during refresh if necessary.
 	 */
-	unsigned long uhva = kvm_is_error_gpa(gpc->gpa) ? gpc->uhva :
-			     KVM_HVA_ERR_BAD;
+	uhva = kvm_is_error_gpa(gpc->gpa) ? gpc->uhva : KVM_HVA_ERR_BAD;
 
 	return __kvm_gpc_refresh(gpc, gpc->gpa, uhva, len);
 }
@@ -377,13 +373,16 @@ void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm)
 	gpc->pfn = KVM_PFN_ERR_FAULT;
 	gpc->gpa = INVALID_GPA;
 	gpc->uhva = KVM_HVA_ERR_BAD;
+	gpc->active = gpc->valid = false;
 }
 
 static int __kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long uhva,
 			      unsigned long len)
 {
 	struct kvm *kvm = gpc->kvm;
 
+	guard(mutex)(&gpc->refresh_lock);
+
 	if (!gpc->active) {
 		if (KVM_BUG_ON(gpc->valid, kvm))
 			return -EIO;
@@ -420,6 +419,8 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc)
 	kvm_pfn_t old_pfn;
 	void *old_khva;
 
+	guard(mutex)(&gpc->refresh_lock);
+
 	if (gpc->active) {
 		/*
 		 * Deactivate the cache before removing it from the list, KVM
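The resulting contract: __kvm_gpc_refresh() asserts that ->refresh_lock is held, while activate, refresh and deactivate each hold it for their full duration via guard(mutex) from <linux/cleanup.h>, which unlocks automatically on every return path (the missed unlock mentioned in the [sean: ...] tag above is exactly what guard() prevents). A minimal, self-contained sketch of that pattern, using invented toy_* names rather than the real pfncache functions:

#include <linux/cleanup.h>
#include <linux/errno.h>
#include <linux/lockdep.h>
#include <linux/mutex.h>

/* Invented toy cache standing in for gfn_to_pfn_cache. */
struct toy_cache {
	struct mutex refresh_lock;
	bool active;
};

static int __toy_refresh(struct toy_cache *c)
{
	/* Inner helper: every caller must already hold refresh_lock. */
	lockdep_assert_held(&c->refresh_lock);
	return c->active ? 0 : -EINVAL;
}

static int toy_refresh(struct toy_cache *c)
{
	/* Released automatically on every return path, error or not. */
	guard(mutex)(&c->refresh_lock);
	return __toy_refresh(c);
}

static void toy_deactivate(struct toy_cache *c)
{
	/* Serialized against refresh: the deactivate race is closed. */
	guard(mutex)(&c->refresh_lock);
	c->active = false;
}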
