Commit 4286a3e

Merge tag 'kvm-x86-mmu-6.15' of https://github.com/kvm-x86/linux into HEAD
KVM x86/mmu changes for 6.15

Add support for "fast" aging of SPTEs in both the TDP MMU and Shadow MMU, where "fast" means "without holding mmu_lock". Not taking mmu_lock allows multiple aging actions to run in parallel, and more importantly avoids stalling vCPUs, e.g. due to holding mmu_lock for an extended duration while a vCPU is faulting in memory.

For the TDP MMU, protect aging via RCU; the page tables are RCU-protected and KVM doesn't need to access any metadata to age SPTEs.

For the Shadow MMU, use bit 1 of rmap pointers (bit 0 is used to terminate a list of rmaps) to implement a per-rmap single-bit spinlock. When aging a gfn, acquire the rmap's spinlock with read-only permissions, which allows hardening and optimizing the locking and aging, e.g. locking an rmap for write requires mmu_lock to also be held. The lock is NOT a true R/W spinlock, i.e. multiple concurrent readers aren't supported.

To avoid forcing all SPTE updates to use atomic operations (clearing the Accessed bit out of mmu_lock makes it inherently volatile), rework and rename spte_has_volatile_bits() to spte_needs_atomic_update() and deliberately exclude the Accessed bit. KVM (and mm/) already tolerates false positives/negatives for Accessed information, and all testing has shown that reducing the latency of aging is far more beneficial to overall system performance than providing "perfect" young/old information.
2 parents e335300 + 0dab791 commit 4286a3e

File tree

11 files changed: +375, -171 lines

Documentation/virt/kvm/locking.rst

Lines changed: 2 additions & 2 deletions
@@ -196,7 +196,7 @@ writable between reading spte and updating spte. Like below case:
 The Dirty bit is lost in this case.
 
 In order to avoid this kind of issue, we always treat the spte as "volatile"
-if it can be updated out of mmu-lock [see spte_has_volatile_bits()]; it means
+if it can be updated out of mmu-lock [see spte_needs_atomic_update()]; it means
 the spte is always atomically updated in this case.
 
 3) flush tlbs due to spte updated
@@ -212,7 +212,7 @@ function to update spte (present -> present).
 
 Since the spte is "volatile" if it can be updated out of mmu-lock, we always
 atomically update the spte and the race caused by fast page fault can be avoided.
-See the comments in spte_has_volatile_bits() and mmu_spte_update().
+See the comments in spte_needs_atomic_update() and mmu_spte_update().
 
 Lockless Access Tracking:

arch/x86/include/asm/kvm_host.h

Lines changed: 3 additions & 1 deletion
@@ -27,6 +27,7 @@
 #include <linux/kfifo.h>
 #include <linux/sched/vhost_task.h>
 #include <linux/call_once.h>
+#include <linux/atomic.h>
 
 #include <asm/apic.h>
 #include <asm/pvclock-abi.h>
@@ -405,7 +406,7 @@ union kvm_cpu_role {
 };
 
 struct kvm_rmap_head {
-	unsigned long val;
+	atomic_long_t val;
 };
 
 struct kvm_pio_request {
@@ -1479,6 +1480,7 @@ struct kvm_arch {
 	 * tdp_mmu_page set.
 	 *
 	 * For reads, this list is protected by:
+	 *	RCU alone or
 	 *	the MMU lock in read mode + RCU or
 	 *	the MMU lock in write mode
 	 *

arch/x86/kvm/Kconfig

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ config KVM_X86
 	select KVM_COMMON
 	select KVM_GENERIC_MMU_NOTIFIER
 	select KVM_ELIDE_TLB_FLUSH_IF_YOUNG
+	select KVM_MMU_LOCKLESS_AGING
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_PFNCACHE
 	select HAVE_KVM_DIRTY_RING_TSO
