Commit 8ca9836
KVM: x86/mmu: Zap invalidated TDP MMU roots at 4KiB granularity
Zap invalidated TDP MMU roots at maximum granularity, i.e. with more frequent conditional resched checkpoints, in order to avoid running for an extended duration (milliseconds, or worse) without honoring a reschedule request. And for kernels running with full or real-time preempt models, zapping at 4KiB granularity also provides significantly reduced latency for other tasks that are contending for mmu_lock (which isn't necessarily an overall win for KVM, but KVM should do its best to honor the kernel's preemption model).

To keep KVM's assertion that zapping at 1GiB granularity is functionally ok, which is the main reason 1GiB was selected in the past, skip straight to zapping at 1GiB if KVM is configured to prove the MMU. Zapping roots is far more common than a vCPU replacing a 1GiB page table with a hugepage, e.g. it generally happens multiple times during boot, and so keeping the test coverage provided by root zaps is desirable, just not for production.

Cc: David Matlack <dmatlack@google.com>
Cc: Pattara Teerapong <pteerapong@google.com>
Link: https://lore.kernel.org/r/20240111020048.844847-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
1 parent dfeef3d commit 8ca9836

File tree

1 file changed (+18, -7 lines)


arch/x86/kvm/mmu/tdp_mmu.c

Lines changed: 18 additions & 7 deletions
@@ -734,15 +734,26 @@ static void tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
 	rcu_read_lock();
 
 	/*
-	 * To avoid RCU stalls due to recursively removing huge swaths of SPs,
-	 * split the zap into two passes.  On the first pass, zap at the 1gb
-	 * level, and then zap top-level SPs on the second pass.  "1gb" is not
-	 * arbitrary, as KVM must be able to zap a 1gb shadow page without
-	 * inducing a stall to allow in-place replacement with a 1gb hugepage.
+	 * Zap roots in multiple passes of decreasing granularity, i.e. zap at
+	 * 4KiB=>2MiB=>1GiB=>root, in order to better honor need_resched() (all
+	 * preempt models) or mmu_lock contention (full or real-time models).
+	 * Zapping at finer granularity marginally increases the total time of
+	 * the zap, but in most cases the zap itself isn't latency sensitive.
 	 *
-	 * Because zapping a SP recurses on its children, stepping down to
-	 * PG_LEVEL_4K in the iterator itself is unnecessary.
+	 * If KVM is configured to prove the MMU, skip the 4KiB and 2MiB zaps
+	 * in order to mimic the page fault path, which can replace a 1GiB page
+	 * table with an equivalent 1GiB hugepage, i.e. can get saddled with
+	 * zapping a 1GiB region that's fully populated with 4KiB SPTEs.  This
+	 * allows verifying that KVM can safely zap 1GiB regions, e.g. without
+	 * inducing RCU stalls, without relying on a relatively rare event
+	 * (zapping roots is orders of magnitude more common).  Note, because
+	 * zapping a SP recurses on its children, stepping down to PG_LEVEL_4K
+	 * in the iterator itself is unnecessary.
 	 */
+	if (!IS_ENABLED(CONFIG_KVM_PROVE_MMU)) {
+		__tdp_mmu_zap_root(kvm, root, shared, PG_LEVEL_4K);
+		__tdp_mmu_zap_root(kvm, root, shared, PG_LEVEL_2M);
+	}
 	__tdp_mmu_zap_root(kvm, root, shared, PG_LEVEL_1G);
 	__tdp_mmu_zap_root(kvm, root, shared, root->role.level);
 
