
Commit c9c1e20

yanzhao56 authored and bonzini committed
KVM: x86: Introduce Intel specific quirk KVM_X86_QUIRK_IGNORE_GUEST_PAT
Introduce an Intel specific quirk KVM_X86_QUIRK_IGNORE_GUEST_PAT to have KVM ignore guest PAT when this quirk is enabled.

On AMD platforms, KVM always honors guest PAT. On Intel however there are two issues. First, KVM *cannot* honor guest PAT if CPU feature self-snoop is not supported. Second, UC access on certain Intel platforms can be very slow[1] and honoring guest PAT on those platforms may break some old guests that accidentally specify video RAM as UC. Those old guests may never expect the slowness since KVM always forces WB previously. See [2].

So, introduce a quirk that KVM can enable by default on all Intel platforms to avoid breaking old unmodifiable guests. Newer userspace can disable this quirk if it wishes KVM to honor guest PAT; disabling the quirk will fail if self-snoop is not supported, i.e. if KVM cannot obey the wish.

The quirk is a no-op on AMD and also if any assigned devices have non-coherent DMA. This is not an issue, as KVM_X86_QUIRK_CD_NW_CLEARED is another example of a quirk that is sometimes automatically disabled.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/all/Ztl9NWCOupNfVaCA@yzhao56-desk.sh.intel.com # [1]
Link: https://lore.kernel.org/all/87jzfutmfc.fsf@redhat.com # [2]
Message-ID: <20250224070946.31482-1-yan.y.zhao@intel.com>
[Use supported_quirks/inapplicable_quirks to support both AMD and no-self-snoop cases, as well as to remove the shadow_memtype_mask check from kvm_mmu_may_ignore_guest_pat(). - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
1 parent bd7d536 commit c9c1e20

7 files changed, +73 -15 lines changed

Documentation/virt/kvm/api.rst

Lines changed: 22 additions & 0 deletions
@@ -8160,6 +8160,28 @@ KVM_X86_QUIRK_STUFF_FEATURE_MSRS By default, at vCPU creation, KVM sets the
                                     and 0x489), as KVM does now allow them to
                                     be set by userspace (KVM sets them based on
                                     guest CPUID, for safety purposes).
+
+KVM_X86_QUIRK_IGNORE_GUEST_PAT      By default, on Intel platforms, KVM ignores
+                                    guest PAT and forces the effective memory
+                                    type to WB in EPT. The quirk is not available
+                                    on Intel platforms which are incapable of
+                                    safely honoring guest PAT (i.e., without CPU
+                                    self-snoop, KVM always ignores guest PAT and
+                                    forces effective memory type to WB). It is
+                                    also ignored on AMD platforms or, on Intel,
+                                    when a VM has non-coherent DMA devices
+                                    assigned; KVM always honors guest PAT in
+                                    such case. The quirk is needed to avoid
+                                    slowdowns on certain Intel Xeon platforms
+                                    (e.g. ICX, SPR) where self-snoop feature is
+                                    supported but UC is slow enough to cause
+                                    issues with some older guests that use
+                                    UC instead of WC to map the video RAM.
+                                    Userspace can disable the quirk to honor
+                                    guest PAT if it knows that there is no such
+                                    guest software, for example if it does not
+                                    expose a bochs graphics device (which is
+                                    known to have had a buggy driver).
 =================================== ============================================
 
 7.32 KVM_CAP_MAX_VCPU_ID
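The documentation text above says userspace can disable the quirk. A minimal sketch of how a VMM might do that through the existing KVM_CAP_DISABLE_QUIRKS2 capability is shown below. The helper name `disable_ignore_guest_pat()` and its return convention are hypothetical, and the two fallback `#define`s are only there for older installed headers (the quirk bit is taken from this commit; the capability number is assumed to match mainline `linux/kvm.h`). This is an illustration, not code from the commit.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Fallbacks for older headers. Assumption: values match mainline. */
#ifndef KVM_CAP_DISABLE_QUIRKS2
#define KVM_CAP_DISABLE_QUIRKS2 213
#endif
#ifndef KVM_X86_QUIRK_IGNORE_GUEST_PAT
#define KVM_X86_QUIRK_IGNORE_GUEST_PAT (1 << 9)	/* from this commit */
#endif

/*
 * Hypothetical helper: ask KVM to honor guest PAT for one VM.
 *
 * kvm_fd is an open /dev/kvm descriptor, vm_fd comes from KVM_CREATE_VM.
 * Returns 0 if the quirk was disabled, 1 if this host does not allow
 * disabling it (e.g. no self-snoop, per the text above), -1 on ioctl error.
 */
static int disable_ignore_guest_pat(int kvm_fd, int vm_fd)
{
	struct kvm_enable_cap cap;
	int supported;

	/* KVM_CHECK_EXTENSION returns the bitmask of quirks that can be disabled. */
	supported = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_DISABLE_QUIRKS2);
	if (supported < 0)
		return -1;
	if (!(supported & KVM_X86_QUIRK_IGNORE_GUEST_PAT))
		return 1;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_DISABLE_QUIRKS2;
	cap.args[0] = KVM_X86_QUIRK_IGNORE_GUEST_PAT;	/* quirks to disable */
	if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
		return -1;
	return 0;
}
```

Checking the supported mask first lets a VMM tell "this host cannot honor guest PAT" apart from a generic ioctl failure. On AMD the bit may still be reported, but per the text above the quirk has no effect there.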

arch/x86/include/asm/kvm_host.h

Lines changed: 4 additions & 2 deletions
@@ -2420,10 +2420,12 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
 	 KVM_X86_QUIRK_FIX_HYPERCALL_INSN | \
 	 KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS | \
 	 KVM_X86_QUIRK_SLOT_ZAP_ALL | \
-	 KVM_X86_QUIRK_STUFF_FEATURE_MSRS)
+	 KVM_X86_QUIRK_STUFF_FEATURE_MSRS | \
+	 KVM_X86_QUIRK_IGNORE_GUEST_PAT)
 
 #define KVM_X86_CONDITIONAL_QUIRKS \
-	 KVM_X86_QUIRK_CD_NW_CLEARED
+	(KVM_X86_QUIRK_CD_NW_CLEARED | \
+	 KVM_X86_QUIRK_IGNORE_GUEST_PAT)
 
 /*
  * KVM previously used a u32 field in kvm_run to indicate the hypercall was

arch/x86/include/uapi/asm/kvm.h

Lines changed: 1 addition & 0 deletions
@@ -441,6 +441,7 @@ struct kvm_sync_regs {
 #define KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS (1 << 6)
 #define KVM_X86_QUIRK_SLOT_ZAP_ALL (1 << 7)
 #define KVM_X86_QUIRK_STUFF_FEATURE_MSRS (1 << 8)
+#define KVM_X86_QUIRK_IGNORE_GUEST_PAT (1 << 9)
 
 #define KVM_STATE_NESTED_FORMAT_VMX 0
 #define KVM_STATE_NESTED_FORMAT_SVM 1

arch/x86/kvm/mmu.h

Lines changed: 1 addition & 1 deletion
@@ -232,7 +232,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	return -(u32)fault & errcode;
 }
 
-bool kvm_mmu_may_ignore_guest_pat(void);
+bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm);
 
 int kvm_mmu_post_init_vm(struct kvm *kvm);
 void kvm_mmu_pre_destroy_vm(struct kvm *kvm);

arch/x86/kvm/mmu/mmu.c

Lines changed: 6 additions & 4 deletions
@@ -4663,17 +4663,19 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
 }
 #endif
 
-bool kvm_mmu_may_ignore_guest_pat(void)
+bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm)
 {
 	/*
 	 * When EPT is enabled (shadow_memtype_mask is non-zero), and the VM
 	 * has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is to
 	 * honor the memtype from the guest's PAT so that guest accesses to
 	 * memory that is DMA'd aren't cached against the guest's wishes. As a
-	 * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA,
-	 * KVM _always_ ignores guest PAT (when EPT is enabled).
+	 * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA.
+	 * KVM _always_ ignores guest PAT, when EPT is enabled and when quirk
+	 * KVM_X86_QUIRK_IGNORE_GUEST_PAT is enabled or the CPU lacks the
+	 * ability to safely honor guest PAT.
 	 */
-	return shadow_memtype_mask;
+	return kvm_check_has_quirk(kvm, KVM_X86_QUIRK_IGNORE_GUEST_PAT);
 }
 
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)

arch/x86/kvm/vmx/vmx.c

Lines changed: 34 additions & 7 deletions
@@ -7595,6 +7595,17 @@ int vmx_vm_init(struct kvm *kvm)
 	return 0;
 }
 
+static inline bool vmx_ignore_guest_pat(struct kvm *kvm)
+{
+	/*
+	 * Non-coherent DMA devices need the guest to flush CPU properly.
+	 * In that case it is not possible to map all guest RAM as WB, so
+	 * always trust guest PAT.
+	 */
+	return !kvm_arch_has_noncoherent_dma(kvm) &&
+	       kvm_check_has_quirk(kvm, KVM_X86_QUIRK_IGNORE_GUEST_PAT);
+}
+
 u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
 	/*
@@ -7604,13 +7615,8 @@ u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 	if (is_mmio)
 		return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
 
-	/*
-	 * Force WB and ignore guest PAT if the VM does NOT have a non-coherent
-	 * device attached. Letting the guest control memory types on Intel
-	 * CPUs may result in unexpected behavior, and so KVM's ABI is to trust
-	 * the guest to behave only as a last resort.
-	 */
-	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
+	/* Force WB if ignoring guest PAT */
+	if (vmx_ignore_guest_pat(vcpu->kvm))
 		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
 
 	return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT);
@@ -8502,6 +8508,27 @@ __init int vmx_hardware_setup(void)
 
 	kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
 
+	/*
+	 * On Intel CPUs that lack self-snoop feature, letting the guest control
+	 * memory types may result in unexpected behavior. So always ignore guest
+	 * PAT on those CPUs and map VM as writeback, not allowing userspace to
+	 * disable the quirk.
+	 *
+	 * On certain Intel CPUs (e.g. SPR, ICX), though self-snoop feature is
+	 * supported, UC is slow enough to cause issues with some older guests (e.g.
+	 * an old version of bochs driver uses ioremap() instead of ioremap_wc() to
+	 * map the video RAM, causing wayland desktop to fail to get started
+	 * correctly). To avoid breaking those older guests that rely on KVM to force
+	 * memory type to WB, provide KVM_X86_QUIRK_IGNORE_GUEST_PAT to preserve the
+	 * safer (for performance) default behavior.
+	 *
+	 * On top of this, non-coherent DMA devices need the guest to flush CPU
+	 * caches properly. This also requires honoring guest PAT, and is forced
+	 * independent of the quirk in vmx_ignore_guest_pat().
+	 */
+	if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
+		kvm_caps.supported_quirks &= ~KVM_X86_QUIRK_IGNORE_GUEST_PAT;
+	kvm_caps.inapplicable_quirks &= ~KVM_X86_QUIRK_IGNORE_GUEST_PAT;
 	return r;
 }
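The memory-type decision made by vmx_get_mt_mask() after this change can be modelled outside the kernel as a small truth table. The sketch below is a simplified stand-in, not kernel code: `model_get_mt_mask()` and its boolean parameters are hypothetical, and the constant values are assumed to match the kernel's definitions (EPT memory type in bits 5:3, "ignore PAT" in bit 6).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the kernel's values; defined locally for the model. */
#define MTRR_TYPE_UNCACHABLE	0
#define MTRR_TYPE_WRBACK	6
#define VMX_EPT_MT_EPTE_SHIFT	3
#define VMX_EPT_IPAT_BIT	(1u << 6)

/*
 * Hypothetical model of the post-patch decision:
 *  - MMIO is always mapped UC;
 *  - guest PAT is ignored (WB plus IPAT) only when the VM has no
 *    non-coherent DMA and the quirk is enabled;
 *  - otherwise WB without IPAT, so guest PAT takes effect.
 */
static uint8_t model_get_mt_mask(bool is_mmio, bool has_noncoherent_dma,
				 bool quirk_enabled)
{
	if (is_mmio)
		return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;

	if (!has_noncoherent_dma && quirk_enabled)
		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) |
		       VMX_EPT_IPAT_BIT;

	return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
}
```

In this model, the only behavioral difference from the pre-patch code is the last argument: with the quirk disabled, a VM without non-coherent DMA now gets WB without the IPAT bit.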

arch/x86/kvm/x86.c

Lines changed: 5 additions & 1 deletion
@@ -9829,6 +9829,10 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 	if (IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) && tdp_mmu_enabled)
 		kvm_caps.supported_vm_types |= BIT(KVM_X86_SW_PROTECTED_VM);
 
+	/* KVM always ignores guest PAT for shadow paging. */
+	if (!tdp_enabled)
+		kvm_caps.supported_quirks &= ~KVM_X86_QUIRK_IGNORE_GUEST_PAT;
+
 	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
 		kvm_caps.supported_xss = 0;
 
@@ -13564,7 +13568,7 @@ static void kvm_noncoherent_dma_assignment_start_or_stop(struct kvm *kvm)
 	 * (or last) non-coherent device is (un)registered to so that new SPTEs
 	 * with the correct "ignore guest PAT" setting are created.
 	 */
-	if (kvm_mmu_may_ignore_guest_pat())
+	if (kvm_mmu_may_ignore_guest_pat(kvm))
 		kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
 }
1357013574
