
Commit b0327bb

yanzhao56 authored and bonzini committed
KVM: TDX: Retry locally in TDX EPT violation handler on RET_PF_RETRY
Retry locally in the TDX EPT violation handler for private memory to
reduce the chances for tdh_mem_sept_add()/tdh_mem_page_aug() to contend
with tdh_vp_enter().

TDX EPT violation installs private pages via tdh_mem_sept_add() and
tdh_mem_page_aug(). The two may have contention with tdh_vp_enter() or
TDCALLs.

Resources              SHARED users          EXCLUSIVE users
------------------------------------------------------------------------
SEPT tree              tdh_mem_sept_add      tdh_vp_enter (0-step mitigation)
                       tdh_mem_page_aug
------------------------------------------------------------------------
SEPT entry                                   tdh_mem_sept_add (Host lock)
                                             tdh_mem_page_aug (Host lock)
                                             tdg_mem_page_accept (Guest lock)
                                             tdg_mem_page_attr_rd (Guest lock)
                                             tdg_mem_page_attr_wr (Guest lock)

Though the contention between tdh_mem_sept_add()/tdh_mem_page_aug() and
TDCALLs may be removed in a future TDX module, their contention with
tdh_vp_enter() due to 0-step mitigation still persists.

The TDX module may trigger 0-step mitigation in SEAMCALL TDH.VP.ENTER,
which works as follows:

0. Each TDH.VP.ENTER records the guest RIP on TD entry.
1. When the TDX module encounters a VM exit with reason EPT_VIOLATION,
   it checks if the guest RIP is the same as the last guest RIP on TD
   entry.
   - if yes, the EPT violation is caused by the same instruction that
     caused the last VM exit. The TDX module then increases the guest
     RIP no-progress count. When the count increases from 0 to the
     threshold (currently 6), the TDX module records the faulting GPA
     into a last_epf_gpa_list.
   - if no, the guest RIP has made progress, so the TDX module resets
     the RIP no-progress count and the last_epf_gpa_list.
2. On the next TDH.VP.ENTER, the TDX module (after saving the guest RIP
   on TD entry) checks if the last_epf_gpa_list is empty.
   - if yes, TD entry continues without acquiring the lock on the SEPT
     tree.
   - if no, it triggers the 0-step mitigation by acquiring the exclusive
     lock on the SEPT tree and walking the EPT tree to check that all
     page faults caused by the GPAs in the last_epf_gpa_list have been
     resolved before continuing TD entry.

Since the KVM TDP MMU usually re-enters the guest whenever it exits to
userspace (e.g. for KVM_EXIT_MEMORY_FAULT) or encounters a BUSY, it is
possible for tdh_vp_enter() to be called more than the threshold count
before a page fault is addressed, triggering contention when
tdh_vp_enter() attempts to acquire the exclusive lock on the SEPT tree.

Retry locally in the TDX EPT violation handler to reduce the count of
invoking tdh_vp_enter(), hence reducing the possibility of its
contention with tdh_mem_sept_add()/tdh_mem_page_aug(). However, the
0-step mitigation and the contention are still not eliminated due to
KVM_EXIT_MEMORY_FAULT, signals/interrupts, and cases when one
instruction faults on more GFNs than the threshold count.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Message-ID: <20250227012021.1778144-4-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
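The 0-step mitigation flow described in the commit message above can be
condensed into a small C sketch of the per-vCPU bookkeeping it implies.
This is purely illustrative: the names and sizes (zero_step_state,
ZERO_STEP_THRESHOLD, MAX_TRACKED_GPAS, track_ept_violation,
zero_step_mitigation_needed) are invented for the example, and the real
TDX module data structures are not part of this commit.

#include <stdbool.h>
#include <stdint.h>

#define ZERO_STEP_THRESHOLD  6   /* "currently 6" per the commit message */
#define MAX_TRACKED_GPAS     8   /* arbitrary capacity chosen for this sketch */

struct zero_step_state {
        uint64_t last_entry_rip;        /* guest RIP saved on TD entry (step 0) */
        unsigned int no_progress_count; /* consecutive EPT violations at the same RIP */
        uint64_t last_epf_gpa_list[MAX_TRACKED_GPAS];
        unsigned int gpa_count;
};

/* Step 1: bookkeeping on a VM exit with reason EPT_VIOLATION. */
static void track_ept_violation(struct zero_step_state *s, uint64_t exit_rip,
                                uint64_t fault_gpa)
{
        if (exit_rip == s->last_entry_rip) {
                /* Same instruction faulted again: no forward progress. */
                s->no_progress_count++;
                if (s->no_progress_count >= ZERO_STEP_THRESHOLD &&
                    s->gpa_count < MAX_TRACKED_GPAS)
                        s->last_epf_gpa_list[s->gpa_count++] = fault_gpa;
        } else {
                /* The guest made progress: reset the count and the list. */
                s->no_progress_count = 0;
                s->gpa_count = 0;
        }
}

/* Step 2: on the next TDH.VP.ENTER, after saving the entry RIP. */
static bool zero_step_mitigation_needed(struct zero_step_state *s,
                                        uint64_t entry_rip)
{
        s->last_entry_rip = entry_rip;
        /*
         * A non-empty list means the module would take the exclusive SEPT
         * tree lock and verify the recorded faults are resolved before
         * continuing TD entry.
         */
        return s->gpa_count != 0;
}

In this model, the local retries added by the patch below reduce how many
times TDH.VP.ENTER runs (and thus how often step 2 sees a non-empty list)
before KVM resolves the private GPA fault.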
1 parent e6a8578 commit b0327bb


arch/x86/kvm/vmx/tdx.c

Lines changed: 56 additions & 1 deletion
@@ -1724,6 +1724,8 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
 {
         unsigned long exit_qual;
         gpa_t gpa = to_tdx(vcpu)->exit_gpa;
+        bool local_retry = false;
+        int ret;
 
         if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) {
                 if (tdx_is_sept_violation_unexpected_pending(vcpu)) {
@@ -1742,6 +1744,9 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
                  * due to aliasing a single HPA to multiple GPAs.
                  */
                 exit_qual = EPT_VIOLATION_ACC_WRITE;
+
+                /* Only private GPA triggers zero-step mitigation */
+                local_retry = true;
         } else {
                 exit_qual = vmx_get_exit_qual(vcpu);
                 /*
@@ -1754,7 +1759,57 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
         }
 
         trace_kvm_page_fault(vcpu, gpa, exit_qual);
-        return __vmx_handle_ept_violation(vcpu, gpa, exit_qual);
+
+        /*
+         * To minimize TDH.VP.ENTER invocations, retry locally for private GPA
+         * mapping in TDX.
+         *
+         * KVM may return RET_PF_RETRY for private GPA due to
+         * - contentions when atomically updating SPTEs of the mirror page table
+         * - in-progress GFN invalidation or memslot removal.
+         * - TDX_OPERAND_BUSY error from TDH.MEM.PAGE.AUG or TDH.MEM.SEPT.ADD,
+         *   caused by contentions with TDH.VP.ENTER (with zero-step mitigation)
+         *   or certain TDCALLs.
+         *
+         * If TDH.VP.ENTER is invoked more times than the threshold set by the
+         * TDX module before KVM resolves the private GPA mapping, the TDX
+         * module will activate zero-step mitigation during TDH.VP.ENTER. This
+         * process acquires an SEPT tree lock in the TDX module, leading to
+         * further contentions with TDH.MEM.PAGE.AUG or TDH.MEM.SEPT.ADD
+         * operations on other vCPUs.
+         *
+         * Breaking out of local retries for kvm_vcpu_has_events() is for
+         * interrupt injection. kvm_vcpu_has_events() should not see pending
+         * events for TDX. Since KVM can't determine if IRQs (or NMIs) are
+         * blocked by TDs, false positives are inevitable i.e., KVM may re-enter
+         * the guest even if the IRQ/NMI can't be delivered.
+         *
+         * Note: even without breaking out of local retries, zero-step
+         * mitigation may still occur due to
+         * - invoking of TDH.VP.ENTER after KVM_EXIT_MEMORY_FAULT,
+         * - a single RIP causing EPT violations for more GFNs than the
+         *   threshold count.
+         * This is safe, as triggering zero-step mitigation only introduces
+         * contentions to page installation SEAMCALLs on other vCPUs, which will
+         * handle retries locally in their EPT violation handlers.
+         */
+        while (1) {
+                ret = __vmx_handle_ept_violation(vcpu, gpa, exit_qual);
+
+                if (ret != RET_PF_RETRY || !local_retry)
+                        break;
+
+                if (kvm_vcpu_has_events(vcpu) || signal_pending(current))
+                        break;
+
+                if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
+                        ret = -EIO;
+                        break;
+                }
+
+                cond_resched();
+        }
+        return ret;
 }
 
 int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
