
Commit 9913212

Merge branch 'kvm-tdx-interrupts' into HEAD
Introduces support for interrupt handling for TDX guests, including virtual interrupt injection and VM-Exits caused by vectored events.

Injection
=========

TDX supports non-NMI interrupt injection only by posted interrupt. Posted interrupt descriptors (PIDs) are allocated in shared memory, so KVM can update them directly. To post pending interrupts in the PID, KVM can generate a self-IPI with the notification vector prior to TD entry.

TDX guest status is protected, so KVM can't get the interrupt status of a TDX guest. For now, assume interrupts are always allowed. A later patch set will let TDX guests call TDVMCALL with HLT, which passes the interrupt block flag, so that whether an interrupt is allowed in HLT will be checked against that flag.

For NMIs, KVM can request the TDX module to inject an NMI into a TDX vCPU by setting the PEND_NMI TDVPS field to 1. Following that, KVM can call TDH.VP.ENTER to run the vCPU and the TDX module will attempt to inject the NMI as soon as possible. The PEND_NMI TDVPS field is a 1-bit field, i.e. KVM can pend at most one NMI in the TDX module, and TDX doesn't allow KVM to request an NMI-window exit directly. When one NMI is already pending in the TDX module, i.e. it has not been delivered to the TDX guest yet, and another NMI becomes pending in KVM, collapse the NMI pending in KVM into the one pending in the TDX module (see the sketch after this section). Such collapsing is OK considering that on x86 bare metal, multiple NMIs can collapse into one, e.g. when an NMI is blocked by an SMI. It's the OS's responsibility to poll all NMI sources in the NMI handler to avoid missing NMI events. More details can be found in the changelog of the patch "KVM: TDX: Implement methods to inject NMI".

TDX doesn't support system-management mode (SMM) or system-management interrupts (SMIs) in guest TDs, because the TDX module provides no way for the VMM to inject an SMI into a guest TD or to switch a guest vCPU into SMM. SMI requests return -ENOTTY, as with CONFIG_KVM_SMM=n. Likewise, INIT and SIPI events are not used and are blocked for TDX guests; TDX defines its own vCPU creation and initialization sequence, which is done on the host via SEAMCALLs at TD build time.
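To make the PEND_NMI flow above concrete, here is a minimal sketch. The accessor names (to_tdx(), td_management_write8(), TD_VCPU_PEND_NMI) are assumptions for illustration; the exact interfaces live in the TDX patches, not in the hunks shown below.

/*
 * Hedged sketch: pend an NMI for a TDX vCPU by setting the 1-bit
 * PEND_NMI TDVPS field.  An NMI that becomes pending in KVM while
 * PEND_NMI is already set simply collapses into the pending one,
 * mirroring NMI collapsing on bare metal.
 */
static void tdx_inject_nmi(struct kvm_vcpu *vcpu)
{
        ++vcpu->stat.nmi_injections;
        /* td_management_write8() and TD_VCPU_PEND_NMI are assumed names. */
        td_management_write8(to_tdx(vcpu), TD_VCPU_PEND_NMI, 1);
}

From KVM's perspective, injection is complete once PEND_NMI is set; the TDX module delivers the NMI on a subsequent TDH.VP.ENTER.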
Such "MSMI" is marked by having bit 0 set in the exit qualification; MSMI exits are fatal for the TD and are eventually handled by the kernel machine check handler (7911f14 x86/mce: Implement recovery for errors in TDX/SEAM non-root mode), which marks the page as poisoned. It is not possible right now to pass machine check exceptions to the guest. SMIs other than machine check SMIs are handled just by leaving SEAM root mode and KVM doesn't need to do anything.
2 parents: 4d2dc9a + 6c441e4

19 files changed: +522 −120 lines

arch/x86/include/asm/kvm-x86-ops.h (1 addition, 0 deletions)

@@ -116,6 +116,7 @@ KVM_X86_OP_OPTIONAL(pi_start_assignment)
 KVM_X86_OP_OPTIONAL(apicv_pre_state_restore)
 KVM_X86_OP_OPTIONAL(apicv_post_state_restore)
 KVM_X86_OP_OPTIONAL_RET0(dy_apicv_has_pending_interrupt)
+KVM_X86_OP_OPTIONAL(protected_apic_has_interrupt)
 KVM_X86_OP_OPTIONAL(set_hv_timer)
 KVM_X86_OP_OPTIONAL(cancel_hv_timer)
 KVM_X86_OP(setup_mce)

arch/x86/include/asm/kvm_host.h (1 addition, 0 deletions)

@@ -1842,6 +1842,7 @@ struct kvm_x86_ops {
         void (*apicv_pre_state_restore)(struct kvm_vcpu *vcpu);
         void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu);
         bool (*dy_apicv_has_pending_interrupt)(struct kvm_vcpu *vcpu);
+        bool (*protected_apic_has_interrupt)(struct kvm_vcpu *vcpu);

         int (*set_hv_timer)(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
                             bool *expired);

arch/x86/include/asm/posted_intr.h (5 additions, 0 deletions)

@@ -81,6 +81,11 @@ static inline bool pi_test_sn(struct pi_desc *pi_desc)
         return test_bit(POSTED_INTR_SN, (unsigned long *)&pi_desc->control);
 }

+static inline bool pi_test_pir(int vector, struct pi_desc *pi_desc)
+{
+        return test_bit(vector, (unsigned long *)pi_desc->pir);
+}
+
 /* Non-atomic helpers */
 static inline void __pi_set_sn(struct pi_desc *pi_desc)
 {

arch/x86/include/uapi/asm/vmx.h (1 addition, 0 deletions)

@@ -34,6 +34,7 @@
 #define EXIT_REASON_TRIPLE_FAULT        2
 #define EXIT_REASON_INIT_SIGNAL         3
 #define EXIT_REASON_SIPI_SIGNAL         4
+#define EXIT_REASON_OTHER_SMI           6

 #define EXIT_REASON_INTERRUPT_WINDOW    7
 #define EXIT_REASON_NMI_WINDOW          8

arch/x86/kvm/irq.c (3 additions, 0 deletions)

@@ -100,6 +100,9 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
         if (kvm_cpu_has_extint(v))
                 return 1;

+        if (lapic_in_kernel(v) && v->arch.apic->guest_apic_protected)
+                return kvm_x86_call(protected_apic_has_interrupt)(v);
+
         return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
 }
 EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);
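The backend implementing this hook is not part of the hunks shown here. As a minimal sketch of what it could look like, assuming a hypothetical vcpu_to_pi_desc() accessor alongside the existing pi_test_on() helper:

/*
 * Hedged sketch: the vIRR of a protected APIC is unreadable, but the
 * posted-interrupt descriptor lives in shared memory, so pending
 * posted interrupts are still visible to KVM.
 */
static bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)
{
        struct pi_desc *pi = vcpu_to_pi_desc(vcpu);     /* assumed accessor */
        int i;

        /* Outstanding notification: an interrupt is definitely pending. */
        if (pi_test_on(pi))
                return true;

        /* Any bit set in the 256-bit PIR counts as pending. */
        for (i = 0; i < 8; i++) {
                if (READ_ONCE(pi->pir[i]))
                        return true;
        }

        return false;
}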

arch/x86/kvm/lapic.c (13 additions, 1 deletion)

@@ -1797,8 +1797,17 @@ static void apic_update_lvtt(struct kvm_lapic *apic)
 static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
 {
         struct kvm_lapic *apic = vcpu->arch.apic;
-        u32 reg = kvm_lapic_get_reg(apic, APIC_LVTT);
+        u32 reg;

+        /*
+         * Assume a timer IRQ was "injected" if the APIC is protected. KVM's
+         * copy of the vIRR is bogus, it's the responsibility of the caller to
+         * precisely check whether or not a timer IRQ is pending.
+         */
+        if (apic->guest_apic_protected)
+                return true;
+
+        reg = kvm_lapic_get_reg(apic, APIC_LVTT);
         if (kvm_apic_hw_enabled(apic)) {
                 int vec = reg & APIC_VECTOR_MASK;
                 void *bitmap = apic->regs + APIC_ISR;

@@ -2967,6 +2976,9 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
         if (!kvm_apic_present(vcpu))
                 return -1;

+        if (apic->guest_apic_protected)
+                return -1;
+
         __apic_update_ppr(apic, &ppr);
         return apic_has_interrupt_for_ppr(apic, ppr);
 }

arch/x86/kvm/lapic.h (2 additions, 0 deletions)

@@ -65,6 +65,8 @@ struct kvm_lapic {
         bool sw_enabled;
         bool irr_pending;
         bool lvt0_in_nmi_mode;
+        /* Select registers in the vAPIC cannot be read/written. */
+        bool guest_apic_protected;
         /* Number of bits set in ISR. */
         s16 isr_count;
         /* The highest vector set in ISR; if -1 - invalid, must scan ISR. */
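Common code only ever reads this flag; a backend opts in when it sets up the vCPU. A one-line sketch, under the assumption that a TDX vCPU-creation path does this (the function name is hypothetical):

/* Hedged sketch: opt in to the protected-APIC paths at vCPU creation. */
static void tdx_mark_apic_protected(struct kvm_vcpu *vcpu)
{
        /* KVM must not read or write the vIRR of a TDX guest. */
        vcpu->arch.apic->guest_apic_protected = true;
}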

arch/x86/kvm/smm.h (3 additions, 0 deletions)

@@ -142,6 +142,9 @@ union kvm_smram {

 static inline int kvm_inject_smi(struct kvm_vcpu *vcpu)
 {
+        if (!kvm_x86_call(has_emulated_msr)(vcpu->kvm, MSR_IA32_SMBASE))
+                return -ENOTTY;
+
         kvm_make_request(KVM_REQ_SMI, vcpu);
         return 0;
 }
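This is the piece that makes SMI requests fail with -ENOTTY for TDX, mirroring CONFIG_KVM_SMM=n: the backend only needs to report MSR_IA32_SMBASE as not emulated. A plausible sketch, assuming an is_td()-style VM-type check (the dispatch below is an assumption, not taken from this diff):

/*
 * Hedged sketch: report SMBASE as not emulated for TDX VMs so that
 * kvm_inject_smi() above returns -ENOTTY.
 */
static bool vt_has_emulated_msr(struct kvm *kvm, u32 index)
{
        if (is_td(kvm) && index == MSR_IA32_SMBASE)
                return false;

        return vmx_has_emulated_msr(kvm, index);
}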

arch/x86/kvm/vmx/common.h (70 additions, 0 deletions)

@@ -48,6 +48,7 @@ struct vcpu_vt {
          * hardware.
          */
         bool guest_state_loaded;
+        bool emulation_required;

 #ifdef CONFIG_X86_64
         u64 msr_host_kernel_gs_base;

@@ -109,4 +110,73 @@ static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
         return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
 }

+static inline void kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu,
+                                                     int pi_vec)
+{
+#ifdef CONFIG_SMP
+        if (vcpu->mode == IN_GUEST_MODE) {
+                /*
+                 * The vector of the virtual interrupt has already been set in
+                 * the PIR.  Send a notification event to deliver the virtual
+                 * interrupt unless the vCPU is the currently running vCPU,
+                 * i.e. the event is being sent from a fastpath VM-Exit
+                 * handler, in which case the PIR will be synced to the vIRR
+                 * before re-entering the guest.
+                 *
+                 * When the target is not the running vCPU, the following
+                 * possibilities emerge:
+                 *
+                 * Case 1: vCPU stays in non-root mode. Sending a notification
+                 * event posts the interrupt to the vCPU.
+                 *
+                 * Case 2: vCPU exits to root mode and is still runnable. The
+                 * PIR will be synced to the vIRR before re-entering the guest.
+                 * Sending a notification event is ok as the host IRQ handler
+                 * will ignore the spurious event.
+                 *
+                 * Case 3: vCPU exits to root mode and is blocked. vcpu_block()
+                 * has already synced PIR to vIRR and never blocks the vCPU if
+                 * the vIRR is not empty. Therefore, a blocked vCPU here does
+                 * not wait for any requested interrupts in PIR, and sending a
+                 * notification event also results in a benign, spurious event.
+                 */
+
+                if (vcpu != kvm_get_running_vcpu())
+                        __apic_send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec);
+                return;
+        }
+#endif
+        /*
+         * The vCPU isn't in the guest; wake the vCPU in case it is blocking,
+         * otherwise do nothing as KVM will grab the highest priority pending
+         * IRQ via ->sync_pir_to_irr() in vcpu_enter_guest().
+         */
+        kvm_vcpu_wake_up(vcpu);
+}
+
+/*
+ * Post an interrupt to a vCPU's PIR and trigger the vCPU to process the
+ * interrupt if necessary.
+ */
+static inline void __vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu,
+                                                  struct pi_desc *pi_desc, int vector)
+{
+        if (pi_test_and_set_pir(vector, pi_desc))
+                return;
+
+        /* If a previous notification has sent the IPI, nothing to do. */
+        if (pi_test_and_set_on(pi_desc))
+                return;
+
+        /*
+         * The implied barrier in pi_test_and_set_on() pairs with the smp_mb_*()
+         * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU is
+         * guaranteed to see PID.ON=1 and sync the PIR to IRR if triggering a
+         * posted interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
+         */
+        kvm_vcpu_trigger_posted_interrupt(vcpu, POSTED_INTR_VECTOR);
+}
+
+noinstr void vmx_handle_nmi(struct kvm_vcpu *vcpu);
+
 #endif /* __KVM_X86_VMX_COMMON_H */
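For reference, a sketch of how a TDX backend might drive __vmx_deliver_posted_interrupt() from its interrupt-delivery hook; to_tdx() and the embedded vt.pi_desc field are assumptions here, not part of the hunks above:

/*
 * Hedged sketch: TDX supports only posted interrupts, so delivering a
 * non-NMI interrupt is a post to the shared-memory PID followed by a
 * notification IPI or a wakeup as needed.
 */
static void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
                                  int trig_mode, int vector)
{
        struct kvm_vcpu *vcpu = apic->vcpu;

        /* No LAPIC emulation for TDX; post directly to the PID. */
        __vmx_deliver_posted_interrupt(vcpu, &to_tdx(vcpu)->vt.pi_desc, vector);
}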
