
Commit fcbe348

Merge branch 'kvm-tdx-mmu' into HEAD
This series picks up from commit 86eb1ae ("Merge branch 'kvm-mirror-page-tables' into HEAD", 2025-01-20), which focused on changes to the generic x86 parts of the KVM MMU code, and adds support for TDX's secure page tables to the Intel side of KVM.

Confidential computing solutions have concepts of private and shared memory. Often the guest accesses either private or shared memory via a bit in the guest PTE. Solutions like SEV treat this bit more like a permission bit, whereas solutions like TDX and ARM CCA treat it more like a GPA bit. In the latter case, the host maps private memory in one half of the address space and shared memory in the other. For TDX these two halves are mapped by different EPT roots. The private half (also called Secure EPT in Intel documentation) is managed by the privileged TDX Module. The shared half is managed by the untrusted part of the VMM (KVM).

In addition to the separate roots for private and shared, there are limitations on what operations can be done on the private side. Like SNP, TDX wants to protect against protected memory being reset or otherwise scrambled by the host. In order to prevent this, the guest has to take specific action to "accept" memory after changes are made by the VMM to the private EPT. This prevents the VMM from performing many of the usual memory management operations that involve zapping and refaulting memory. Private memory is also always RWX and cannot have VMM-specified cache attributes applied.

TDX memory implementation
=========================

Creating shared EPT
-------------------
Shared EPT handling is relatively simple compared to private memory. It is managed from within KVM. The main differences between shared EPT and EPT in a normal VM are that the root is set with a TD VMCS field (via SEAMCALL), and that the GFN from the memslot perspective needs to be mapped at an offset in the EPT. For the former, this series plumbs the load_mmu_pgd() operation through to the correct field for the shared EPT. For the latter, previous patches have laid the groundwork for mapping so-called "direct roots" at an offset specified in kvm->arch.gfn_direct_bits.

Creating private EPT
--------------------
In previous patches, the concept of "mirrored roots" was introduced. Such roots maintain a KVM-side "mirror" of the "external" EPT by keeping an unmapped EPT tree within the KVM MMU code. When changing these mirror EPTs, the KVM MMU code calls out via x86_ops to update the external EPT. This series adds implementations of these "external" ops for TDX, to create and manage "private" memory via TDX module APIs.

Managing S-EPT with the TDX Module
----------------------------------
The TDX module allows the TD's private memory to be managed via SEAMCALLs. This management consists of operating on two internal elements:

1. The private EPT, which the TDX module calls the S-EPT. It maps the private half of the GPA space using an EPT tree.

2. The HKID, which represents the private encryption key used for encrypting TD memory. The CPU doesn't guarantee cache coherency between these encryption keys, so memory that is encrypted with one of these keys needs to be reclaimed for use on the host in special ways.

This series will primarily focus on the SEAMCALLs for managing the private EPT. Consideration of the HKID is needed for when the TD is torn down.
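
As a rough illustration of the private/shared GPA split described above (this is not code from the series; the helper names below are invented for the example, and the real helpers from the earlier mirror-page-tables work may differ in detail), the two halves can be told apart by the bits in kvm->arch.gfn_direct_bits:

  /*
   * Illustrative sketch only.  For a TDX guest, kvm->arch.gfn_direct_bits
   * holds the "shared" GPA bit: a GPA with that bit set belongs to the
   * shared half and is mapped by the KVM-managed direct (shared) EPT root;
   * a GPA with the bit clear belongs to the private half, whose real
   * mapping lives in the TDX module's S-EPT behind the mirror root.
   */
  static bool example_gpa_is_shared(struct kvm *kvm, gpa_t gpa)
  {
          gpa_t direct_bits = gfn_to_gpa(kvm->arch.gfn_direct_bits);

          /* A VM with no direct bits (a normal VM) only has the direct root. */
          return !direct_bits || (gpa & direct_bits);
  }

  /* Strip the shared bit to recover the GFN used for memslot lookups. */
  static gfn_t example_gpa_to_memslot_gfn(struct kvm *kvm, gpa_t gpa)
  {
          return gpa_to_gfn(gpa) & ~kvm->arch.gfn_direct_bits;
  }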

Populating TDX Private memory
-----------------------------
TDX allows the EPT mapping the TD's private memory to be modified in limited ways. There are SEAMCALLs for building and tearing down the EPT tree, as well as for mapping pages into the private EPT. Building and tearing down the EPT page tables is relatively simple: there are SEAMCALLs for installing and removing them. However, the current implementation only supports adding private EPT page tables, and leaves them installed for the lifetime of the TD. Teardown is discussed in a later section.

There are also SEAMCALLs for populating and zapping private SPTEs. The zapping case will be described in detail later. For the populating case there are two categories: before the TD is finalized and after the TD is finalized. Both of these scenarios go through the TDP MMU map path. The changes done previously to introduce "mirror" and "external" page tables handle directing SPTE installation operations through the set_external_spte() op.

In the "after" case, the TDX set_external_spte() handler simply calls a SEAMCALL (TDH.MEM.PAGE.AUG). The "before" case is a bit more complicated, as it requires both setting the private SPTE *and* copying in the initial contents of the page at the same time. For TDX this is done via the KVM_TDX_INIT_MEM_REGION ioctl, which is effectively the kvm_gmem_populate() operation. For SNP, the private memory can be pre-populated first and faulted in later like normal. But for TDX both need to happen at the same time, and the setting of the private SPTE needs to happen in a different way than in the "after" case described above: it needs to use the TDH.MEM.PAGE.ADD SEAMCALL, which both copies in the data and sets the SPTE.

Without extensive modification to the fault path, it's not possible to use this SEAMCALL from the set_external_spte() handler, because the source page for the data to be copied in is not known deep down in this call chain. So instead the post-populate callback does a three-step process (a condensed sketch in code appears at the end of this section):

1. Pre-fault the memory into the mirror EPT, but have set_external_spte() not make any SEAMCALLs.

2. Check that the page is still faulted into the mirror EPT, under a read mmu_lock that is held over this and the following step.

3. Call TDH.MEM.PAGE.ADD with the HPA of the page to copy data from, and the private page installed in the mirror EPT to use for the private mapping.

This scheme involves some assumptions about the operations that might operate on the mirrored EPT before the VM is finalized. It assumes that no other memory will be faulted into the mirror EPT that is not also added via TDH.MEM.PAGE.ADD. If this is violated, the KVM MMU may not see private memory faulted in there later and so not make the proper external SPTE callbacks. To check this, KVM enforces that the number of pre-faulted pages is the same as the number of pages added via KVM_TDX_INIT_MEM_REGION.
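
A condensed sketch of that three-step flow, using kvm_tdp_map_page(), kvm_tdp_mmu_gpa_is_mapped() and the tdh_mem_page_add() wrapper exported by this merge. It is illustrative only: error handling, GPA validation and the optional TDH.MR.EXTEND measurement step are omitted, the function name is made up, and the caller is assumed to supply the private-fault error_code and the resolved pages.

  static int example_populate_private_page(struct kvm_vcpu *vcpu, struct tdx_td *td,
                                           gpa_t gpa, u64 error_code,
                                           struct page *private_page,
                                           struct page *source_page)
  {
          u64 ext_err1, ext_err2;
          u8 level;
          int ret;

          /* 1. Pre-fault into the mirror EPT; set_external_spte() makes no SEAMCALL yet. */
          ret = kvm_tdp_map_page(vcpu, gpa, error_code, &level);
          if (ret)
                  return ret;

          /*
           * 2. Under read mmu_lock, check the GPA is still faulted into the
           *    mirror EPT, then
           * 3. TDH.MEM.PAGE.ADD copies the source contents and installs the
           *    private mapping in the S-EPT.
           */
          read_lock(&vcpu->kvm->mmu_lock);
          if (!kvm_tdp_mmu_gpa_is_mapped(vcpu, gpa)) {
                  ret = -EAGAIN;
          } else if (tdh_mem_page_add(td, gpa, private_page, source_page,
                                      &ext_err1, &ext_err2)) {
                  ret = -EIO;
          }
          read_unlock(&vcpu->kvm->mmu_lock);

          return ret;
  }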

TDX TLB flushing
----------------
For TDX, TLB flushing needs to happen in different ways depending on whether private and/or shared EPT needs to be flushed. Shared EPT can be flushed like normal EPT with INVEPT. To avoid reading the TD's EPTP out from the TDX module, this series flushes the shared EPT with a type 2 INVEPT. Private TLB entries can be flushed this way too (via type 2). However, since the TDX module needs to enforce some guarantees around which private memory is mapped in the TD, it requires these operations to be done in special ways for private memory.

For flushing private memory, two methods are possible. The simple one is the TDH.VP.FLUSH SEAMCALL; this flush is of the INVEPT type 1 variety (i.e. it flushes mappings associated with the TD). The second method is part of a sequence of SEAMCALLs for removing a guest page. The sequence looks like this (a sketch in code follows at the end of this section):

1. TDH.MEM.RANGE.BLOCK - Remove RWX bits from the entry (similar to KVM's zap).

2. TDH.MEM.TRACK - Increment the TD TLB epoch, which is a per-TD counter.

3. Kick off all vCPUs - In order to force them to re-enter.

4. TDH.MEM.PAGE.REMOVE - Actually remove the page and make it available for other use.

5. TDH.VP.ENTER - On re-entry, the TDX module will see that the epoch has been incremented and flush the TLB.

On top of this, during TDX module init TDH.SYS.LP.INIT (which is used to online a CPU for TDX usage) invokes INVEPT to flush all mappings in the TLB.

During runtime, for normal (TDP MMU, non-nested) guests, KVM will do a TLB flush in four scenarios:

(1) kvm_mmu_load()

    After EPT is loaded, call kvm_x86_flush_tlb_current() to invalidate TLBs for the current vCPU's loaded EPT on the current pCPU.

(2) Loading a vCPU onto a new pCPU

    Send request KVM_REQ_TLB_FLUSH to the current vCPU; the request handler will call kvm_x86_flush_tlb_all() to flush all EPTs associated with the new pCPU.

(3) When an EPT mapping has changed (after removal or permission reduction), e.g. in kvm_flush_remote_tlbs()

    Send request KVM_REQ_TLB_FLUSH to all vCPUs by kicking them all off; the request handler on each vCPU will call kvm_x86_flush_tlb_all() to invalidate TLBs for all EPTs associated with the pCPU.

(4) When an EPT change only affects the current vCPU, e.g. a virtual APIC mode change

    Send request KVM_REQ_TLB_FLUSH_CURRENT; the request handler will call kvm_x86_flush_tlb_current() to invalidate TLBs for the current vCPU's loaded EPT on the current pCPU.

Only the first three are relevant to TDX. They are implemented as follows.

(1) kvm_mmu_load()

    Only the shared EPT root is loaded in this path. The TDX module does not require any assurances about the operation, so flush_tlb_current()->ept_sync_global() can be called as normal.

(2) vCPU load

    When a vCPU migrates to a new logical processor, it has to be flushed on the *old* pCPU, unlike normal VMs where the INVEPT is executed on the new pCPU to remove stale mappings from previous usage of the same EPTP on that pCPU. The TDX behavior comes from a requirement that a vCPU can only be associated with one pCPU at a time. This flush happens via an IPI that invokes the TDH.VP.FLUSH SEAMCALL, during the vcpu_load callback.

(3) Removing a private SPTE

    This is the more complicated flow. It is done in a simple way for now and is especially inefficient during VM teardown. The plan is to get a basic functional version working and optimize some of these flows later.

    When a private page mapping is removed, the core MMU code calls the newly added remove_external_spte() op and flushes the TLB on all vCPUs. But TDX can't rely on doing that for private memory, so it has its own process for making sure the private page is removed. This flow (TDH.MEM.RANGE.BLOCK, TDH.MEM.TRACK, TDH.MEM.PAGE.REMOVE) is done within the remove_external_spte() implementation, as described earlier in this section.

    After that, back in the core MMU code, KVM will call kvm_flush_remote_tlbs*(), resulting in an INVEPT. Despite that, when the vCPUs re-enter (TDH.VP.ENTER) the TD, the TDX module will do another INVEPT for its own reassurance.
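
A sketch of the BLOCK -> TRACK -> kick -> REMOVE sequence above, expressed with the SEAMCALL wrappers added by this merge. It is illustrative only: the function name is made up, the real remove_external_spte() implementation also deals with busy retries, the pre-finalization case and the post-removal cache writeback, and the kick is shown here via KVM_REQ_OUTSIDE_GUEST_MODE while the series may use a different mechanism.

  static int example_remove_private_page(struct kvm *kvm, struct tdx_td *td,
                                         gpa_t gpa, enum pg_level level)
  {
          int tdx_level = pg_level_to_tdx_sept_level(level);
          u64 ext_err1, ext_err2;

          /* 1. Zap the S-EPT entry (strip RWX), similar to KVM's zap. */
          if (tdh_mem_range_block(td, gpa, tdx_level, &ext_err1, &ext_err2))
                  return -EIO;

          /* 2. Bump the TD's TLB epoch. */
          if (tdh_mem_track(td))
                  return -EIO;

          /* 3. Kick vCPUs out of the guest; re-entry flushes the stale TLB entries. */
          kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE);

          /* 4. Remove the page so it can be reclaimed for other use. */
          if (tdh_mem_page_remove(td, gpa, tdx_level, &ext_err1, &ext_err2))
                  return -EIO;

          return 0;
  }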

Private memory teardown
-----------------------
Tearing down private memory involves reclaiming three types of resources from the TDX module:

1. The TD's HKID. To reclaim it, no mappings may remain mapped with it.

2. Private guest pages (mapped with the HKID).

3. Private page tables that map private pages (mapped with the HKID).

From the TDX module's perspective, to reclaim guest private pages they need to be prevented from being accessed via the HKID (unmapped and TLB flushed), their HKID-associated cachelines need to be flushed, and they need to be marked as no longer in use by the TD in the TDX module's internal tracking (PAMT).

During runtime, private PTEs can be zapped as part of memslot deletion or when memory converts from shared to private, but private page tables and HKIDs are not torn down until the TD is being destroyed. This means the operation that zaps private guest mapped pages needs to do the required cache writeback under the assumption that other vCPUs may be active, while the page table teardown does not.

TD teardown resource reclamation
--------------------------------
The code that does the TD teardown is organized such that when an HKID is reclaimed:

1. vCPUs will no longer enter the TD.

2. The TLB has been flushed on all CPUs.

3. The HKID-associated cachelines have been flushed.

So at that point most of the steps needed to reclaim TD private pages and page tables have already been done, and the reclaim operation only needs to update the TDX module's tracking of page ownership. For simplicity, each operation only supports one scenario: before or after HKID reclaim. Since zapping and reclaiming private pages has to work during runtime for memslot deletion and conversion from shared to private, the TD teardown is arranged so that this happens before HKID reclaim. Since private page tables are never torn down during TD runtime, they can be reclaimed in a simpler and more efficient way after HKID reclaim (see the sketch after the call chain below).

The private page reclaim is initiated from the kvm fd release. The callchain looks like this:

do_exit
|->exit_mm --> tdx_mmu_release_hkid() was called here previously in v19
|->exit_files
   |->1. release vcpu fd
   |->2. kvm_gmem_release
   |  |->kvm_gmem_invalidate_begin --> unmap all leaf entries, causing
   |                                   zapping of private guest pages
   |->3. release kvmfd
      |->kvm_destroy_vm
         |->kvm_arch_pre_destroy_vm
         |  |  kvm_x86_call(vm_pre_destroy)(kvm) --> tdx_mmu_release_hkid()
         |->kvm_arch_destroy_vm
            |->kvm_unload_vcpu_mmus
            |  kvm_destroy_vcpus(kvm)
            |  |->kvm_arch_vcpu_destroy
            |     |->kvm_x86_call(vcpu_free)(vcpu)
            |     |  kvm_mmu_destroy(vcpu) --> unref mirror root
            |  kvm_mmu_uninit_vm(kvm) --> mirror root ref is 1 here,
            |                             zap private page tables
            |  static_call_cond(kvm_x86_vm_destroy)(kvm);
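
For the page-table side, a minimal sketch of reclaiming an S-EPT page after the HKID has been reclaimed, using the tdh_phymem_page_reclaim() wrapper from this merge (illustrative only; the function name is made up and the in-tree helper also scrubs the page before handing it back to the allocator):

  static int example_reclaim_page_after_hkid(struct page *page)
  {
          u64 tdx_pt, tdx_owner, tdx_size;

          /*
           * vCPUs can no longer enter the TD and TLBs/caches have already
           * been flushed, so only the TDX module's ownership tracking (PAMT)
           * needs updating before the host can reuse the page.
           */
          if (tdh_phymem_page_reclaim(page, &tdx_pt, &tdx_owner, &tdx_size))
                  return -EIO;

          return 0;
  }
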
2 parents 0d20742 + eac0b72 commit fcbe348

File tree

23 files changed: +1325 -85 lines changed


arch/x86/include/asm/kvm_host.h

Lines changed: 7 additions & 5 deletions
@@ -1562,6 +1562,13 @@ struct kvm_arch {
 	struct kvm_mmu_memory_cache split_desc_cache;
 
 	gfn_t gfn_direct_bits;
+
+	/*
+	 * Size of the CPU's dirty log buffer, i.e. VMX's PML buffer. A Zero
+	 * value indicates CPU dirty logging is unsupported or disabled in
+	 * current VM.
+	 */
+	int cpu_dirty_log_size;
 };
 
 struct kvm_vm_stat {
@@ -1815,11 +1822,6 @@ struct kvm_x86_ops {
 				struct x86_exception *exception);
 	void (*handle_exit_irqoff)(struct kvm_vcpu *vcpu);
 
-	/*
-	 * Size of the CPU's dirty log buffer, i.e. VMX's PML buffer. A zero
-	 * value indicates CPU dirty logging is unsupported or disabled.
-	 */
-	int cpu_dirty_log_size;
 	void (*update_cpu_dirty_logging)(struct kvm_vcpu *vcpu);
 
 	const struct kvm_x86_nested_ops *nested_ops;

arch/x86/include/asm/tdx.h

Lines changed: 14 additions & 1 deletion
@@ -148,7 +148,6 @@ struct tdx_vp {
 	struct page **tdcx_pages;
 };
 
-
 static inline u64 mk_keyed_paddr(u16 hkid, struct page *page)
 {
 	u64 ret;
@@ -158,15 +157,26 @@ static inline u64 mk_keyed_paddr(u16 hkid, struct page *page)
 	ret |= (u64)hkid << boot_cpu_data.x86_phys_bits;
 
 	return ret;
+}
 
+static inline int pg_level_to_tdx_sept_level(enum pg_level level)
+{
+	WARN_ON_ONCE(level == PG_LEVEL_NONE);
+	return level - 1;
 }
 
 u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page);
+u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, struct page *page, struct page *source, u64 *ext_err1, u64 *ext_err2);
+u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2);
 u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page);
+u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2);
+u64 tdh_mem_range_block(struct tdx_td *td, u64 gpa, int level, u64 *ext_err1, u64 *ext_err2);
 u64 tdh_mng_key_config(struct tdx_td *td);
 u64 tdh_mng_create(struct tdx_td *td, u16 hkid);
 u64 tdh_vp_create(struct tdx_td *td, struct tdx_vp *vp);
 u64 tdh_mng_rd(struct tdx_td *td, u64 field, u64 *data);
+u64 tdh_mr_extend(struct tdx_td *td, u64 gpa, u64 *ext_err1, u64 *ext_err2);
+u64 tdh_mr_finalize(struct tdx_td *td);
 u64 tdh_vp_flush(struct tdx_vp *vp);
 u64 tdh_mng_vpflushdone(struct tdx_td *td);
 u64 tdh_mng_key_freeid(struct tdx_td *td);
@@ -175,8 +185,11 @@ u64 tdh_vp_init(struct tdx_vp *vp, u64 initial_rcx, u32 x2apicid);
 u64 tdh_vp_rd(struct tdx_vp *vp, u64 field, u64 *data);
 u64 tdh_vp_wr(struct tdx_vp *vp, u64 field, u64 data, u64 mask);
 u64 tdh_phymem_page_reclaim(struct page *page, u64 *tdx_pt, u64 *tdx_owner, u64 *tdx_size);
+u64 tdh_mem_track(struct tdx_td *tdr);
+u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, u64 level, u64 *ext_err1, u64 *ext_err2);
 u64 tdh_phymem_cache_wb(bool resume);
 u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td);
+u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page);
 #else
 static inline void tdx_init(void) { }
 static inline int tdx_cpu_enable(void) { return -ENODEV; }

arch/x86/include/asm/vmx.h

Lines changed: 1 addition & 0 deletions
@@ -256,6 +256,7 @@ enum vmcs_field {
 	TSC_MULTIPLIER_HIGH = 0x00002033,
 	TERTIARY_VM_EXEC_CONTROL = 0x00002034,
 	TERTIARY_VM_EXEC_CONTROL_HIGH = 0x00002035,
+	SHARED_EPT_POINTER = 0x0000203C,
 	PID_POINTER_TABLE = 0x00002042,
 	PID_POINTER_TABLE_HIGH = 0x00002043,
 	GUEST_PHYSICAL_ADDRESS = 0x00002400,

arch/x86/include/uapi/asm/kvm.h

Lines changed: 10 additions & 0 deletions
@@ -932,6 +932,8 @@ enum kvm_tdx_cmd_id {
 	KVM_TDX_CAPABILITIES = 0,
 	KVM_TDX_INIT_VM,
 	KVM_TDX_INIT_VCPU,
+	KVM_TDX_INIT_MEM_REGION,
+	KVM_TDX_FINALIZE_VM,
 	KVM_TDX_GET_CPUID,
 
 	KVM_TDX_CMD_NR_MAX,
@@ -987,4 +989,12 @@ struct kvm_tdx_init_vm {
 	struct kvm_cpuid2 cpuid;
 };
 
+#define KVM_TDX_MEASURE_MEMORY_REGION _BITULL(0)
+
+struct kvm_tdx_init_mem_region {
+	__u64 source_addr;
+	__u64 gpa;
+	__u64 nr_pages;
+};
+
 #endif /* _ASM_X86_KVM_H */

arch/x86/kvm/mmu.h

Lines changed: 4 additions & 0 deletions
@@ -79,6 +79,7 @@ static inline gfn_t kvm_mmu_max_gfn(void)
 u8 kvm_mmu_get_max_tdp_level(void);
 
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
+void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value);
 void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
 void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
 
@@ -253,6 +254,9 @@ extern bool tdp_mmu_enabled;
 #define tdp_mmu_enabled false
 #endif
 
+bool kvm_tdp_mmu_gpa_is_mapped(struct kvm_vcpu *vcpu, u64 gpa);
+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level);
+
 static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
 {
 	return !tdp_mmu_enabled || kvm_shadow_root_allocated(kvm);

arch/x86/kvm/mmu/mmu.c

Lines changed: 12 additions & 5 deletions
@@ -110,6 +110,7 @@ static bool __ro_after_init tdp_mmu_allowed;
 #ifdef CONFIG_X86_64
 bool __read_mostly tdp_mmu_enabled = true;
 module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0444);
+EXPORT_SYMBOL_GPL(tdp_mmu_enabled);
 #endif
 
 static int max_huge_page_level __read_mostly;
@@ -1304,15 +1305,15 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	 * enabled but it chooses between clearing the Dirty bit and Writeable
 	 * bit based on the context.
 	 */
-	if (kvm_x86_ops.cpu_dirty_log_size)
+	if (kvm->arch.cpu_dirty_log_size)
 		kvm_mmu_clear_dirty_pt_masked(kvm, slot, gfn_offset, mask);
 	else
 		kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
 }
 
-int kvm_cpu_dirty_log_size(void)
+int kvm_cpu_dirty_log_size(struct kvm *kvm)
 {
-	return kvm_x86_ops.cpu_dirty_log_size;
+	return kvm->arch.cpu_dirty_log_size;
 }
 
 bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
@@ -4685,8 +4686,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	return direct_page_fault(vcpu, fault);
 }
 
-static int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
-			    u8 *level)
+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level)
 {
 	int r;
 
@@ -4700,6 +4700,10 @@ static int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
 	do {
 		if (signal_pending(current))
 			return -EINTR;
+
+		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu))
+			return -EIO;
+
 		cond_resched();
 		r = kvm_mmu_do_page_fault(vcpu, gpa, error_code, true, NULL, level);
 	} while (r == RET_PF_RETRY);
@@ -4724,6 +4728,7 @@ static int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
 		return -EIO;
 	}
 }
+EXPORT_SYMBOL_GPL(kvm_tdp_map_page);
 
 long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
 				    struct kvm_pre_fault_memory *range)
@@ -5747,6 +5752,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 out:
 	return r;
 }
+EXPORT_SYMBOL_GPL(kvm_mmu_load);
 
 void kvm_mmu_unload(struct kvm_vcpu *vcpu)
 {
@@ -7072,6 +7078,7 @@ static void kvm_mmu_zap_memslot(struct kvm *kvm,
 		.start = slot->base_gfn,
 		.end = slot->base_gfn + slot->npages,
 		.may_block = true,
+		.attr_filter = KVM_FILTER_PRIVATE | KVM_FILTER_SHARED,
 	};
 	bool flush;
 
arch/x86/kvm/mmu/mmu_internal.h

Lines changed: 3 additions & 2 deletions
@@ -187,7 +187,8 @@ static inline gfn_t kvm_gfn_root_bits(const struct kvm *kvm, const struct kvm_mm
 	return kvm_gfn_direct_bits(kvm);
 }
 
-static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
+static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm *kvm,
+						      struct kvm_mmu_page *sp)
 {
 	/*
 	 * When using the EPT page-modification log, the GPAs in the CPU dirty
@@ -197,7 +198,7 @@ static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
 	 * being enabled is mandatory as the bits used to denote WP-only SPTEs
 	 * are reserved for PAE paging (32-bit KVM).
 	 */
-	return kvm_x86_ops.cpu_dirty_log_size && sp->role.guest_mode;
+	return kvm->arch.cpu_dirty_log_size && sp->role.guest_mode;
 }
 
 static inline gfn_t gfn_round_for_level(gfn_t gfn, int level)

arch/x86/kvm/mmu/page_track.c

Lines changed: 3 additions & 0 deletions
@@ -172,6 +172,9 @@ static int kvm_enable_external_write_tracking(struct kvm *kvm)
 	struct kvm_memory_slot *slot;
 	int r = 0, i, bkt;
 
+	if (kvm->arch.vm_type == KVM_X86_TDX_VM)
+		return -EOPNOTSUPP;
+
 	mutex_lock(&kvm->slots_arch_lock);
 
 	/*

arch/x86/kvm/mmu/spte.c

Lines changed: 7 additions & 3 deletions
@@ -96,8 +96,6 @@ u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access)
 	u64 spte = generation_mmio_spte_mask(gen);
 	u64 gpa = gfn << PAGE_SHIFT;
 
-	WARN_ON_ONCE(!vcpu->kvm->arch.shadow_mmio_value);
-
 	access &= shadow_mmio_access_mask;
 	spte |= vcpu->kvm->arch.shadow_mmio_value | access;
 	spte |= gpa | shadow_nonpresent_or_rsvd_mask;
@@ -170,7 +168,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 
 	if (sp->role.ad_disabled)
 		spte |= SPTE_TDP_AD_DISABLED;
-	else if (kvm_mmu_page_ad_need_write_protect(sp))
+	else if (kvm_mmu_page_ad_need_write_protect(vcpu->kvm, sp))
 		spte |= SPTE_TDP_AD_WRPROT_ONLY;
 
 	spte |= shadow_present_mask;
@@ -433,6 +431,12 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask)
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
 
+void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value)
+{
+	kvm->arch.shadow_mmio_value = mmio_value;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_value);
+
 void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask)
 {
 	/* shadow_me_value must be a subset of shadow_me_mask */

arch/x86/kvm/mmu/tdp_mmu.c

Lines changed: 38 additions & 11 deletions
@@ -1613,21 +1613,21 @@ void kvm_tdp_mmu_try_split_huge_pages(struct kvm *kvm,
 	}
 }
 
-static bool tdp_mmu_need_write_protect(struct kvm_mmu_page *sp)
+static bool tdp_mmu_need_write_protect(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	/*
 	 * All TDP MMU shadow pages share the same role as their root, aside
 	 * from level, so it is valid to key off any shadow page to determine if
 	 * write protection is needed for an entire tree.
 	 */
-	return kvm_mmu_page_ad_need_write_protect(sp) || !kvm_ad_enabled;
+	return kvm_mmu_page_ad_need_write_protect(kvm, sp) || !kvm_ad_enabled;
 }
 
 static void clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 				  gfn_t start, gfn_t end)
 {
-	const u64 dbit = tdp_mmu_need_write_protect(root) ? PT_WRITABLE_MASK :
-			 shadow_dirty_mask;
+	const u64 dbit = tdp_mmu_need_write_protect(kvm, root) ?
+			 PT_WRITABLE_MASK : shadow_dirty_mask;
 	struct tdp_iter iter;
 
 	rcu_read_lock();
@@ -1672,8 +1672,8 @@ void kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm,
 static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
 				  gfn_t gfn, unsigned long mask, bool wrprot)
 {
-	const u64 dbit = (wrprot || tdp_mmu_need_write_protect(root)) ? PT_WRITABLE_MASK :
-			 shadow_dirty_mask;
+	const u64 dbit = (wrprot || tdp_mmu_need_write_protect(kvm, root)) ?
+			 PT_WRITABLE_MASK : shadow_dirty_mask;
 	struct tdp_iter iter;
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
@@ -1894,16 +1894,13 @@ bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm,
 *
 * Must be called between kvm_tdp_mmu_walk_lockless_{begin,end}.
 */
-int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
-			 int *root_level)
+static int __kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
+				  struct kvm_mmu_page *root)
 {
-	struct kvm_mmu_page *root = root_to_sp(vcpu->arch.mmu->root.hpa);
 	struct tdp_iter iter;
 	gfn_t gfn = addr >> PAGE_SHIFT;
 	int leaf = -1;
 
-	*root_level = vcpu->arch.mmu->root_role.level;
-
 	tdp_mmu_for_each_pte(iter, vcpu->kvm, root, gfn, gfn + 1) {
 		leaf = iter.level;
 		sptes[leaf] = iter.old_spte;
@@ -1912,6 +1909,36 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
 	return leaf;
 }
 
+int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
+			 int *root_level)
+{
+	struct kvm_mmu_page *root = root_to_sp(vcpu->arch.mmu->root.hpa);
+	*root_level = vcpu->arch.mmu->root_role.level;
+
+	return __kvm_tdp_mmu_get_walk(vcpu, addr, sptes, root);
+}
+
+bool kvm_tdp_mmu_gpa_is_mapped(struct kvm_vcpu *vcpu, u64 gpa)
+{
+	struct kvm *kvm = vcpu->kvm;
+	bool is_direct = kvm_is_addr_direct(kvm, gpa);
+	hpa_t root = is_direct ? vcpu->arch.mmu->root.hpa :
+				 vcpu->arch.mmu->mirror_root_hpa;
+	u64 sptes[PT64_ROOT_MAX_LEVEL + 1], spte;
+	int leaf;
+
+	lockdep_assert_held(&kvm->mmu_lock);
+	rcu_read_lock();
+	leaf = __kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, root_to_sp(root));
+	rcu_read_unlock();
+	if (leaf < 0)
+		return false;
+
+	spte = sptes[leaf];
+	return is_shadow_present_pte(spte) && is_last_spte(spte, leaf);
+}
+EXPORT_SYMBOL_GPL(kvm_tdp_mmu_gpa_is_mapped);
+
 /*
 * Returns the last level spte pointer of the shadow page walk for the given
 * gpa, and sets *spte to the spte value. This spte may be non-preset. If no