Commit 43db111

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:

"As far as x86 goes this pull request "only" includes TDX host support. Quotes are appropriate because (at 6k lines and 100+ commits) it is much bigger than the rest, which will come later this week and consists mostly of bugfixes and selftests. s390 changes will also come in the second batch.

ARM:

- Add large stage-2 mapping (THP) support for non-protected guests when pKVM is enabled, clawing back some performance.
- Enable nested virtualisation support on systems that support it, though it is disabled by default.
- Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and protected modes.
- Large rework of the way KVM tracks architecture features and links them with the effects of control bits. While this has no functional impact, it ensures correctness of emulation (the data is automatically extracted from the published JSON files), and helps dealing with the evolution of the architecture.
- Significant changes to the way pKVM tracks ownership of pages, avoiding page table walks by storing the state in the hypervisor's vmemmap. This in turn enables the THP support described above.
- New selftest checking the pKVM ownership transition rules
- Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests even if the host didn't have it.
- Fixes for the address translation emulation, which happened to be rather buggy in some specific contexts.
- Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N from the number of counters exposed to a guest and addressing a number of issues in the process.
- Add a new selftest for the SVE host state being corrupted by a guest.
- Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, ensuring that the window for interrupts is slightly bigger, and avoiding a pretty bad erratum on the AmpereOne HW.
- Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from a pretty bad case of TLB corruption unless accesses to HCR_EL2 are heavily synchronised.
- Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables in a human-friendly fashion.
- and the usual random cleanups.

LoongArch:

- Don't flush tlb if the host supports hardware page table walks.
- Add KVM selftests support.

RISC-V:

- Add vector registers to get-reg-list selftest
- VCPU reset related improvements
- Remove scounteren initialization from VCPU reset
- Support VCPU reset from userspace using set_mpstate() ioctl

x86:

- Initial support for TDX in KVM. This finally makes it possible to use the TDX module to run confidential guests on Intel processors. This is quite a large series, including support for private page tables (managed by the TDX module and mirrored in KVM for efficiency), forwarding some TDVMCALLs to userspace, and handling several special VM exits from the TDX module. This has been in the works for literally years and it's not really possible to describe everything here, so I'll defer to the various merge commits up to and including commit 7bcf724 ('Merge branch 'kvm-tdx-finish-initial' into HEAD')"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (248 commits)
  x86/tdx: mark tdh_vp_enter() as __flatten
  Documentation: virt/kvm: remove unreferenced footnote
  RISC-V: KVM: lock the correct mp_state during reset
  KVM: arm64: Fix documentation for vgic_its_iter_next()
  KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
  KVM: arm64: Stage-2 huge mappings for np-guests
  KVM: arm64: Add a range to pkvm_mappings
  KVM: arm64: Convert pkvm_mappings to interval tree
  KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
  KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
  KVM: arm64: Add a range to __pkvm_host_unshare_guest()
  KVM: arm64: Add a range to __pkvm_host_share_guest()
  KVM: arm64: Introduce for_each_hyp_page
  KVM: arm64: Handle huge mappings for np-guest CMOs
  KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
  KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
  KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
  RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
  RISC-V: KVM: Remove scounteren initialization
  KVM: RISC-V: remove unnecessary SBI reset state
  ...
2 parents 12e9b9e + e9f1703 commit 43db111

158 files changed, 13220 insertions(+), 1985 deletions(-)

Documentation/arch/arm64/silicon-errata.rst

Lines changed: 2 additions & 0 deletions
@@ -57,6 +57,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | Ampere         | AmpereOne AC04  | AC04_CPU_10     | AMPERE_ERRATUM_AC03_CPU_38  |
 +----------------+-----------------+-----------------+-----------------------------+
+| Ampere         | AmpereOne AC04  | AC04_CPU_23     | AMPERE_ERRATUM_AC04_CPU_23  |
++----------------+-----------------+-----------------+-----------------------------+
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A510     | #2457168        | ARM64_ERRATUM_2457168       |
 +----------------+-----------------+-----------------+-----------------------------+

Documentation/virt/kvm/api.rst

Lines changed: 61 additions & 5 deletions
@@ -1411,6 +1411,9 @@ the memory region are automatically reflected into the guest. For example, an
 mmap() that affects the region will be made visible immediately. Another
 example is madvise(MADV_DROP).
 
+For TDX guest, deleting/moving memory region loses guest memory contents.
+Read only region isn't supported. Only as-id 0 is supported.
+
 Note: On arm64, a write generated by the page-table walker (to update
 the Access and Dirty flags, for example) never results in a
 KVM_EXIT_MMIO exit when the slot has the KVM_MEM_READONLY flag. This
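
The three TDX restrictions in this hunk translate naturally into a pre-flight check that a VMM could run before asking KVM to create or change a memslot for a TDX VM. This is an illustrative sketch, not KVM's own validation: `tdx_memslot_check()` and `tdx_memslot_change_loses_contents()` are hypothetical helpers, returning `-EINVAL` is an assumption, and `SKETCH_KVM_MEM_READONLY` is a local copy of `KVM_MEM_READONLY` (`1 << 1` in `<linux/kvm.h>`) that should be verified against your headers. The address-space ID is taken from bits 16-31 of `slot`, as the memslot API documents for `KVM_CAP_MULTI_ADDRESS_SPACE`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copy of KVM_MEM_READONLY (1 << 1 in <linux/kvm.h>); verify it
 * against your headers. Kept local so the sketch builds without them. */
#define SKETCH_KVM_MEM_READONLY (1u << 1)

/* Hypothetical userspace pre-check for a memslot request on a TDX VM. */
static int tdx_memslot_check(uint32_t slot, uint32_t flags)
{
	/* Bits 16-31 of "slot" select the address space (as_id). */
	uint32_t as_id = slot >> 16;

	if (flags & SKETCH_KVM_MEM_READONLY)
		return -EINVAL;	/* read-only regions are not supported */
	if (as_id != 0)
		return -EINVAL;	/* only as_id 0 is supported */
	return 0;
}

/*
 * Hypothetical helper: deleting a slot (new size 0) or moving it (new base
 * GPA) discards the TDX guest's memory contents, so a VMM may warn first.
 */
static bool tdx_memslot_change_loses_contents(uint64_t old_gpa, uint64_t old_size,
					      uint64_t new_gpa, uint64_t new_size)
{
	if (old_size == 0)
		return false;		/* creating a new slot loses nothing */
	if (new_size == 0)
		return true;		/* delete */
	return new_gpa != old_gpa;	/* move */
}
```

A VMM would call `tdx_memslot_check()` before the memslot ioctl and use the second helper to warn that deleting or moving a populated slot throws away the guest's data.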
@@ -3460,7 +3463,8 @@ The initial values are defined as:
 - FPSIMD/NEON registers: set to 0
 - SVE registers: set to 0
 - System registers: Reset to their architecturally defined
-  values as for a warm reset to EL1 (resp. SVC)
+  values as for a warm reset to EL1 (resp. SVC) or EL2 (in the
+  case of EL2 being enabled).
 
 Note that because some registers reflect machine topology, all vcpus
 should be created before this ioctl is invoked.
@@ -3527,6 +3531,17 @@ Possible features:
   - the KVM_REG_ARM64_SVE_VLS pseudo-register is immutable, and can
     no longer be written using KVM_SET_ONE_REG.
 
+- KVM_ARM_VCPU_HAS_EL2: Enable Nested Virtualisation support,
+  booting the guest from EL2 instead of EL1.
+  Depends on KVM_CAP_ARM_EL2.
+  The VM is running with HCR_EL2.E2H being RES1 (VHE) unless
+  KVM_ARM_VCPU_HAS_EL2_E2H0 is also set.
+
+- KVM_ARM_VCPU_HAS_EL2_E2H0: Restrict Nested Virtualisation
+  support to HCR_EL2.E2H being RES0 (non-VHE).
+  Depends on KVM_CAP_ARM_EL2_E2H0.
+  KVM_ARM_VCPU_HAS_EL2 must also be set.
+
 4.83 KVM_ARM_PREFERRED_TARGET
 -----------------------------
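
The dependency rules for the two new vCPU feature flags can be expressed as a small check over `features[0]` of `struct kvm_vcpu_init`. This is a sketch of the documented rules, not KVM's implementation: the helper names are hypothetical, the `cap_*` arguments stand for the results of `KVM_CHECK_EXTENSION` on `KVM_CAP_ARM_EL2` and `KVM_CAP_ARM_EL2_E2H0`, returning `-EINVAL` is an assumption, and the bit numbers (7 and 8) are assumed to match the arm64 `<asm/kvm.h>` and should be re-checked there.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror KVM_ARM_VCPU_HAS_EL2 / KVM_ARM_VCPU_HAS_EL2_E2H0 in the
 * arm64 <asm/kvm.h>; local copies keep the sketch buildable on any host. */
#define SKETCH_HAS_EL2      7
#define SKETCH_HAS_EL2_E2H0 8

/* Hypothetical check of the documented dependencies for features[0]. */
static int check_el2_features(uint32_t features0, bool cap_el2, bool cap_el2_e2h0)
{
	bool el2 = features0 & (1u << SKETCH_HAS_EL2);
	bool e2h0 = features0 & (1u << SKETCH_HAS_EL2_E2H0);

	if (el2 && !cap_el2)
		return -EINVAL;	/* HAS_EL2 depends on KVM_CAP_ARM_EL2 */
	if (e2h0 && !el2)
		return -EINVAL;	/* HAS_EL2_E2H0 requires HAS_EL2 as well */
	if (e2h0 && !cap_el2_e2h0)
		return -EINVAL;	/* HAS_EL2_E2H0 depends on KVM_CAP_ARM_EL2_E2H0 */
	return 0;
}

/* Which HCR_EL2.E2H behaviour the guest hypervisor sees, per the text. */
static const char *el2_e2h_mode(uint32_t features0)
{
	if (!(features0 & (1u << SKETCH_HAS_EL2)))
		return "no EL2 (guest boots at EL1)";
	if (features0 & (1u << SKETCH_HAS_EL2_E2H0))
		return "E2H RES0 (non-VHE)";
	return "E2H RES1 (VHE)";
}
```

A VMM would set `init.features[0] |= 1u << KVM_ARM_VCPU_HAS_EL2` (plus the E2H0 bit if it wants a non-VHE guest hypervisor) only after a check like this passes.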

@@ -4768,17 +4783,19 @@ H_GET_CPU_CHARACTERISTICS hypercall.
 
 :Capability: basic
 :Architectures: x86
-:Type: vm
+:Type: vm ioctl, vcpu ioctl
 :Parameters: an opaque platform specific structure (in/out)
 :Returns: 0 on success; -1 on error
 
 If the platform supports creating encrypted VMs then this ioctl can be used
 for issuing platform-specific memory encryption commands to manage those
 encrypted VMs.
 
-Currently, this ioctl is used for issuing Secure Encrypted Virtualization
-(SEV) commands on AMD Processors. The SEV commands are defined in
-Documentation/virt/kvm/x86/amd-memory-encryption.rst.
+Currently, this ioctl is used for issuing both Secure Encrypted Virtualization
+(SEV) commands on AMD Processors and Trusted Domain Extensions (TDX) commands
+on Intel Processors. The detailed commands are defined in
+Documentation/virt/kvm/x86/amd-memory-encryption.rst and
+Documentation/virt/kvm/x86/intel-tdx.rst.
 
 4.111 KVM_MEMORY_ENCRYPT_REG_REGION
 -----------------------------------
@@ -6827,6 +6844,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
 #define KVM_SYSTEM_EVENT_WAKEUP 4
 #define KVM_SYSTEM_EVENT_SUSPEND 5
 #define KVM_SYSTEM_EVENT_SEV_TERM 6
+#define KVM_SYSTEM_EVENT_TDX_FATAL 7
 __u32 type;
 __u32 ndata;
 __u64 data[16];
@@ -6853,6 +6871,11 @@ Valid values for 'type' are:
   reset/shutdown of the VM.
 - KVM_SYSTEM_EVENT_SEV_TERM -- an AMD SEV guest requested termination.
   The guest physical address of the guest's GHCB is stored in `data[0]`.
+- KVM_SYSTEM_EVENT_TDX_FATAL -- a TDX guest reported a fatal error state.
+  KVM doesn't do any parsing or conversion, it just dumps 16 general-purpose
+  registers to userspace, in ascending order of the 4-bit indices for x86-64
+  general-purpose registers in instruction encoding, as defined in the Intel
+  SDM.
 - KVM_SYSTEM_EVENT_WAKEUP -- the exiting vCPU is in a suspended state and
   KVM has recognized a wakeup event. Userspace may honor this event by
   marking the exiting vCPU as runnable, or deny it and call KVM_RUN again.
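
Because KVM passes the registers through unparsed, decoding a `KVM_SYSTEM_EVENT_TDX_FATAL` exit is left to userspace. The sketch below labels `data[i]` with the x86-64 register whose 4-bit encoding is `i` (0 = RAX, 1 = RCX, 2 = RDX, 3 = RBX, 4 = RSP, 5 = RBP, 6 = RSI, 7 = RDI, 8-15 = R8-R15). `struct sketch_system_event` is a hypothetical, simplified stand-in for the `system_event` member of `struct kvm_run`, and clamping the loop to `ndata` is a defensive choice rather than something the text requires.

```c
#include <stdint.h>
#include <stdio.h>

#define SKETCH_KVM_SYSTEM_EVENT_TDX_FATAL 7	/* value from the hunk above */

/* Hypothetical stand-in for the system_event member of struct kvm_run. */
struct sketch_system_event {
	uint32_t type;
	uint32_t ndata;
	uint64_t data[16];
};

/* x86-64 GPR names indexed by their 4-bit encoding (REX extension bit plus
 * the 3-bit register field). */
static const char *const gpr_names[16] = {
	"rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
	"r8",  "r9",  "r10", "r11", "r12", "r13", "r14", "r15",
};

/* Print each data[i] labelled with its register; returns the number of
 * registers printed, or -1 if the event is not a TDX fatal report. */
static int dump_tdx_fatal(const struct sketch_system_event *ev, FILE *out)
{
	if (ev->type != SKETCH_KVM_SYSTEM_EVENT_TDX_FATAL)
		return -1;

	uint32_t n = ev->ndata < 16 ? ev->ndata : 16;
	for (uint32_t i = 0; i < n; i++)
		fprintf(out, "%-3s = 0x%016llx\n", gpr_names[i],
			(unsigned long long)ev->data[i]);
	return (int)n;
}
```

A VMM's `KVM_EXIT_SYSTEM_EVENT` handler could pass `&run->system_event` (after copying it into this layout) to `dump_tdx_fatal(..., stderr)` before tearing the guest down.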
@@ -8194,6 +8217,28 @@ KVM_X86_QUIRK_STUFF_FEATURE_MSRS By default, at vCPU creation, KVM sets the
                                     and 0x489), as KVM does now allow them to
                                     be set by userspace (KVM sets them based on
                                     guest CPUID, for safety purposes).
+
+KVM_X86_QUIRK_IGNORE_GUEST_PAT      By default, on Intel platforms, KVM ignores
+                                    guest PAT and forces the effective memory
+                                    type to WB in EPT. The quirk is not available
+                                    on Intel platforms which are incapable of
+                                    safely honoring guest PAT (i.e., without CPU
+                                    self-snoop, KVM always ignores guest PAT and
+                                    forces effective memory type to WB). It is
+                                    also ignored on AMD platforms or, on Intel,
+                                    when a VM has non-coherent DMA devices
+                                    assigned; KVM always honors guest PAT in
+                                    such case. The quirk is needed to avoid
+                                    slowdowns on certain Intel Xeon platforms
+                                    (e.g. ICX, SPR) where self-snoop feature is
+                                    supported but UC is slow enough to cause
+                                    issues with some older guests that use
+                                    UC instead of WC to map the video RAM.
+                                    Userspace can disable the quirk to honor
+                                    guest PAT if it knows that there is no such
+                                    guest software, for example if it does not
+                                    expose a bochs graphics device (which is
+                                    known to have had a buggy driver).
 =================================== ============================================
 
 7.32 KVM_CAP_MAX_VCPU_ID
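
Read literally, the quirk description says userspace's choice only matters on Intel hosts that have CPU self-snoop and whose VM has no non-coherent DMA devices assigned; everywhere else platform rules decide and the quirk is ignored. The sketch below models just that applicability question. `enum pat_policy`, `struct pat_platform` and `ignore_guest_pat_effect()` are hypothetical names, and the function deliberately reports "not applicable" instead of guessing the platform-determined outcome in the cases the quirk does not cover.

```c
#include <stdbool.h>

enum pat_policy {
	PAT_QUIRK_NOT_APPLICABLE,	/* platform rules decide; quirk setting has no effect */
	PAT_IGNORED_FORCE_WB,		/* quirk enabled (default): guest PAT ignored, WB in EPT */
	PAT_HONORED,			/* quirk disabled by userspace: guest PAT honored */
};

/* Hypothetical description of the host and VM. */
struct pat_platform {
	bool intel;		/* Intel (EPT) host rather than AMD */
	bool self_snoop;	/* CPU supports self-snoop */
	bool noncoherent_dma;	/* VM has non-coherent DMA devices assigned */
};

/* Effect of KVM_X86_QUIRK_IGNORE_GUEST_PAT as described in the text above. */
static enum pat_policy ignore_guest_pat_effect(const struct pat_platform *p,
					       bool quirk_enabled)
{
	/* Ignored on AMD, unavailable without self-snoop, and ignored when
	 * non-coherent DMA is assigned. */
	if (!p->intel || !p->self_snoop || p->noncoherent_dma)
		return PAT_QUIRK_NOT_APPLICABLE;

	return quirk_enabled ? PAT_IGNORED_FORCE_WB : PAT_HONORED;
}
```

On an ICX or SPR host with no assigned devices, the default (quirk enabled) maps to `PAT_IGNORED_FORCE_WB`; a VMM that exposes no bochs display could disable the quirk and get `PAT_HONORED`.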
@@ -8496,6 +8541,17 @@ aforementioned registers before the first KVM_RUN. These registers are VM
 scoped, meaning that the same set of values are presented on all vCPUs in a
 given VM.
 
+7.43 KVM_CAP_RISCV_MP_STATE_RESET
+---------------------------------
+
+:Architectures: riscv
+:Type: VM
+:Parameters: None
+:Returns: 0 on success, -EINVAL if arg[0] is not zero
+
+When this capability is enabled, KVM resets the VCPU when setting
+MP_STATE_INIT_RECEIVED through IOCTL. The original MP_STATE is preserved.
+
 8. Other capabilities.
 ======================
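
The new RISC-V capability is enabled on the VM with `args[0] == 0`; afterwards, setting `KVM_MP_STATE_INIT_RECEIVED` on a vCPU triggers a reset without changing the vCPU's recorded mp_state. The toy model below illustrates only those semantics. All types and functions are hypothetical, the mp_state constants mirror `<linux/kvm.h>` (RUNNABLE = 0, INIT_RECEIVED = 2, STOPPED = 5), zeroing the structure stands in for a real architectural reset, and rejecting INIT_RECEIVED while the capability is off is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* mp_state values assumed to mirror <linux/kvm.h>; verify against headers. */
#define SKETCH_MP_STATE_RUNNABLE      0
#define SKETCH_MP_STATE_INIT_RECEIVED 2
#define SKETCH_MP_STATE_STOPPED       5

/* Toy vCPU; real RISC-V vCPU state is far larger. */
struct toy_vcpu {
	uint32_t mp_state;
	uint64_t pc;
	uint64_t gpr[32];
};

struct toy_vm {
	bool mp_state_reset;	/* KVM_CAP_RISCV_MP_STATE_RESET enabled */
};

/* Models KVM_ENABLE_CAP for the capability: args[0] must be zero. */
static int toy_enable_mp_state_reset(struct toy_vm *vm, uint64_t arg0)
{
	if (arg0 != 0)
		return -EINVAL;
	vm->mp_state_reset = true;
	return 0;
}

/* Models KVM_SET_MP_STATE as described in the documentation hunk. */
static int toy_set_mp_state(const struct toy_vm *vm, struct toy_vcpu *vcpu,
			    uint32_t state)
{
	switch (state) {
	case SKETCH_MP_STATE_RUNNABLE:
	case SKETCH_MP_STATE_STOPPED:
		vcpu->mp_state = state;
		return 0;
	case SKETCH_MP_STATE_INIT_RECEIVED: {
		if (!vm->mp_state_reset)
			return -EINVAL;	/* assumption: rejected without the cap */
		uint32_t keep = vcpu->mp_state;
		memset(vcpu, 0, sizeof(*vcpu));	/* stand-in for a real reset */
		vcpu->mp_state = keep;		/* original mp_state is preserved */
		return 0;
	}
	default:
		return -EINVAL;
	}
}
```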

Documentation/virt/kvm/devices/vcpu.rst

Lines changed: 24 additions & 0 deletions
@@ -137,6 +137,30 @@ exit_reason = KVM_EXIT_FAIL_ENTRY and populate the fail_entry struct by setting
 hardare_entry_failure_reason field to KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED and
 the cpu field to the processor id.
 
+1.5 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS
+--------------------------------------------------
+
+:Parameters: in kvm_device_attr.addr the address to an unsigned int
+             representing the maximum value taken by PMCR_EL0.N
+
+:Returns:
+
+======= ====================================================
+-EBUSY  PMUv3 already initialized, a VCPU has already run or
+        an event filter has already been set
+-EFAULT Error accessing the value pointed to by addr
+-ENODEV PMUv3 not supported or GIC not initialized
+-EINVAL No PMUv3 explicitly selected, or value of N out of
+        range
+======= ====================================================
+
+Set the number of implemented event counters in the virtual PMU. This
+mandates that a PMU has explicitly been selected via
+KVM_ARM_VCPU_PMU_V3_SET_PMU, and will fail when no PMU has been
+explicitly selected, or the number of counters is out of range for the
+selected PMU. Selecting a new PMU cancels the effect of setting this
+attribute.
+
 2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
 =================================
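
The rules for `KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS` can be read as a small state machine. In a real VMM the value is passed by pointer in `kvm_device_attr.addr` (group `KVM_ARM_VCPU_PMU_V3_CTRL`) through `KVM_SET_DEVICE_ATTR` on the vCPU file descriptor; the toy model below only reproduces the documented error conditions. The struct and function names are hypothetical, the order of the checks is an assumption, a NULL pointer stands in for a faulting user address, the accepted range is assumed to be 0 up to the selected PMU's counter count, and restoring that full count when a new PMU is selected is one reading of "cancels the effect".

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy per-VM PMU configuration; all field names are hypothetical. */
struct toy_pmu_cfg {
	bool pmuv3_supported;
	bool gic_initialized;
	bool pmu_selected;		/* KVM_ARM_VCPU_PMU_V3_SET_PMU issued */
	unsigned int pmu_max_counters;	/* counters implemented by that PMU */
	bool pmu_initialized;		/* KVM_ARM_VCPU_PMU_V3_INIT done */
	bool vcpu_has_run;
	bool filter_set;
	unsigned int nr_counters;	/* PMCR_EL0.N exposed to the guest */
};

/* Selecting a PMU; assumed to restore the full counter count. */
static void toy_select_pmu(struct toy_pmu_cfg *c, unsigned int max_counters)
{
	c->pmu_selected = true;
	c->pmu_max_counters = max_counters;
	c->nr_counters = max_counters;
}

/* Models KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS; check order is assumed. */
static int toy_set_nr_counters(struct toy_pmu_cfg *c, const unsigned int *addr)
{
	if (!c->pmuv3_supported || !c->gic_initialized)
		return -ENODEV;
	if (c->pmu_initialized || c->vcpu_has_run || c->filter_set)
		return -EBUSY;
	if (addr == NULL)
		return -EFAULT;		/* stand-in for a failed user copy */
	if (!c->pmu_selected || *addr > c->pmu_max_counters)
		return -EINVAL;
	c->nr_counters = *addr;
	return 0;
}
```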

Documentation/virt/kvm/x86/index.rst

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ KVM for x86 systems
    cpuid
    errata
    hypercalls
+   intel-tdx
    mmu
    msr
    nested-vmx
