
Commit 6803bd7

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Generalized infrastructure for 'writable' ID registers, effectively
     allowing userspace to opt-out of certain vCPU features for its guest

   - Optimization for vSGI injection, opportunistically compressing
     MPIDR to vCPU mapping into a table

   - Improvements to KVM's PMU emulation, allowing userspace to select
     the number of PMCs available to a VM

   - Guest support for memory operation instructions (FEAT_MOPS)

   - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
     bugs and getting rid of useless code

   - Changes to the way the SMCCC filter is constructed, avoiding wasted
     memory allocations when not in use

   - Load the stage-2 MMU context at vcpu_load() for VHE systems,
     reducing the overhead of errata mitigations

   - Miscellaneous kernel and selftest fixes

  LoongArch:

   - New architecture for kvm.

     The hardware uses the same model as x86, s390 and RISC-V, where
     guest/host mode is orthogonal to supervisor/user mode. The
     virtualization extensions are very similar to MIPS, therefore the
     code also has some similarities but it's been cleaned up to avoid
     some of the historical bogosities that are found in arch/mips. The
     kernel emulates MMU, timer and CSR accesses, while interrupt
     controllers are only emulated in userspace, at least for now.

  RISC-V:

   - Support for the Smstateen and Zicond extensions

   - Support for virtualizing senvcfg

   - Support for virtualized SBI debug console (DBCN)

  S390:

   - Nested page table management can be monitored through tracepoints
     and statistics

  x86:

   - Fix incorrect handling of VMX posted interrupt descriptor in
     KVM_SET_LAPIC, which could result in a dropped timer IRQ

   - Avoid WARN on systems with Intel IPI virtualization

   - Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs
     without forcing more common use cases to eat the extra memory
     overhead.

   - Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and
     SBPB, aka Selective Branch Predictor Barrier).

   - Fix a bug where restoring a vCPU snapshot that was taken within 1
     second of creating the original vCPU would cause KVM to try to
     synchronize the vCPU's TSC and thus clobber the correct TSC being
     set by userspace.

   - Compute guest wall clock using a single TSC read to avoid
     generating an inaccurate time, e.g. if the vCPU is preempted
     between multiple TSC reads.

   - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which
     complain about a "Firmware Bug" if the bit isn't set for select
     F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to
     appease Windows Server 2022.

   - Don't apply side effects to Hyper-V's synthetic timer on writes
     from userspace to fix an issue where the auto-enable behavior can
     trigger spurious interrupts, i.e. do auto-enabling only for guest
     writes.

   - Remove an unnecessary kick of all vCPUs when synchronizing the
     dirty log without PML enabled.

   - Advertise "support" for non-serializing FS/GS base MSR writes as
     appropriate.

   - Harden the fast page fault path to guard against encountering an
     invalid root when walking SPTEs.

   - Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n.

   - Use the fast path directly from the timer callback when delivering
     Xen timer events, instead of waiting for the next iteration of the
     run loop. This was not done so far because previously proposed
     code had races, but now care is taken to stop the hrtimer at
     critical points such as restarting the timer or saving the timer
     information for userspace.

   - Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future
     flag.

   - Optimize injection of PMU interrupts that are simultaneous with
     NMIs.

   - Usual handful of fixes for typos and other warts.

  x86 - MTRR/PAT fixes and optimizations:

   - Clean up code that deals with honoring guest MTRRs when the VM has
     non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled.

   - Zap EPT entries when non-coherent DMA assignment stops/start to
     prevent using stale entries with the wrong memtype.

   - Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y

     This was done as a workaround for virtual machine BIOSes that did
     not bother to clear CR0.CD (because ancient KVM/QEMU did not bother
     to set it, in turn), and there's zero reason to extend the quirk to
     also ignore guest PAT.

  x86 - SEV fixes:

   - Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts
     SHUTDOWN while running an SEV-ES guest.

   - Clean up the recognition of emulation failures on SEV guests, when
     KVM would like to "skip" the instruction but it had already been
     partially emulated. This makes it possible to drop a hack that
     second guessed the (insufficient) information provided by the
     emulator, and just do the right thing.

  Documentation:

   - Various updates and fixes, mostly for x86

   - MTRR and PAT fixes and optimizations"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (164 commits)
  KVM: selftests: Avoid using forced target for generating arm64 headers
  tools headers arm64: Fix references to top srcdir in Makefile
  KVM: arm64: Add tracepoint for MMIO accesses where ISV==0
  KVM: arm64: selftest: Perform ISB before reading PAR_EL1
  KVM: arm64: selftest: Add the missing .guest_prepare()
  KVM: arm64: Always invalidate TLB for stage-2 permission faults
  KVM: x86: Service NMI requests after PMI requests in VM-Enter path
  KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI
  KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs
  KVM: arm64: Refine _EL2 system register list that require trap reinjection
  arm64: Add missing _EL2 encodings
  arm64: Add missing _EL12 encodings
  KVM: selftests: aarch64: vPMU test for validating user accesses
  KVM: selftests: aarch64: vPMU register test for unimplemented counters
  KVM: selftests: aarch64: vPMU register test for implemented counters
  KVM: selftests: aarch64: Introduce vpmu_counter_access test
  tools: Import arm_pmuv3.h
  KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest
  KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run
  KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
  ...
2 parents 5be9911 + 45b890f commit 6803bd7

File tree

127 files changed

+8885
-1503
lines changed


Documentation/devicetree/bindings/riscv/extensions.yaml

Lines changed: 12 additions & 0 deletions
@@ -128,6 +128,12 @@ properties:
128128
changes to interrupts as frozen at commit ccbddab ("Merge pull
129129
request #42 from riscv/jhauser-2023-RC4") of riscv-aia.
130130
131+
- const: smstateen
132+
description: |
133+
The standard Smstateen extension for controlling access to CSRs
134+
added by other RISC-V extensions in H/S/VS/U/VU modes and as
135+
ratified at commit a28bfae (Ratified (#7)) of riscv-state-enable.
136+
131137
- const: ssaia
132138
description: |
133139
The standard Ssaia supervisor-level extension for the advanced
@@ -212,6 +218,12 @@ properties:
212218
ratified in the 20191213 version of the unprivileged ISA
213219
specification.
214220

221+
- const: zicond
222+
description:
223+
The standard Zicond extension for conditional arithmetic and
224+
conditional-select/move operations as ratified in commit 95cf1f9
225+
("Add changes requested by Ved during signoff") of riscv-zicond.
226+
215227
- const: zicsr
216228
description: |
217229
The standard Zicsr extension for control and status register

Documentation/virt/kvm/api.rst

Lines changed: 140 additions & 18 deletions
@@ -416,6 +416,13 @@ Reads the general purpose registers from the vcpu.
416416
__u64 pc;
417417
};
418418

419+
/* LoongArch */
420+
struct kvm_regs {
421+
/* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
422+
unsigned long gpr[32];
423+
unsigned long pc;
424+
};
425+
419426

420427
4.12 KVM_SET_REGS
421428
-----------------
@@ -506,7 +513,7 @@ translation mode.
506513
------------------
507514

508515
:Capability: basic
509-
:Architectures: x86, ppc, mips, riscv
516+
:Architectures: x86, ppc, mips, riscv, loongarch
510517
:Type: vcpu ioctl
511518
:Parameters: struct kvm_interrupt (in)
512519
:Returns: 0 on success, negative on failure.
@@ -540,7 +547,7 @@ ioctl is useful if the in-kernel PIC is not used.
540547
PPC:
541548
^^^^
542549

543-
Queues an external interrupt to be injected. This ioctl is overleaded
550+
Queues an external interrupt to be injected. This ioctl is overloaded
544551
with 3 different irq values:
545552

546553
a) KVM_INTERRUPT_SET
@@ -592,6 +599,14 @@ b) KVM_INTERRUPT_UNSET
592599

593600
This is an asynchronous vcpu ioctl and can be invoked from any thread.
594601

602+
LOONGARCH:
603+
^^^^^^^^^^
604+
605+
Queues an external interrupt to be injected into the virtual CPU. A negative
606+
interrupt number dequeues the interrupt.
607+
608+
This is an asynchronous vcpu ioctl and can be invoked from any thread.
609+
595610

596611
4.17 KVM_DEBUG_GUEST
597612
--------------------
@@ -737,7 +752,7 @@ signal mask.
737752
----------------
738753

739754
:Capability: basic
740-
:Architectures: x86
755+
:Architectures: x86, loongarch
741756
:Type: vcpu ioctl
742757
:Parameters: struct kvm_fpu (out)
743758
:Returns: 0 on success, -1 on error
@@ -746,7 +761,7 @@ Reads the floating point state from the vcpu.
746761

747762
::
748763

749-
/* for KVM_GET_FPU and KVM_SET_FPU */
764+
/* x86: for KVM_GET_FPU and KVM_SET_FPU */
750765
struct kvm_fpu {
751766
__u8 fpr[8][16];
752767
__u16 fcw;
@@ -761,12 +776,21 @@ Reads the floating point state from the vcpu.
761776
__u32 pad2;
762777
};
763778

779+
/* LoongArch: for KVM_GET_FPU and KVM_SET_FPU */
780+
struct kvm_fpu {
781+
__u32 fcsr;
782+
__u64 fcc;
783+
struct kvm_fpureg {
784+
__u64 val64[4];
785+
}fpr[32];
786+
};
787+
764788

765789
4.23 KVM_SET_FPU
766790
----------------
767791

768792
:Capability: basic
769-
:Architectures: x86
793+
:Architectures: x86, loongarch
770794
:Type: vcpu ioctl
771795
:Parameters: struct kvm_fpu (in)
772796
:Returns: 0 on success, -1 on error
@@ -775,7 +799,7 @@ Writes the floating point state to the vcpu.
775799

776800
::
777801

778-
/* for KVM_GET_FPU and KVM_SET_FPU */
802+
/* x86: for KVM_GET_FPU and KVM_SET_FPU */
779803
struct kvm_fpu {
780804
__u8 fpr[8][16];
781805
__u16 fcw;
@@ -790,6 +814,15 @@ Writes the floating point state to the vcpu.
790814
__u32 pad2;
791815
};
792816

817+
/* LoongArch: for KVM_GET_FPU and KVM_SET_FPU */
818+
struct kvm_fpu {
819+
__u32 fcsr;
820+
__u64 fcc;
821+
struct kvm_fpureg {
822+
__u64 val64[4];
823+
}fpr[32];
824+
};
825+
793826

794827
4.24 KVM_CREATE_IRQCHIP
795828
-----------------------
@@ -965,7 +998,7 @@ be set in the flags field of this ioctl:
965998
The KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL flag requests KVM to generate
966999
the contents of the hypercall page automatically; hypercalls will be
9671000
intercepted and passed to userspace through KVM_EXIT_XEN. In this
968-
ase, all of the blob size and address fields must be zero.
1001+
case, all of the blob size and address fields must be zero.
9691002

9701003
The KVM_XEN_HVM_CONFIG_EVTCHN_SEND flag indicates to KVM that userspace
9711004
will always use the KVM_XEN_HVM_EVTCHN_SEND ioctl to deliver event
@@ -1070,7 +1103,7 @@ Other flags returned by ``KVM_GET_CLOCK`` are accepted but ignored.
10701103
:Extended by: KVM_CAP_INTR_SHADOW
10711104
:Architectures: x86, arm64
10721105
:Type: vcpu ioctl
1073-
:Parameters: struct kvm_vcpu_event (out)
1106+
:Parameters: struct kvm_vcpu_events (out)
10741107
:Returns: 0 on success, -1 on error
10751108

10761109
X86:
@@ -1193,7 +1226,7 @@ directly to the virtual CPU).
11931226
:Extended by: KVM_CAP_INTR_SHADOW
11941227
:Architectures: x86, arm64
11951228
:Type: vcpu ioctl
1196-
:Parameters: struct kvm_vcpu_event (in)
1229+
:Parameters: struct kvm_vcpu_events (in)
11971230
:Returns: 0 on success, -1 on error
11981231

11991232
X86:
@@ -1387,7 +1420,7 @@ documentation when it pops into existence).
13871420
-------------------
13881421

13891422
:Capability: KVM_CAP_ENABLE_CAP
1390-
:Architectures: mips, ppc, s390, x86
1423+
:Architectures: mips, ppc, s390, x86, loongarch
13911424
:Type: vcpu ioctl
13921425
:Parameters: struct kvm_enable_cap (in)
13931426
:Returns: 0 on success; -1 on error
@@ -1442,7 +1475,7 @@ for vm-wide capabilities.
14421475
---------------------
14431476

14441477
:Capability: KVM_CAP_MP_STATE
1445-
:Architectures: x86, s390, arm64, riscv
1478+
:Architectures: x86, s390, arm64, riscv, loongarch
14461479
:Type: vcpu ioctl
14471480
:Parameters: struct kvm_mp_state (out)
14481481
:Returns: 0 on success; -1 on error
@@ -1460,7 +1493,7 @@ Possible values are:
14601493

14611494
========================== ===============================================
14621495
KVM_MP_STATE_RUNNABLE the vcpu is currently running
1463-
[x86,arm64,riscv]
1496+
[x86,arm64,riscv,loongarch]
14641497
KVM_MP_STATE_UNINITIALIZED the vcpu is an application processor (AP)
14651498
which has not yet received an INIT signal [x86]
14661499
KVM_MP_STATE_INIT_RECEIVED the vcpu has received an INIT signal, and is
@@ -1516,11 +1549,14 @@ For riscv:
15161549
The only states that are valid are KVM_MP_STATE_STOPPED and
15171550
KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
15181551

1552+
On LoongArch, only the KVM_MP_STATE_RUNNABLE state is used to reflect
1553+
whether the vcpu is runnable.
1554+
15191555
4.39 KVM_SET_MP_STATE
15201556
---------------------
15211557

15221558
:Capability: KVM_CAP_MP_STATE
1523-
:Architectures: x86, s390, arm64, riscv
1559+
:Architectures: x86, s390, arm64, riscv, loongarch
15241560
:Type: vcpu ioctl
15251561
:Parameters: struct kvm_mp_state (in)
15261562
:Returns: 0 on success; -1 on error
@@ -1538,6 +1574,9 @@ For arm64/riscv:
15381574
The only states that are valid are KVM_MP_STATE_STOPPED and
15391575
KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.
15401576

1577+
On LoongArch, only the KVM_MP_STATE_RUNNABLE state is used to reflect
1578+
whether the vcpu is runnable.
1579+
15411580
4.40 KVM_SET_IDENTITY_MAP_ADDR
15421581
------------------------------
15431582

@@ -2841,6 +2880,19 @@ Following are the RISC-V D-extension registers:
28412880
0x8020 0000 0600 0020 fcsr Floating point control and status register
28422881
======================= ========= =============================================
28432882

2883+
LoongArch registers are mapped using the lower 32 bits. The upper 16 bits of
2884+
that is the register group type.
2885+
2886+
LoongArch csr registers are used to control guest cpu or get status of guest
2887+
cpu, and they have the following id bit patterns::
2888+
2889+
0x9030 0000 0001 00 <reg:5> <sel:3> (64-bit)
2890+
2891+
LoongArch KVM control registers are used to implement some new defined functions
2892+
such as set vcpu counter or reset vcpu, and they have the following id bit patterns::
2893+
2894+
0x9030 0000 0002 <reg:16>
2895+
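As a reading aid for the two id bit patterns added above, here is a minimal C sketch that assembles register ids from them. The function and macro names are hypothetical and are not taken from the kernel headers; the constants are read directly off the documented patterns (`0x9030` prefix, group `0x0001` for CSRs, group `0x0002` for KVM control registers).

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names only; the values are copied from the documented patterns. */
#define SKETCH_LOONGARCH_U64_PREFIX 0x9030000000000000ULL /* arch + 64-bit size */
#define SKETCH_GROUP_CSR            0x0000000000010000ULL /* group type 0x0001 */
#define SKETCH_GROUP_KVM            0x0000000000020000ULL /* group type 0x0002 */

/* Builds an id of the form 0x9030 0000 0001 00 <reg:5> <sel:3>. */
static uint64_t sketch_loongarch_csr_id(unsigned int reg, unsigned int sel)
{
	assert(reg < 32 && sel < 8); /* 5-bit reg, 3-bit sel per the pattern */
	return SKETCH_LOONGARCH_U64_PREFIX | SKETCH_GROUP_CSR |
	       ((uint64_t)reg << 3) | sel;
}

/* Builds an id of the form 0x9030 0000 0002 <reg:16>. */
static uint64_t sketch_loongarch_kvm_ctrl_id(unsigned int reg)
{
	assert(reg <= 0xffff); /* 16-bit register number per the pattern */
	return SKETCH_LOONGARCH_U64_PREFIX | SKETCH_GROUP_KVM | reg;
}
```

An id built this way would be what userspace places in the `id` field of `struct kvm_one_reg`; real code should prefer the macros from the LoongArch KVM uapi header over these hand-written constants.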
28442896

28452897
4.69 KVM_GET_ONE_REG
28462898
--------------------
@@ -3063,7 +3115,7 @@ as follow::
30633115
};
30643116

30653117
An entry with a "page_shift" of 0 is unused. Because the array is
3066-
organized in increasing order, a lookup can stop when encoutering
3118+
organized in increasing order, a lookup can stop when encountering
30673119
such an entry.
30683120

30693121
The "slb_enc" field provides the encoding to use in the SLB for the
@@ -3370,6 +3422,8 @@ return indicates the attribute is implemented. It does not necessarily
33703422
indicate that the attribute can be read or written in the device's
33713423
current state. "addr" is ignored.
33723424

3425+
.. _KVM_ARM_VCPU_INIT:
3426+
33733427
4.82 KVM_ARM_VCPU_INIT
33743428
----------------------
33753429

@@ -3455,7 +3509,7 @@ Possible features:
34553509
- KVM_RUN and KVM_GET_REG_LIST are not available;
34563510

34573511
- KVM_GET_ONE_REG and KVM_SET_ONE_REG cannot be used to access
3458-
the scalable archietctural SVE registers
3512+
the scalable architectural SVE registers
34593513
KVM_REG_ARM64_SVE_ZREG(), KVM_REG_ARM64_SVE_PREG() or
34603514
KVM_REG_ARM64_SVE_FFR;
34613515

@@ -4401,7 +4455,7 @@ This will have undefined effects on the guest if it has not already
44014455
placed itself in a quiescent state where no vcpu will make MMU enabled
44024456
memory accesses.
44034457

4404-
On succsful completion, the pending HPT will become the guest's active
4458+
On successful completion, the pending HPT will become the guest's active
44054459
HPT and the previous HPT will be discarded.
44064460

44074461
On failure, the guest will still be operating on its previous HPT.
@@ -5016,7 +5070,7 @@ before the vcpu is fully usable.
50165070

50175071
Between KVM_ARM_VCPU_INIT and KVM_ARM_VCPU_FINALIZE, the feature may be
50185072
configured by use of ioctls such as KVM_SET_ONE_REG. The exact configuration
5019-
that should be performaned and how to do it are feature-dependent.
5073+
that should be performed and how to do it are feature-dependent.
50205074

50215075
Other calls that depend on a particular feature being finalized, such as
50225076
KVM_RUN, KVM_GET_REG_LIST, KVM_GET_ONE_REG and KVM_SET_ONE_REG, will fail with
@@ -5124,6 +5178,24 @@ Valid values for 'action'::
51245178
#define KVM_PMU_EVENT_ALLOW 0
51255179
#define KVM_PMU_EVENT_DENY 1
51265180

5181+
Via this API, KVM userspace can also control the behavior of the VM's fixed
5182+
counters (if any) by configuring the "action" and "fixed_counter_bitmap" fields.
5183+
5184+
Specifically, KVM follows the following pseudo-code when determining whether to
5185+
allow the guest FixCtr[i] to count its pre-defined fixed event::
5186+
5187+
FixCtr[i]_is_allowed = (action == ALLOW) && (bitmap & BIT(i)) ||
5188+
(action == DENY) && !(bitmap & BIT(i));
5189+
FixCtr[i]_is_denied = !FixCtr[i]_is_allowed;
5190+
5191+
KVM always consumes fixed_counter_bitmap, it's userspace's responsibility to
5192+
ensure fixed_counter_bitmap is set correctly, e.g. if userspace wants to define
5193+
a filter that only affects general purpose counters.
5194+
5195+
Note, the "events" field also applies to fixed counters' hardcoded event_select
5196+
and unit_mask values. "fixed_counter_bitmap" has higher priority than "events"
5197+
if there is a contradiction between the two.
5198+
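The fixed-counter pseudo-code above can be restated as a small C function. This is only a sketch: `sketch_fixed_counter_is_allowed` is a hypothetical helper, not a KVM function, and the 32-bit width assumed for the bitmap is an assumption of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* "action" values as listed in the documentation above. */
#define KVM_PMU_EVENT_ALLOW 0
#define KVM_PMU_EVENT_DENY  1

/*
 * Hypothetical helper: returns true if fixed counter `idx` may count
 * its pre-defined event under the given action and bitmap.
 */
static bool sketch_fixed_counter_is_allowed(uint32_t action, uint32_t bitmap,
					    unsigned int idx)
{
	assert(idx < 32); /* keep the shift within the assumed bitmap width */
	bool selected = (bitmap >> idx) & 1u;

	if (action == KVM_PMU_EVENT_ALLOW)
		return selected;   /* allow-list: only listed counters count */
	if (action == KVM_PMU_EVENT_DENY)
		return !selected;  /* deny-list: listed counters are blocked */
	return false;              /* neither branch of the pseudo-code holds */
}
```

Note one consequence of the rule: with `KVM_PMU_EVENT_ALLOW` and a zero bitmap, every fixed counter is denied, which is why userspace must set the bitmap deliberately even when the filter is meant to cover only general purpose counters.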
51275199
4.121 KVM_PPC_SVM_OFF
51285200
---------------------
51295201

@@ -5475,7 +5547,7 @@ KVM_XEN_ATTR_TYPE_EVTCHN
54755547
from the guest. A given sending port number may be directed back to
54765548
a specified vCPU (by APIC ID) / port / priority on the guest, or to
54775549
trigger events on an eventfd. The vCPU and priority can be changed
5478-
by setting KVM_XEN_EVTCHN_UPDATE in a subsequent call, but but other
5550+
by setting KVM_XEN_EVTCHN_UPDATE in a subsequent call, but other
54795551
fields cannot change for a given sending port. A port mapping is
54805552
removed by using KVM_XEN_EVTCHN_DEASSIGN in the flags field. Passing
54815553
KVM_XEN_EVTCHN_RESET in the flags field removes all interception of
@@ -6070,6 +6142,56 @@ writes to the CNTVCT_EL0 and CNTPCT_EL0 registers using the SET_ONE_REG
60706142
interface. No error will be returned, but the resulting offset will not be
60716143
applied.
60726144

6145+
.. _KVM_ARM_GET_REG_WRITABLE_MASKS:
6146+
6147+
4.139 KVM_ARM_GET_REG_WRITABLE_MASKS
6148+
-------------------------------------------
6149+
6150+
:Capability: KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES
6151+
:Architectures: arm64
6152+
:Type: vm ioctl
6153+
:Parameters: struct reg_mask_range (in/out)
6154+
:Returns: 0 on success, < 0 on error
6155+
6156+
6157+
::
6158+
6159+
#define KVM_ARM_FEATURE_ID_RANGE 0
6160+
#define KVM_ARM_FEATURE_ID_RANGE_SIZE (3 * 8 * 8)
6161+
6162+
struct reg_mask_range {
6163+
__u64 addr; /* Pointer to mask array */
6164+
__u32 range; /* Requested range */
6165+
__u32 reserved[13];
6166+
};
6167+
6168+
This ioctl copies the writable masks for a selected range of registers to
6169+
userspace.
6170+
6171+
The ``addr`` field is a pointer to the destination array where KVM copies
6172+
the writable masks.
6173+
6174+
The ``range`` field indicates the requested range of registers.
6175+
``KVM_CHECK_EXTENSION`` for the ``KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES``
6176+
capability returns the supported ranges, expressed as a set of flags. Each
6177+
flag's bit index represents a possible value for the ``range`` field.
6178+
All other values are reserved for future use and KVM may return an error.
6179+
6180+
The ``reserved[13]`` array is reserved for future use and should be 0, or
6181+
KVM may return an error.
6182+
6183+
KVM_ARM_FEATURE_ID_RANGE (0)
6184+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6185+
6186+
The Feature ID range is defined as the AArch64 System register space with
6187+
op0==3, op1=={0, 1, 3}, CRn==0, CRm=={0-7}, op2=={0-7}.
6188+
6189+
The mask returned array pointed to by ``addr`` is indexed by the macro
6190+
``ARM64_FEATURE_ID_RANGE_IDX(op0, op1, crn, crm, op2)``, allowing userspace
6191+
to know what fields can be changed for the system register described by
6192+
``op0, op1, crn, crm, op2``. KVM rejects ID register values that describe a
6193+
superset of the features supported by the system.
6194+
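To make the indexing of the mask array concrete, the sketch below maps a Feature ID register encoding to a slot in an array of `KVM_ARM_FEATURE_ID_RANGE_SIZE` (3 * 8 * 8 = 192) entries. It is an illustration only: the function name is hypothetical, and the dense packing of op1 values {0, 1, 3} into slots 0..2 is an assumption made here to match the documented array size. The macro in the kernel's uapi header is authoritative.

```c
#include <assert.h>

#define SKETCH_FEATURE_ID_RANGE_SIZE (3 * 8 * 8)

/*
 * Hypothetical index helper for the Feature ID range:
 * op0 == 3, op1 in {0, 1, 3}, CRn == 0, CRm in 0..7, op2 in 0..7.
 */
static unsigned int sketch_feature_id_range_idx(unsigned int op0,
						unsigned int op1,
						unsigned int crn,
						unsigned int crm,
						unsigned int op2)
{
	assert(op0 == 3 && crn == 0);
	assert(op1 == 0 || op1 == 1 || op1 == 3);
	assert(crm < 8 && op2 < 8);

	/* Assumed packing: op1 == 3 takes the third block of 64 entries. */
	unsigned int op1_slot = (op1 == 3) ? 2 : op1;

	return (op1_slot << 6) | (crm << 3) | op2;
}
```

Under this layout, a register with op1 == 0, CRm == 4, op2 == 0 lands at index 32, and the highest encoding in the range (op1 == 3, CRm == 7, op2 == 7) lands at index 191, the last slot of the array.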
60736195
5. The kvm_run structure
60746196
========================
60756197

Documentation/virt/kvm/arm/index.rst

Lines changed: 1 addition & 0 deletions
@@ -11,3 +11,4 @@ ARM
1111
hypercalls
1212
pvtime
1313
ptp_kvm
14+
vcpu-features
