Commit 37c3ddf

Maxim Levitsky authored and sean-jc committed
KVM: VMX: read the PML log in the same order as it was written
Intel's PRM specifies that the CPU writes to the PML log 'backwards', or in other words, it first writes entry 511, then entry 510 and so on. I also confirmed on bare metal that the CPU indeed writes the entries in this order.

KVM, on the other hand, reads the entries in the opposite order, from the last written entry towards entry 511, and dumps them in this order to the dirty ring.

Usually this doesn't matter, except for one complex nesting case: KVM retries the instructions that cause MMU faults. This might cause an emulated PML log entry to be visible to L1's hypervisor before the actual memory write is committed. This happens when the L0 MMU fault is followed directly by a VM exit to L1, for example due to a pending L1 interrupt or due to L1's 'PML log full' event.

This problem doesn't have a noticeable real-world impact, because the write retry is not much different from the guest writing to the same page multiple times, which is likewise not reflected in the dirty log. Users of dirty logging rely only on correct reporting of the clean pages; in other words, they assume that if a page is clean, then no writes were committed to it since the moment it was marked clean.

However, KVM has the kvm_dirty_log_test selftest, which checks both the clean and the dirty pages against the memory contents, and it can fail if it detects a dirty page that still holds a stale value at offset 0, where the test writes.

To avoid failure, the test has a workaround for this specific problem: it skips checking the memory that belongs to the last dirty ring entry it has seen, relying on the fact that, as long as memory writes are committed in order, only the last entry can belong to a not-yet-committed memory write. However, since L1's KVM reads the PML log in the opposite direction to the one in which L0 wrote it, the last dirty ring entry often will not be the last entry written by L0.

To fix this, switch the order in which KVM reads the PML log.

Note that this issue is not present on bare metal, because there, an update of the A/D bits of a present entry, PML logging, and the actual memory write are all done by the CPU without any hypervisor intervention or pending interrupt evaluation; thus, once a PML log write and/or vCPU kick happens, all memory writes that are in the PML log are committed to memory.

The only exception to this rule is when the guest hits a not-present EPT entry, in which case KVM first reads (backwards) the PML log, dumps it to the dirty ring, and *then* sets up a SPTE entry with A/D bits set and logs this to the dirty ring, thus making that entry the last one in the dirty ring.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20241219221034.903927-3-mlevitsk@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
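[Editor's note: to make the ordering concrete, here is a minimal user-space sketch of the fixed read loop. This is illustrative only, not the kernel code: PML_LOG_NR_ENTRIES and PML_HEAD_INDEX mirror the patch's constants, while the buffer contents, the simulated GUEST_PML_INDEX value, and the report() helper are invented for the example.]

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define PML_LOG_NR_ENTRIES 512
#define PML_HEAD_INDEX (PML_LOG_NR_ENTRIES - 1) /* 511 */

/* Stand-in for kvm_vcpu_mark_page_dirty(): just print the GPA. */
static void report(uint64_t gpa)
{
        printf("dirty gpa: 0x%" PRIx64 "\n", gpa);
}

int main(void)
{
        uint64_t pml_buf[PML_LOG_NR_ENTRIES] = { 0 };
        uint16_t pml_idx, pml_tail_index;
        int i;

        /* The CPU logs writes backwards: entry 511 first, then 510, ... */
        pml_buf[511] = 0x1000;
        pml_buf[510] = 0x2000;
        pml_buf[509] = 0x3000;
        pml_idx = 508; /* simulated GUEST_PML_INDEX: next entry to fill */

        /* Tail = oldest valid entry; 0 if the log has just overflowed. */
        pml_tail_index = (pml_idx >= PML_LOG_NR_ENTRIES) ? 0 : pml_idx + 1;

        /* Replay in write order: head (511) down to the tail (509 here). */
        for (i = PML_HEAD_INDEX; i >= pml_tail_index; i--)
                report(pml_buf[i]);

        return 0;
}

Running it prints the GPAs in the order the CPU logged them (0x1000, 0x2000, 0x3000), which is exactly the order the dirty ring now sees.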
1 parent: ae81ce9

1 file changed, 17 insertions(+), 9 deletions(-)

arch/x86/kvm/vmx/vmx.c

@@ -6203,26 +6203,34 @@ static void vmx_destroy_pml_buffer(struct vcpu_vmx *vmx)
 static void vmx_flush_pml_buffer(struct kvm_vcpu *vcpu)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);
+        u16 pml_idx, pml_tail_index;
         u64 *pml_buf;
-        u16 pml_idx;
+        int i;
 
         pml_idx = vmcs_read16(GUEST_PML_INDEX);
 
         /* Do nothing if PML buffer is empty */
         if (pml_idx == PML_HEAD_INDEX)
                 return;
+        /*
+         * PML index always points to the next available PML buffer entity
+         * unless PML log has just overflowed.
+         */
+        pml_tail_index = (pml_idx >= PML_LOG_NR_ENTRIES) ? 0 : pml_idx + 1;
 
-        /* PML index always points to next available PML buffer entity */
-        if (pml_idx >= PML_LOG_NR_ENTRIES)
-                pml_idx = 0;
-        else
-                pml_idx++;
-
+        /*
+         * PML log is written backwards: the CPU first writes the entry 511
+         * then the entry 510, and so on.
+         *
+         * Read the entries in the same order they were written, to ensure that
+         * the dirty ring is filled in the same order the CPU wrote them.
+         */
         pml_buf = page_address(vmx->pml_pg);
-        for (; pml_idx < PML_LOG_NR_ENTRIES; pml_idx++) {
+
+        for (i = PML_HEAD_INDEX; i >= pml_tail_index; i--) {
                 u64 gpa;
 
-                gpa = pml_buf[pml_idx];
+                gpa = pml_buf[i];
                 WARN_ON(gpa & (PAGE_SIZE - 1));
                 kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
         }
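[Editor's note: two edge cases fall directly out of the pml_tail_index computation above, worked here for concreteness. If the CPU logged exactly one write, GUEST_PML_INDEX reads back 510, so pml_tail_index is 511 and the loop visits only entry 511. If the log has just overflowed, the index reads back as a value at or above PML_LOG_NR_ENTRIES (the case the first new comment notes), so pml_tail_index is 0 and all 512 entries are replayed, from 511 down to 0.]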
