
Commit 4418723

mzhang3579 authored and bonzini committed
KVM: x86/mmu: fix potential races when walking host page table
KVM uses lookup_address_in_mm() to detect the hugepage size that the host uses to map a pfn. The function suffers from several issues:

- no usage of READ_ONCE(*). This allows multiple dereferences of the same page table entry. The resulting TOCTOU problem may cause KVM to incorrectly treat a newly generated leaf entry as a nonleaf one, and dereference its content by using its pfn value (a sketch of this race follows below).

- the information returned does not match what KVM needs; for non-present entries it returns the level at which the walk was terminated, as long as the entry is not 'none'. KVM needs level information of only 'present' entries, otherwise it may regard a non-present PXE entry as a present large page mapping.

- the function is not safe for mappings that can be torn down, because it does not disable IRQs and because it returns a PTE pointer which is never safe to dereference after the function returns.

So implement the logic for walking host page tables directly in KVM, and stop using lookup_address_in_mm().

Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220429031757.2042406-1-mizhang@google.com>
[Inline in host_pfn_mapping_level, ensure no semantic change for its callers. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
1 parent d495f94 commit 4418723
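
The first bullet's race deserves a concrete illustration. Below is a minimal sketch, not code from the patch: it contrasts the double-dereference pattern the old helper permitted with the READ_ONCE() snapshot pattern the new walk uses at every level. The helper name sketch_pmd_level() is invented for illustration; pmd_offset(), pmd_none(), pmd_present(), pmd_large(), READ_ONCE() and the PG_LEVEL_* constants are the usual x86 kernel helpers.

/* Illustrative only; assumes x86 kernel headers (<linux/pgtable.h> etc.). */
static int sketch_pmd_level(pud_t *pudp, unsigned long hva)
{
	pmd_t *pmdp = pmd_offset(pudp, hva);
	pmd_t pmd;

	/*
	 * Racy pattern: testing *pmdp and then using *pmdp again reads
	 * the live entry twice.  A concurrent THP collapse can install
	 * a huge-page leaf between the two reads, so code that saw a
	 * nonleaf entry at the check may dereference a leaf entry's pfn
	 * as if it were a page-table pointer.
	 *
	 * Safe pattern: snapshot the entry exactly once, then decide
	 * everything on the local copy, which no other CPU can change.
	 */
	pmd = READ_ONCE(*pmdp);

	if (pmd_none(pmd) || !pmd_present(pmd))
		return PG_LEVEL_4K;

	return pmd_large(pmd) ? PG_LEVEL_2M : PG_LEVEL_4K;
}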

File tree

1 file changed: +42 -5 lines changed

arch/x86/kvm/mmu/mmu.c

Lines changed: 42 additions & 5 deletions
@@ -2804,8 +2804,12 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
 				  const struct kvm_memory_slot *slot)
 {
 	unsigned long hva;
-	pte_t *pte;
-	int level;
+	unsigned long flags;
+	int level = PG_LEVEL_4K;
+	pgd_t pgd;
+	p4d_t p4d;
+	pud_t pud;
+	pmd_t pmd;
 
 	if (!PageCompound(pfn_to_page(pfn)) && !kvm_is_zone_device_pfn(pfn))
 		return PG_LEVEL_4K;
@@ -2820,10 +2824,43 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
 	 */
 	hva = __gfn_to_hva_memslot(slot, gfn);
 
-	pte = lookup_address_in_mm(kvm->mm, hva, &level);
-	if (unlikely(!pte))
-		return PG_LEVEL_4K;
+	/*
+	 * Lookup the mapping level in the current mm. The information
+	 * may become stale soon, but it is safe to use as long as
+	 * 1) mmu_notifier_retry was checked after taking mmu_lock, and
+	 * 2) mmu_lock is taken now.
+	 *
+	 * We still need to disable IRQs to prevent concurrent tear down
+	 * of page tables.
+	 */
+	local_irq_save(flags);
+
+	pgd = READ_ONCE(*pgd_offset(kvm->mm, hva));
+	if (pgd_none(pgd))
+		goto out;
+
+	p4d = READ_ONCE(*p4d_offset(&pgd, hva));
+	if (p4d_none(p4d) || !p4d_present(p4d))
+		goto out;
 
+	pud = READ_ONCE(*pud_offset(&p4d, hva));
+	if (pud_none(pud) || !pud_present(pud))
+		goto out;
+
+	if (pud_large(pud)) {
+		level = PG_LEVEL_1G;
+		goto out;
+	}
+
+	pmd = READ_ONCE(*pmd_offset(&pud, hva));
+	if (pmd_none(pmd) || !pmd_present(pmd))
+		goto out;
+
+	if (pmd_large(pmd))
+		level = PG_LEVEL_2M;
+
+out:
+	local_irq_restore(flags);
 	return level;
 }

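The two numbered conditions in the new comment describe the ordering KVM's fault path already follows. The fragment below is a hedged paraphrase of that caller-side sequence, not the exact handler; field and helper names such as mmu_notifier_seq, mmu_notifier_retry() and __gfn_to_pfn_memslot() are the ones in this era's kvm_host.h, and the elided arguments are intentional.

	/* Paraphrased caller-side sequence the comment relies on. */
	mmu_seq = kvm->mmu_notifier_seq;	/* sample before resolving pfn */
	smp_rmb();

	pfn = __gfn_to_pfn_memslot(...);	/* may race with invalidations */

	write_lock(&kvm->mmu_lock);		/* 2) mmu_lock is taken now */
	if (mmu_notifier_retry(kvm, mmu_seq))	/* 1) retry-check under mmu_lock */
		goto out_unlock;		/* an invalidation ran: redo fault */

	/*
	 * Until mmu_lock is dropped, host_pfn_mapping_level()'s answer
	 * cannot be stale in a harmful way: any concurrent unmap bumps
	 * mmu_notifier_seq first, which the retry check above catches.
	 */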