Commit 3043863
mhklinux authored and liuw committed
fbdev: hyperv_fb: Fix hang in kdump kernel when on Hyper-V Gen 2 VMs
Gen 2 Hyper-V VMs boot via EFI and have a standard EFI framebuffer device. When the kdump kernel runs in such a VM, loading the efifb driver may hang because of accessing the framebuffer at the wrong memory address.

The scenario occurs when the hyperv_fb driver in the original kernel moves the framebuffer to a different MMIO address because of conflicts with an already-running efifb or simplefb driver. The hyperv_fb driver then informs Hyper-V of the change, which is allowed by the Hyper-V FB VMBus device protocol. However, when the kexec command loads the kdump kernel into crash memory via the kexec_file_load() system call, the system call doesn't know the framebuffer has moved, and it sets up the kdump screen_info using the original framebuffer address. The transition to the kdump kernel does not go through the Hyper-V host, so Hyper-V does not reset the framebuffer address like it would do on a reboot. When efifb tries to run, it accesses a non-existent framebuffer address, which traps to the Hyper-V host. After many such accesses, the Hyper-V host thinks the guest is being malicious, and throttles the guest to the point that it runs very slowly or appears to have hung.

When the kdump kernel is loaded into crash memory via the kexec_load() system call, the problem does not occur. In this case, the kexec command builds the screen_info table itself in user space from data returned by the FBIOGET_FSCREENINFO ioctl against /dev/fb0, which gives it the new framebuffer location.

This problem was originally reported in 2020 [1], resulting in commit 3cb73bc ("hyperv_fb: Update screen_info after removing old framebuffer"). This commit solved the problem by setting orig_video_isVGA to 0, so the kdump kernel was unaware of the EFI framebuffer. The efifb driver did not try to load, and no hang occurred. But in 2024, commit c25a19a ("fbdev/hyperv_fb: Do not clear global screen_info") effectively reverted 3cb73bc.
Commit c25a19a has no reference to 3cb73bc, so perhaps it was done without knowing the implications that were reported with 3cb73bc. In any case, as of commit c25a19a, the original problem came back.

Interestingly, the hyperv_drm driver does not have this problem because it never moves the framebuffer. The difference is that the hyperv_drm driver removes any conflicting framebuffers *before* allocating an MMIO address, while the hyperv_fb driver removes conflicting framebuffers *after* allocating an MMIO address. With the "after" ordering, hyperv_fb may encounter a conflict and move the framebuffer to a different MMIO address. But the conflict is essentially bogus because it is removed a few lines of code later.

Rather than fix the problem with the approach from 2020 in commit 3cb73bc, instead slightly reorder the steps in hyperv_fb so conflicting framebuffers are removed before allocating an MMIO address. Then the default framebuffer MMIO address should always be available, and there's never any confusion about which framebuffer address the kdump kernel should use -- it's always the original address provided by the Hyper-V host. This approach is already used by the hyperv_drm driver, and is consistent with the usage guidelines at the head of the module with the function aperture_remove_conflicting_devices().

This approach also solves a related minor problem when kexec_load() is used to load the kdump kernel. With current code, unbinding and rebinding the hyperv_fb driver could result in the framebuffer moving back to the default framebuffer address, because on the rebind there are no conflicts. If such a move is done after the kdump kernel is loaded with the new framebuffer address, at kdump time it could again have the wrong address.

This problem and fix are described in terms of the kdump kernel, but it can also occur with any kernel started via kexec. See extensive discussion of the problem and solution at [2].

[1] https://lore.kernel.org/linux-hyperv/20201014092429.1415040-1-kasong@redhat.com/
[2] https://lore.kernel.org/linux-hyperv/BLAPR10MB521793485093FDB448F7B2E5FDE92@BLAPR10MB5217.namprd10.prod.outlook.com/

Reported-by: Thomas Tai <thomas.tai@oracle.com>
Fixes: c25a19a ("fbdev/hyperv_fb: Do not clear global screen_info")
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20250218230130.3207-1-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20250218230130.3207-1-mhklinux@outlook.com>
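
The commit message notes that the kexec_load() path learns the framebuffer location from the FBIOGET_FSCREENINFO ioctl against /dev/fb0. As a rough user-space illustration of that query (not the kexec tool's actual code), the sketch below reads `struct fb_fix_screeninfo` from a framebuffer node and prints the reported start address. The helper names `read_fb_fix()` and `print_fb_location()` are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fb.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the fixed screen info for a framebuffer
 * device node. Returns 0 on success, -1 if the node cannot be opened
 * or the ioctl fails.
 */
int read_fb_fix(const char *path, struct fb_fix_screeninfo *fix)
{
	int fd, ret;

	assert(path != NULL && fix != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	memset(fix, 0, sizeof(*fix));
	ret = ioctl(fd, FBIOGET_FSCREENINFO, fix);
	close(fd);

	return ret < 0 ? -1 : 0;
}

/* Hypothetical helper: print where the driver says the framebuffer is. */
void print_fb_location(const char *path)
{
	struct fb_fix_screeninfo fix;

	if (read_fb_fix(path, &fix) != 0) {
		printf("%s: not available\n", path);
		return;
	}

	/* id[] is a fixed 16-byte field, so bound the string width. */
	printf("%s: id=%.16s smem_start=0x%lx smem_len=%u\n",
	       path, fix.id, fix.smem_start, fix.smem_len);
}
```

Calling `print_fb_location("/dev/fb0")` in a guest where hyperv_fb is bound would be expected to show the address the driver is currently using, which per the commit message can differ from the address recorded in screen_info when the framebuffer has moved.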
1 parent aed7093 commit 3043863

1 file changed: +13 -7 lines changed

drivers/video/fbdev/hyperv_fb.c

Lines changed: 13 additions & 7 deletions
@@ -989,6 +989,7 @@ static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
 
 		base = pci_resource_start(pdev, 0);
 		size = pci_resource_len(pdev, 0);
+		aperture_remove_conflicting_devices(base, size, KBUILD_MODNAME);
 
 		/*
 		 * For Gen 1 VM, we can directly use the contiguous memory
@@ -1010,11 +1011,21 @@ static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
 			goto getmem_done;
 		}
 		pr_info("Unable to allocate enough contiguous physical memory on Gen 1 VM. Using MMIO instead.\n");
+	} else {
+		aperture_remove_all_conflicting_devices(KBUILD_MODNAME);
 	}
 
 	/*
-	 * Cannot use the contiguous physical memory.
-	 * Allocate mmio space for framebuffer.
+	 * Cannot use contiguous physical memory, so allocate MMIO space for
+	 * the framebuffer. At this point in the function, conflicting devices
+	 * that might have claimed the framebuffer MMIO space based on
+	 * screen_info.lfb_base must have already been removed so that
+	 * vmbus_allocate_mmio() does not allocate different MMIO space. If the
+	 * kdump image were to be loaded using kexec_file_load(), the
+	 * framebuffer location in the kdump image would be set from
+	 * screen_info.lfb_base at the time that kdump is enabled. If the
+	 * framebuffer has moved elsewhere, this could be the wrong location,
+	 * causing kdump to hang when efifb (for example) loads.
 	 */
 	dio_fb_size =
 		screen_width * screen_height * screen_depth / 8;
@@ -1051,11 +1062,6 @@ static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
 	info->screen_size = dio_fb_size;
 
 getmem_done:
-	if (base && size)
-		aperture_remove_conflicting_devices(base, size, KBUILD_MODNAME);
-	else
-		aperture_remove_all_conflicting_devices(KBUILD_MODNAME);
-
 	if (!gen2vm)
 		pci_dev_put(pdev);
