firmware_uefi: EfiDiagnostics backports for watchdog handling, tracelimit and other improvements #1691

@maheeraeron maheeraeron commented Jul 14, 2025

@maheeraeron maheeraeron requested a review from a team as a code owner July 14, 2025 17:39
@maheeraeron maheeraeron added the release_2505 Targets the release/2505 branch. label Jul 14, 2025
maheeraeron commented Jul 14, 2025

The next PR will be the latter half: flushing EfiDiagnostics on watchdog timeout. The original PR for that change is #1677.

smalis-msft previously approved these changes Jul 15, 2025
@maheeraeron

Holding off on this until two other PRs check in; they address a polling strategy in place of Arc<Mutex<>> and customized tracelimit usage.

I'll probably open a new backport PR with all those changes combined to make review easier, but expect me to close this one soon.

@chris-oo chris-oo requested a review from smalis-msft July 16, 2025 20:04

@chris-oo chris-oo left a comment

waiting for other fixups in main

…icrosoft#1677)

Today, we do not issue an NMI when ARM64 VMs encounter a UEFI watchdog
timeout. As a result, UEFI never goes through the `ReportCrash()` path,
so the signal to process UEFI diagnostics is never issued.

This PR solves this by forcing EfiDiagnostics to flush when a watchdog
timeout is reported.

Additionally, if you use `uhdiag` to inspect the `UefiDevice`, this will
also trigger EfiDiagnostics to flush.
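
As a rough illustration of the flush-on-timeout idea, here is a minimal
sketch. It assumes the shared-ownership design (an `Arc<Mutex<>>` around
the diagnostics service) that the later commit in this PR says was
originally used. `DiagnosticsService`, `Watchdog`, and
`flush_on_timeout_demo` are hypothetical stand-ins, not the real
`firmware_uefi` or `watchdog_core` types.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the diagnostics service (not the real type).
#[derive(Default)]
struct DiagnosticsService {
    pending: Vec<String>,
    flushed: Vec<String>,
}

impl DiagnosticsService {
    /// Buffers one log entry until the next flush.
    fn record(&mut self, entry: &str) {
        self.pending.push(entry.to_string());
    }

    /// Moves every pending entry to `flushed`; returns how many were moved.
    fn flush(&mut self) -> usize {
        let count = self.pending.len();
        self.flushed.append(&mut self.pending);
        count
    }
}

/// Hypothetical watchdog that runs every registered callback on timeout.
#[derive(Default)]
struct Watchdog {
    callbacks: Vec<Box<dyn Fn() + Send>>,
}

impl Watchdog {
    fn add_callback(&mut self, callback: Box<dyn Fn() + Send>) {
        self.callbacks.push(callback);
    }

    fn on_timeout(&self) {
        for callback in &self.callbacks {
            callback();
        }
    }
}

/// Records two entries, fires the watchdog, and returns the flushed count.
fn flush_on_timeout_demo() -> usize {
    let diagnostics = Arc::new(Mutex::new(DiagnosticsService::default()));
    diagnostics.lock().unwrap().record("PcdPeim.efi");
    diagnostics.lock().unwrap().record("RngDxe.efi");

    let mut watchdog = Watchdog::default();
    let shared = Arc::clone(&diagnostics);
    watchdog.add_callback(Box::new(move || {
        // The callback and the device both own a handle to the service.
        shared.lock().unwrap().flush();
    }));

    watchdog.on_timeout();
    let flushed = diagnostics.lock().unwrap().flushed.len();
    flushed
}
```

In this sketch the watchdog callback and the device contend for the same
lock, which is the coupling the channel-based follow-up removes.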

For the watchdog path, here is an example from an OpenHCL build of this
with a private UEFI that forces RngDxe to stall for > 2 minutes:
```
[121.383009] watchdog_core: ERROR  Encountered a watchdog timeout name="uefi-watchdog"
[121.566205] underhill_core::livedump: INFO  livedump succeeded
[121.566769] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xdd3713 phase=PEI_CORE log_message="PcdPeim.efi"
[121.567081] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xdd37a2 phase=PEI_CORE log_message="ResetSystemPei.efi"
[121.567241] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xdd37e5 phase=PEI_CORE log_message="RngPei.efi"
[121.567504] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xdd3824 phase=PEI_CORE log_message="PlatformPei.efi"
[121.567653] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xdd49de phase=PEI_CORE log_message="PeiCore.efi"
[121.567796] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xdd4a8b phase=PEI_CORE log_message="PcdPeim.efi"
[121.567938] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xdd4b10 phase=PEI_CORE log_message="DxeIpl.efi"
[121.568083] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xdd4ba2 phase=PEI_CORE log_message="MsUiThemePpi.efi"
[121.568229] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xdd4c69 phase=PEI_CORE log_message="DebugConfigPei.efi"
[121.568371] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xdd4cec phase=PEI_CORE log_message="CryptoPei.efi"
[121.568627] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xdd4d90 phase=PEI_CORE log_message="Tcg2Pei.efi"
[121.568775] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xdd5025 phase=PEI_CORE log_message="DxeCore.efi"
[121.568920] firmware_uefi::service::diagnostics: ERROR  EFI log entry debug_level=ERROR ticks=0xdd5f9e phase=PEI_CORE log_message="PeiDelayedDispatchOnEndOfPei Count of dispatch cycles is 0"
[121.569063] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xde5107 phase=DXE log_message="PcdDxe.efi"
[121.569257] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xde55e4 phase=DXE log_message="CpuDxe.efi"
[121.569399] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xde6201 phase=DXE log_message="EfiHvDxe.efi"
[121.569587] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xde67d6 phase=DXE log_message="Metronome.efi"
[121.569734] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xde6ac4 phase=DXE log_message="RuntimeDxe.efi"
[121.569875] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xde6eb6 phase=DXE log_message="ResetSystemRuntimeDxe.efi"
[121.570022] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xde728b phase=DXE log_message="ReportStatusCodeRouterRuntimeDxe.efi"
[121.570165] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xde76d0 phase=DXE log_message="EventLogDxe.efi"
[121.570311] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xde7b22 phase=DXE log_message="SynicTimerDxe.efi"
[121.570507] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xde8045 phase=DXE log_message="VariableRuntimeDxe.efi"
[121.570660] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xde84e7 phase=DXE log_message="PlatformDeviceStateHelper.efi"
[121.570805] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xdea739 phase=DXE log_message="AcpiTableDxe.efi"
[121.570950] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xdeaa19 phase=DXE log_message="CapsuleRuntimeDxe.efi"
[121.571090] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xdead58 phase=DXE log_message="DevicePathDxe.efi"
[121.571234] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xdeb174 phase=DXE log_message="HiiDatabase.efi"
[121.571379] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xdeb4d1 phase=DXE log_message="NullMemoryTestDxe.efi"
[121.571654] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xdeb7ad phase=DXE log_message="MonotonicCounterRuntimeDxe.efi"
[121.571797] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xe0948d phase=DXE log_message="SmbiosDxe.efi"
[121.571941] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xe098b6 phase=DXE log_message="EmclDxe.efi"
[121.572087] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xe09c72 phase=DXE log_message="MsvmPcRtc.efi"
[121.572288] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xe4170a phase=DXE log_message="VmbusDxe.efi"
[121.572473] firmware_uefi::service::diagnostics: WARN  EFI log entry debug_level=WARNING ticks=0xe458b5 phase=DXE log_message="VmbusRootIsChannelAllowed: Channel not allowed during boot (525074DC-8985-46E2-8057-A307DC18A502)."
[121.572656] firmware_uefi::service::diagnostics: WARN  EFI log entry debug_level=WARNING ticks=0xe45919 phase=DXE log_message="VmbusRootIsChannelAllowed: Channel not allowed during boot (F8E65716-3CB3-4A06-9A60-1889C5CCCAB5)."
[121.572803] firmware_uefi::service::diagnostics: WARN  EFI log entry debug_level=WARNING ticks=0xe45a04 phase=DXE log_message="VmbusRootIsChannelAllowed: Channel not allowed during boot (3375BAF4-9E15-4B30-B765-67ACB10D607B)."
[121.572949] firmware_uefi::service::diagnostics: WARN  EFI log entry debug_level=WARNING ticks=0xe45a44 phase=DXE log_message="VmbusRootIsChannelAllowed: Channel not allowed during boot (57164F39-9115-4E78-AB55-382F3BD5422D)."
[121.573093] firmware_uefi::service::diagnostics: WARN  EFI log entry debug_level=WARNING ticks=0xe45a82 phase=DXE log_message="VmbusRootIsChannelAllowed: Channel not allowed during boot (A9A0F4E7-5A45-4D96-B827-8A841E8C03E6)."
[121.573238] firmware_uefi::service::diagnostics: WARN  EFI log entry debug_level=WARNING ticks=0xe45ac0 phase=DXE log_message="VmbusRootIsChannelAllowed: Channel not allowed during boot (9527E630-D0AE-497B-ADCE-E80AB0175CAF)."
[121.573384] firmware_uefi::service::diagnostics: WARN  EFI log entry debug_level=WARNING ticks=0xe45afe phase=DXE log_message="VmbusRootIsChannelAllowed: Channel not allowed during boot (35FA2E29-EA23-4236-96AE-3A6EBACBA440)."
[121.573670] firmware_uefi::service::diagnostics: WARN  EFI log entry debug_level=WARNING ticks=0xe45b3c phase=DXE log_message="VmbusRootIsChannelAllowed: Channel not allowed during boot (276AACF4-AC15-426C-98DD-7521AD3F01FE)."
[121.573815] firmware_uefi::service::diagnostics: WARN  EFI log entry debug_level=WARNING ticks=0xe45c37 phase=DXE log_message="VmbusRootIsChannelAllowed: Channel not allowed during boot (CFA8B69E-5B4A-4CC0-B98B-8BA1A1F3F95A)."
[121.573961] firmware_uefi::service::diagnostics: WARN  EFI log entry debug_level=WARNING ticks=0xe45c77 phase=DXE log_message="VmbusRootIsChannelAllowed: Channel not allowed during boot (0E0B6031-5213-4934-818B-38D90CED39DB)."
[121.574104] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xe46037 phase=DXE log_message="WatchdogTimer.efi"
[121.574246] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xe46b10 phase=DXE log_message="DpcDxe.efi"
[121.574389] firmware_uefi::service::diagnostics: INFO  EFI log entry debug_level=LOAD+ERROR ticks=0xe46fce phase=DXE log_message="RngDxe.efi"
[121.574655] firmware_uefi::service::diagnostics: INFO  processed EFI log entries entries_processed=0x2f bytes_read=0x1000
[121.574790] firmware_uefi::service::diagnostics: INFO  Processed EFI Diagnostics successfully on watchdog timeout
```
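
The summary line reports its counters in hexadecimal:
`entries_processed=0x2f` is 47, which matches the 47 `EFI log entry`
lines above, and `bytes_read=0x1000` is 4096 bytes. A small sketch of
decoding such `key=0x...` fields when post-processing a log like this
one; `hex_field` is a hypothetical helper, not part of `firmware_uefi`.

```rust
/// Hypothetical helper: finds a `key=0x<hex>` token in a log line and
/// returns the decoded value, or `None` if the key is absent or malformed.
fn hex_field(line: &str, key: &str) -> Option<u64> {
    let prefix = format!("{}=0x", key);
    line.split_whitespace()
        .find_map(|token| token.strip_prefix(prefix.as_str()))
        .and_then(|hex| u64::from_str_radix(hex, 16).ok())
}
```

For example, calling `hex_field(line, "entries_processed")` on the
summary line yields `Some(47)`.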

For the inspect path, the output will be similar but dependent on _when_
the user inspects the UefiDevice.
…>> (microsoft#1693)

This PR takes a different approach to flushing EfiDiagnostics on
watchdog timeout.

- Removes the Arc<Mutex<>> that was originally used to share the
diagnostics service between UefiDevice and its WatchdogPlatform.
- Creates a mesh::channel before creating the UefiDevice. The
`on_timeout` method of the watchdog_callback that gets added to the
UefiDevice now uses the sender to send a notification.
- Passes the receiver end of the channel down to the UefiDevice, which
uses it in `poll_device()` to respond to watchdog events as they arrive.
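
The shape of this change can be sketched roughly as follows. This is a
simplified illustration only: it uses `std::sync::mpsc` as a stand-in
for `mesh::channel`, a synchronous `poll_device` rather than the real
device polling interface, and hypothetical `WatchdogCallback` and
`UefiDevice` types with made-up fields.

```rust
use std::sync::mpsc::{channel, Receiver, Sender, TryRecvError};

/// Hypothetical watchdog callback: it only holds the sending half.
struct WatchdogCallback {
    notify: Sender<()>,
}

impl WatchdogCallback {
    fn on_timeout(&self) {
        // If the device is already gone there is nobody left to notify.
        let _ = self.notify.send(());
    }
}

/// Hypothetical device that owns its diagnostics state outright (no lock).
struct UefiDevice {
    watchdog_events: Receiver<()>,
    pending: Vec<String>,
    flushed: Vec<String>,
}

impl UefiDevice {
    fn new(watchdog_events: Receiver<()>) -> Self {
        Self {
            watchdog_events,
            pending: Vec::new(),
            flushed: Vec::new(),
        }
    }

    /// Drains watchdog notifications without blocking and flushes once if
    /// any arrived. Returns the number of entries flushed.
    fn poll_device(&mut self) -> usize {
        let mut timed_out = false;
        loop {
            match self.watchdog_events.try_recv() {
                Ok(()) => timed_out = true,
                Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => break,
            }
        }
        if !timed_out {
            return 0;
        }
        let count = self.pending.len();
        self.flushed.append(&mut self.pending);
        count
    }
}

/// Polls before the timeout, after it, and once more afterwards.
fn channel_demo() -> (usize, usize, usize) {
    // The channel is created before the device, as in the description above.
    let (sender, receiver) = channel();
    let callback = WatchdogCallback { notify: sender };
    let mut device = UefiDevice::new(receiver);
    device.pending.push("WatchdogTimer.efi".to_string());
    device.pending.push("RngDxe.efi".to_string());

    let before = device.poll_device();
    callback.on_timeout();
    let after = device.poll_device();
    let again = device.poll_device();
    (before, after, again)
}
```

Because only the device touches the diagnostics state, no mutex is
needed; the watchdog side just signals and returns.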
@maheeraeron maheeraeron changed the title watchdog_core: Backport allow multiple callbacks during watchdog timeout (#1668) firmware_uefi: EfiDiagnostics backports for watchdog handling, tracelimit and other improvements Jul 17, 2025
@maheeraeron

@chris-oo @smalis-msft This should be ready for another review; all the commits are in this PR now.

@maheeraeron maheeraeron requested a review from chris-oo July 17, 2025 20:33
@maheeraeron maheeraeron merged commit 4cd692a into microsoft:release/2505 Jul 17, 2025
24 checks passed
@maheeraeron maheeraeron deleted the user/maheeraeron/backport-watchdog-core branch July 17, 2025 21:53