Skip to content

nvme: Print controller GUID in shutdown_nvme_driver span #1619

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

eric135
Copy link
Contributor

@eric135 eric135 commented Jun 27, 2025

This allows us to directly connect the NVMe device to the controller attached to the VM in one log line.

@eric135 eric135 requested a review from a team as a code owner June 27, 2025 23:23
@eric135 eric135 requested a review from a team as a code owner June 28, 2025 21:57
@@ -1051,6 +1065,9 @@ pub mod save_restore {
/// NVMe driver worker task data.
#[mesh(4)]
pub worker_data: NvmeDriverWorkerSavedState,
/// Controller instance ID.
#[mesh(5)]
pub controller_instance_id: Option<String>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we need to save this?

@jstarks
Copy link
Member

jstarks commented Jun 29, 2025

I am not a big fan of this approach, because we're putting VPCI-specific stuff into our NVMe driver. It even infected the saved state (although I am hoping that's just an oversight).

Is there a way we can instead provide a parent Span or something that gets used for this operation, allowing us to put all the context we need in the parent span?

@mattkur
Copy link
Contributor

mattkur commented Jun 30, 2025

I am not a big fan of this approach, because we're putting VPCI-specific stuff into our NVMe driver. It even infected the saved state (although I am hoping that's just an oversight).

Is there a way we can instead provide a parent Span or something that gets used for this operation, allowing us to put all the context we need in the parent span?

Ack your feedback about putting vpci-specific stuff into our NVMe driver. I don't fully agree with the push back: we have debug friendly identifiers in other places throughout the stack; I think it's reasonable to add a debug friendly identifier here. For now that can be a stringified guid that happens to be the vpci vmbus instance ID.

While this change happens to only add the vsid to this particular span, I don't think that's far enough. I think we need to include in all cases where we currently log the pci_id in the nvme driver.

How we end up getting these logs at scale make it difficult to operationalize a span parent/child walk. I admit that comment is disappointing, but adding a friendly name in this device will make it easier to debug and diagnose at scale.

@eric135 eric135 requested review from axelandrejs and a team July 1, 2025 21:54
@@ -336,6 +349,11 @@ impl NvmeManagerWorker {
.await
.map_err(InnerError::DeviceInitFailed)?;

// Pass through controller instance ID if specified.
if let Some(controller_instance_id) = controller_instance_id {
driver.set_controller_instance_id(controller_instance_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a setter, we just called new?

@smalis-msft
Copy link
Contributor

If this is going to be used in debugging/tracing scenarios only then it'd be nice to have a comment on the field saying so, or maybe include the words 'debug' or 'trace' in the name, or something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants