Fixing VMM & RNG per-device metrics. (Minor Tap fix) #5196

bstrong04 · 2025-05-07T04:55:00Z

Changes

Added a per-device metrics system for the RNG and vsock virtio devices in the VMM. Metrics are stored in a BTreeMap keyed by device ID, as mentioned in issue #4145 for the net devices, so multiple devices are handled through ID matching. Each metric is wrapped in an Arc to make it thread safe. Also adds some minor tests, modeled on the ones in the net devices folder, that check thread safety and aggregation.
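For readers unfamiliar with the pattern, the registry described above can be sketched roughly as follows. This is not the code in this PR: `DeviceMetrics`, `MetricsRegistry`, `rx_count`, and `device_count` are hypothetical names, and a plain `AtomicU64` stands in for Firecracker's own metric types.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::AtomicU64;
use std::sync::{Arc, RwLock};

/// Hypothetical per-device counters. Atomics allow updates through a
/// shared reference, so an `Arc<DeviceMetrics>` can be used across threads.
#[derive(Debug, Default)]
pub struct DeviceMetrics {
    pub rx_count: AtomicU64,
}

/// Hypothetical registry mapping a device ID to that device's metrics.
#[derive(Debug, Default)]
pub struct MetricsRegistry {
    metrics: RwLock<BTreeMap<String, Arc<DeviceMetrics>>>,
}

impl MetricsRegistry {
    /// Returns the metrics for `id`, creating a zeroed entry on first use.
    /// Calling it twice with the same ID yields handles to the same counters.
    pub fn alloc(&self, id: &str) -> Arc<DeviceMetrics> {
        let mut map = self.metrics.write().unwrap();
        Arc::clone(map.entry(id.to_string()).or_default())
    }

    /// Number of devices that currently have an entry.
    pub fn device_count(&self) -> usize {
        self.metrics.read().unwrap().len()
    }
}
```

Because each test can allocate its own ID, tests running in parallel would not write to the same counters under this layout.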

Also replaces all former references to metrics within the vsock and rng objects with references to a "global" ID, which arbitrarily points to the single vsock or rng device required in the respective file.

Also adds a minor fix to the test_tap_name method allowing for regex matching.

Reason

Done to allow unit tests to run in parallel. In production, one vsock device and one rng device suffice, but with a single shared metrics object, device metrics would overwrite each other during testing and tests would have to run sequentially. Support for a global metric plus multiple other metrics during testing is therefore needed for both tests and production to work. The same behavior is replicated for both vsock and rng.

test_tap_name changed from the previously hardcoded value tap0 to accept any numerical value after 'tap'.
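The exact pattern used in the PR is not shown here. As a minimal sketch of the check described, assuming the intended pattern is "tap" followed by one or more digits, a standard-library-only version could look like this (`is_tap_name` is a hypothetical helper, not a function from the codebase):

```rust
/// Returns true if `name` is "tap" followed by one or more ASCII digits,
/// for example "tap0" or "tap12". This mirrors the regex `^tap[0-9]+$`
/// without pulling in a regex dependency.
pub fn is_tap_name(name: &str) -> bool {
    match name.strip_prefix("tap") {
        Some(suffix) => {
            !suffix.is_empty() && suffix.bytes().all(|b| b.is_ascii_digit())
        }
        None => false,
    }
}
```

A test could then assert `is_tap_name(&tap_name)` instead of comparing against the literal "tap0", which fails whenever another tap device already exists on the host.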

Closes #4709

Worked with @gjkeller for this.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

I have read and understand CONTRIBUTING.md.
I have run tools/devtool checkstyle to verify that the PR passes the
automated style checks.
I have described what is done in these changes, why they are needed, and
how they are solving the problem in a clear and encompassing way.
I have updated any relevant documentation (both in code and in the docs)
in the PR.
I have mentioned all user-facing changes in CHANGELOG.md.
If a specific issue led to this PR, this PR closes the issue.
When making API changes, I have followed the
Runbook for Firecracker API changes.
I have tested all new and changed functionalities in unit tests and/or
integration tests.
I have linked an issue to every new TODO.

This functionality cannot be added in rust-vmm.

kalyazin

Hi @bstrong04. Thanks for your contribution!

I left some inline comments regarding the approach.
I also triggered a CI run: https://buildkite.com/firecracker/firecracker-pr/builds/13383#0196b489-4d01-4b39-b984-2e3a363e5745 . It reports some style errors. You can run those yourself locally (please see https://github.com/firecracker-microvm/firecracker/blob/main/tests/README.md).

> Worked with @gjkeller for this.

I suggest you split your change into multiple commits, at least separating the tap test change from the rng/vsock changes. You could tag @gjkeller appropriately in those commits via Co-Developed-by, Tested-by, etc.

kalyazin · 2025-05-09T08:39:16Z

src/vmm/src/devices/virtio/vsock/event_handler.rs

@@ -47,37 +47,39 @@ where
    const PROCESS_NOTIFY_BACKEND: u32 = 4;

    pub fn handle_rxq_event(&mut self, evset: EventSet) -> bool {
+        let global = VsockMetricsPerDevice::alloc("global".to_string());


What is the reason for allocating it on every request? We should be allocating it only when we create a new device or when we restore it from a snapshot.
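One way to read this suggestion is sketched below: the device receives its metrics handle once, at construction (or restore) time, and the event handler only touches the stored handle. The types `VsockDeviceMetrics` and `SketchVsock` and the field `rx_queue_event_count` are hypothetical stand-ins, not the actual Firecracker definitions.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical per-device vsock counters.
#[derive(Debug, Default)]
pub struct VsockDeviceMetrics {
    pub rx_queue_event_count: AtomicU64,
}

/// Hypothetical, heavily reduced device: it only keeps the metrics handle.
pub struct SketchVsock {
    metrics: Arc<VsockDeviceMetrics>,
}

impl SketchVsock {
    /// The metrics are allocated by the caller exactly once, when the
    /// device is created or restored, and stored for the device's lifetime.
    pub fn new(metrics: Arc<VsockDeviceMetrics>) -> Self {
        Self { metrics }
    }

    /// The handler performs no allocation or map lookup; it just
    /// increments the counter behind the stored handle.
    pub fn handle_rxq_event(&mut self) -> bool {
        self.metrics
            .rx_queue_event_count
            .fetch_add(1, Ordering::Relaxed);
        true
    }
}
```

With this shape, the hot path avoids taking a lock and cloning a `String` on every queue event.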

kalyazin · 2025-05-09T08:40:05Z

src/vmm/src/devices/virtio/vsock/metrics.rs

-    seq.serialize_entry("vsock", &METRICS)?;
+    let vsock_metrics = METRICS.read().unwrap();
+    let metrics_len = vsock_metrics.metrics.len();
+    // +1 to accomodate aggregate net metrics


Here and below, replace "net" with "vsock".

kalyazin · 2025-05-09T08:45:11Z

src/vmm/src/devices/virtio/vsock/metrics.rs

+static METRICS: RwLock<VsockMetricsPerDevice> = RwLock::new(VsockMetricsPerDevice {
+    metrics: {
+        let tree = BTreeMap::new();
+        tree.insert(


I guess if we allocate per device structure on device creation, it looks like we don't need the global anymore or am I missing something?

kalyazin · 2025-05-09T08:47:14Z

src/vmm/src/devices/virtio/vsock/metrics.rs

+
+        let json_output = serde_json::to_string(&*METRICS.read().unwrap()).unwrap();
+
+        // Optional: print JSON to visually verify structure


I don't think we need this, as we run our tests continuously. If there is something extra to check, we should implement it via asserts.
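As an illustration of checking aggregation through assertions rather than printed JSON, a test along these lines would fail in CI whenever the numbers are wrong. The sketch is self-contained and uses hypothetical names (`DeviceMetrics`, `tx_count`, `aggregate_tx`, `bump_in_parallel`); it does not reproduce the PR's serializer.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical per-device counters.
#[derive(Debug, Default)]
pub struct DeviceMetrics {
    pub tx_count: AtomicU64,
}

/// Sums `tx_count` over every registered device.
pub fn aggregate_tx(devices: &BTreeMap<String, Arc<DeviceMetrics>>) -> u64 {
    devices
        .values()
        .map(|m| m.tx_count.load(Ordering::Relaxed))
        .sum()
}

/// Increments each device's counter `per_device` times from its own
/// thread, waits for all threads, then returns the aggregate.
pub fn bump_in_parallel(
    devices: &BTreeMap<String, Arc<DeviceMetrics>>,
    per_device: u64,
) -> u64 {
    let handles: Vec<_> = devices
        .values()
        .map(|m| {
            let m = Arc::clone(m);
            thread::spawn(move || {
                for _ in 0..per_device {
                    m.tx_count.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    aggregate_tx(devices)
}

#[cfg(test)]
mod sketch_tests {
    use super::*;

    #[test]
    fn aggregate_matches_per_device_sum() {
        let mut devices = BTreeMap::new();
        devices.insert("vsock0".to_string(), Arc::new(DeviceMetrics::default()));
        devices.insert("vsock1".to_string(), Arc::new(DeviceMetrics::default()));
        // Assertions replace visual inspection of printed output.
        assert_eq!(bump_in_parallel(&devices, 100), 200);
        assert_eq!(devices["vsock0"].tx_count.load(Ordering::Relaxed), 100);
        assert_eq!(devices["vsock1"].tx_count.load(Ordering::Relaxed), 100);
    }
}
```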

kalyazin · 2025-05-09T08:49:04Z

src/vmm/src/devices/virtio/rng/device.rs

@@ -113,14 +113,15 @@ impl Entropy {
    }

    fn handle_one(&mut self) -> Result<u32, EntropyError> {
+        let global = EntropyMetricsPerDevice::alloc("global".to_string());


Please see my comments for the vsock device.

JackThomson2 · 2025-06-04T14:31:44Z

Hey @bstrong04, are you planning to continue working on this and take it to the finish line?

draft

5d08eae

bstrong04 marked this pull request as ready for review May 7, 2025 05:09

kalyazin added the Status: Awaiting author Indicates that an issue or pull request requires author action label May 9, 2025
