Skip to content

Fixing VMM & RNG per-device metrics. (Minor Tap fix) #5196

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

bstrong04
Copy link

@bstrong04 bstrong04 commented May 7, 2025

Changes

Added a new device metric system for both RNG and VMM VIRTIO devices. This includes mapping them through a BTreeMap as mentioned in issue #4145 for the net devices. Handles multiple devices through ID matching while also adding an Arc to each metric to make them thread safe. Also adds some minor tests replicating the ones in the net devices folder to check for thread safety and aggregation functionality.

Also resolves all former references to metrics within the vsock and rng objects to a reference to the "global" id set to arbitrarily point to the single required vsock and rng devices for their respective files.

Also adds a minor fix to the test_tap_name method allowing for regex matching.

Reason

Done to allow unit tests to run in parallel to each other. In production one vsock device and one rng device suffice for use, but with this implementation device metrics will overwrite each other during testing and tests would have to be run sequentially. Therefore, support for a global metric and multiple other metrics during testing is needed to ensure functionality for tests and production. This behavior is simply replicated on both vsock and rng.

test_tap_name changed from prior hardcoded value of tap0 to handle any numerical value after 'tap' when passed in.

Closes #4709

Worked with @gjkeller for this.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@bstrong04 bstrong04 marked this pull request as ready for review May 7, 2025 05:09
Copy link
Contributor

@kalyazin kalyazin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bstrong04 . Thanks for your contribution!

I left some inline comments regarding the approach.
I also triggered a CI run: https://buildkite.com/firecracker/firecracker-pr/builds/13383#0196b489-4d01-4b39-b984-2e3a363e5745 . It reports some style errors. You can run those yourself locally (please see https://github.com/firecracker-microvm/firecracker/blob/main/tests/README.md).

Worked with @gjkeller for this.

I suggest you split your change in multiple commits, at least the tap test change and rng/vsock changes. You could tag @gjkeller appropriately in those commits via Co-Developed-by, Tested-by, etc.

@@ -47,37 +47,39 @@ where
const PROCESS_NOTIFY_BACKEND: u32 = 4;

pub fn handle_rxq_event(&mut self, evset: EventSet) -> bool {
let global = VsockMetricsPerDevice::alloc("global".to_string());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for allocating it on every request? We should be allocating it only when we create a new device or when we restore it from a snapshot.

seq.serialize_entry("vsock", &METRICS)?;
let vsock_metrics = METRICS.read().unwrap();
let metrics_len = vsock_metrics.metrics.len();
// +1 to accomodate aggregate net metrics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here and below replace net with vsock

static METRICS: RwLock<VsockMetricsPerDevice> = RwLock::new(VsockMetricsPerDevice {
metrics: {
let tree = BTreeMap::new();
tree.insert(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if we allocate per device structure on device creation, it looks like we don't need the global anymore or am I missing something?


let json_output = serde_json::to_string(&*METRICS.read().unwrap()).unwrap();

// Optional: print JSON to visually verify structure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this as we run our tests continuously. If there is something extra to check we should implement those via asserts.

@@ -113,14 +113,15 @@ impl Entropy {
}

fn handle_one(&mut self) -> Result<u32, EntropyError> {
let global = EntropyMetricsPerDevice::alloc("global".to_string());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please see my comments for the vsock device

@kalyazin kalyazin added the Status: Awaiting author Indicates that an issue or pull request requires author action label May 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Awaiting author Indicates that an issue or pull request requires author action
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow Running vmm Unittests in Parallel
2 participants