fix IovDeque for non 4K pages #5222

Merged
merged 3 commits into from
May 28, 2025

Conversation

ShadowCurse
Contributor

@ShadowCurse ShadowCurse commented May 22, 2025

Changes

The L const generic determined the maximum number of iov
elements in the IovDeque. This causes an issue when the host kernel
uses pages that can contain more entries than L. For example, the usual
4K pages can contain 256 iovs, while 16K pages can contain 1024 iovs.
On 16K pages (and any other page size bigger than 4K), the current
implementation will still wrap the IovDeque when it reaches the 256th
element. This breaks the implementation, since elements written past
the 256th index will not be 'duplicated' at the beginning of the queue.

The current implementation expects this behavior:

 page 1 page 2
|ABCD|#|ABCD|
      ^ will wrap here

With big page sizes, the current implementation will do this:

 page 1              page 2
|ABCD|EFGD________|#|ABCDEFGD________|
     ^ will wrap here
                   ^ but should wrap here
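
The failure mode in the diagrams can be reproduced with a toy model. The sketch below is not Firecracker's `IovDeque`; `MirroredRing`, `wrap_at` and `front_after_wrap` are hypothetical names, and the double page mapping is only imitated by indexing modulo `capacity`. It shows that wrapping `start` earlier than the real mirrored capacity makes the queue read stale slots.

```rust
/// Toy model of a ring whose backing page is mapped twice back to back,
/// so virtual index `i` and `i + capacity` alias the same slot.
/// Hypothetical type, for illustration only.
struct MirroredRing {
    slots: Vec<u32>,
    capacity: usize, // entries that really fit in one page
    wrap_at: usize,  // index at which `start` is wrapped back to 0
    start: usize,
    len: usize,
}

impl MirroredRing {
    fn new(capacity: usize, wrap_at: usize) -> Self {
        Self { slots: vec![0; capacity], capacity, wrap_at, start: 0, len: 0 }
    }

    /// Write at the virtual tail; the modulo imitates the second mapping.
    fn push_back(&mut self, value: u32) {
        let virt = self.start + self.len;
        self.slots[virt % self.capacity] = value;
        self.len += 1;
    }

    /// Drop `n` entries from the front and wrap `start` at `wrap_at`.
    fn pop_front(&mut self, n: usize) {
        assert!(n <= self.len);
        self.start += n;
        self.len -= n;
        if self.start >= self.wrap_at {
            self.start -= self.wrap_at;
        }
    }

    fn front(&self) -> Option<u32> {
        if self.len == 0 {
            None
        } else {
            Some(self.slots[self.start % self.capacity])
        }
    }
}

/// Push 1, 2, 3, pop two, push 4, 5, pop two: the only live entry is 5.
fn front_after_wrap(capacity: usize, wrap_at: usize) -> Option<u32> {
    let mut ring = MirroredRing::new(capacity, wrap_at);
    for v in 1..=3 {
        ring.push_back(v);
    }
    ring.pop_front(2);
    ring.push_back(4);
    ring.push_back(5);
    ring.pop_front(2);
    ring.front()
}
```

With `capacity == wrap_at` (the 4K case in miniature) the front entry is the live one. With `capacity` larger than `wrap_at`, `start` jumps back to slot 0, where the old entry still sits.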

The solution is to calculate the maximum capacity the IovDeque can
hold, and use it for wrapping purposes. This capacity is allowed to be
bigger than L. The actual used number of entries in the queue will
still be guarded by the L parameter used in the is_full method.
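
As a rough sketch of the idea described above (not the code merged in this PR), the capacity can be derived from the page size and the size of one iov entry, while the fullness check keeps using L. The `IoVec` struct is a stand-in for `libc::iovec`, and `capacity_for_page_size`, `wrap_start` and `is_full` are hypothetical helper names chosen for illustration.

```rust
use std::mem::size_of;

/// Stand-in for `libc::iovec`: a base pointer plus a length
/// (16 bytes on 64-bit targets). Assumption: same layout as the real type.
#[allow(dead_code)]
#[repr(C)]
struct IoVec {
    base: *mut u8,
    len: usize,
}

/// Number of `IoVec` entries that fit in one page of `page_size` bytes.
/// This is the value used for wrapping, and it may exceed `L`.
fn capacity_for_page_size(page_size: usize) -> usize {
    page_size / size_of::<IoVec>()
}

/// Wrap the start index at the real capacity instead of at `L`.
fn wrap_start(start: usize, capacity: usize) -> usize {
    if start >= capacity {
        start - capacity
    } else {
        start
    }
}

/// The number of entries in use is still bounded by `limit` (the `L` parameter).
fn is_full(len: usize, limit: usize) -> bool {
    len >= limit
}
```

On a 64-bit host, `capacity_for_page_size(4096)` gives 256 and `capacity_for_page_size(16384)` gives 1024, matching the numbers in the description, so index 256 no longer triggers a wrap on 16K pages.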

Reason

Fixes #5217

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

codecov bot commented May 22, 2025

Codecov Report

Attention: Patch coverage is 87.50000% with 1 line in your changes missing coverage. Please review.

Project coverage is 82.93%. Comparing base (6417786) to head (29df8ea).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
src/vmm/src/devices/virtio/iov_deque.rs 87.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5222      +/-   ##
==========================================
+ Coverage   82.88%   82.93%   +0.05%     
==========================================
  Files         250      250              
  Lines       26936    26942       +6     
==========================================
+ Hits        22325    22344      +19     
+ Misses       4611     4598      -13     
Flag Coverage Δ
5.10-c5n.metal 83.37% <87.50%> (-0.01%) ⬇️
5.10-m5n.metal 83.36% <87.50%> (-0.01%) ⬇️
5.10-m6a.metal 82.58% <87.50%> (+<0.01%) ⬆️
5.10-m6g.metal 79.20% <87.50%> (+<0.01%) ⬆️
5.10-m6i.metal 83.36% <87.50%> (+<0.01%) ⬆️
5.10-m7a.metal-48xl 82.57% <87.50%> (?)
5.10-m7g.metal 79.20% <87.50%> (+<0.01%) ⬆️
5.10-m7i.metal-24xl 83.33% <87.50%> (?)
5.10-m7i.metal-48xl 83.33% <87.50%> (?)
5.10-m8g.metal-24xl 79.19% <87.50%> (?)
5.10-m8g.metal-48xl 79.19% <87.50%> (?)
6.1-c5n.metal 83.42% <87.50%> (-0.01%) ⬇️
6.1-m5n.metal 83.41% <87.50%> (-0.01%) ⬇️
6.1-m6a.metal 82.63% <87.50%> (+<0.01%) ⬆️
6.1-m6g.metal 79.20% <87.50%> (+<0.01%) ⬆️
6.1-m6i.metal 83.40% <87.50%> (+<0.01%) ⬆️
6.1-m7a.metal-48xl 82.62% <87.50%> (?)
6.1-m7g.metal 79.20% <87.50%> (+<0.01%) ⬆️
6.1-m7i.metal-24xl 83.43% <87.50%> (?)
6.1-m7i.metal-48xl 83.42% <87.50%> (?)
6.1-m8g.metal-24xl 79.19% <87.50%> (?)
6.1-m8g.metal-48xl 79.19% <87.50%> (?)


@louwers
louwers commented May 22, 2025

Confirmed that this fix seems to solve the problem I reported.

The issue is no longer reproducible.

@ShadowCurse ShadowCurse force-pushed the net_16k_fix branch 3 times, most recently from 5b7c45f to 4ae81dc Compare May 27, 2025 13:19
@ShadowCurse ShadowCurse marked this pull request as ready for review May 27, 2025 13:19
@ShadowCurse ShadowCurse self-assigned this May 27, 2025
@ShadowCurse ShadowCurse added the Status: Awaiting review, Type: Documentation, and Type: Fix labels May 27, 2025
Manciukic
Manciukic previously approved these changes May 27, 2025
@Manciukic
Contributor

Build is complaining about some markdown formatting

FAILED integration_tests/style/test_markdown.py::test_markdown_style - AssertionError: Some markdown files need formatting. Either run `./tools/devtool sh mdformat .` in the repository root, or apply the above diffs manually.

The `L` const generic determined the maximum number of `iov`
elements in the `IovDeque`. This causes an issue when the host kernel
uses pages that can contain more entries than `L`. For example, the usual
4K pages can contain 256 `iov`s, while 16K pages can contain 1024 `iov`s.
On 16K pages (and any other page size bigger than 4K), the current
implementation will still wrap the `IovDeque` when it reaches the 256th
element. This breaks the implementation, since elements written past
the 256th index will not be 'duplicated' at the beginning of the queue.

The current implementation expects this behavior:
 page 1 page 2
|ABCD|#|ABCD|
      ^ will wrap here

With big page sizes, the current implementation will do this:
 page 1              page 2
|ABCD|EFGD________|#|ABCDEFGD________|
     ^ will wrap here
                   ^ but should wrap here

The solution is to calculate the maximum capacity the `IovDeque` can
hold, and use it for wrapping purposes. This capacity is allowed to be
bigger than `L`. The actual used number of entries in the queue will
still be guarded by the `L` parameter used in the `is_full` method.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Manciukic
Manciukic previously approved these changes May 27, 2025
Add note about `IovDeque` fix for non 4K pages.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Currently, only 4K pages on the host and in the guest
are officially supported. Other configurations might work,
but they are not continuously tested.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
@ShadowCurse ShadowCurse requested a review from Manciukic May 28, 2025 08:18
@roypat roypat merged commit 1d0f9af into firecracker-microvm:main May 28, 2025
6 of 7 checks passed
@ShadowCurse ShadowCurse deleted the net_16k_fix branch May 28, 2025 08:42
Labels
Status: Awaiting review, Type: Documentation, Type: Fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Regression v1.10.0 tap device unreliable and unresponsive
4 participants