
Improve AsyncManualResetEvent implementation to address races #1843


Merged · 1 commit · Jun 8, 2025


@danielmarbach (Collaborator) commented Jun 6, 2025

Proposed Changes

I used the implementation further in some highly concurrent scenarios and ran into token problems with the underlying manual-reset implementation. I have concluded that the original version I tweaked had some races that we collectively missed.

The race condition occurs in WaitAsync(). The IsSet check and the capture of valueTaskSource.Version happen at different moments without synchronization, so Set() or Reset() can run between the two operations and change the state. If Reset() is called after the IsSet check passes but before the ValueTask is created, the captured version becomes stale. The awaited task then never completes, because it references the old version while the ManualResetValueTaskSourceCore has already been reset. In addition, the gap between checking IsSet and updating state in both Set() and Reset() opens windows in which multiple threads can pass the initial checks simultaneously and then operate on inconsistent state.
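To make the interleaving concrete, here is a deterministic, single-threaded replay of it against a hypothetical toy model in Java. ToyVersionedSource, ToyWaiter and RaceReplay are invented for illustration. The model mimics only the two properties the description relies on: a waiter is tied to the version it captured, and Reset() bumps the version. It is not the real ManualResetValueTaskSourceCore<T> API, and it does not reproduce that type's exact failure mode.

```java
import java.util.ArrayList;
import java.util.List;

// HYPOTHETICAL toy model, not ManualResetValueTaskSourceCore: a waiter is identified
// by the version token it captured, set() signals only waiters holding the current
// version, and reset() bumps the version.
final class ToyWaiter {
    final int token;
    boolean completed;

    ToyWaiter(int token) {
        this.token = token;
    }
}

final class ToyVersionedSource {
    private int version;
    private boolean isSet;
    private final List<ToyWaiter> waiters = new ArrayList<>();

    boolean isSet() { return isSet; }

    int version() { return version; }

    // Stands in for "creating the ValueTask" with a previously captured token.
    ToyWaiter register(int token) {
        ToyWaiter w = new ToyWaiter(token);
        waiters.add(w);
        return w;
    }

    void set() {
        isSet = true;
        for (ToyWaiter w : waiters) {
            if (w.token == version) {
                w.completed = true; // waiters with a stale token are never signalled
            }
        }
    }

    void reset() {
        isSet = false;
        version++; // every token captured before this call is now stale
    }
}

final class RaceReplay {
    // Replays one bad interleaving step by step, with no real threads involved.
    static ToyWaiter replay() {
        ToyVersionedSource source = new ToyVersionedSource();
        if (source.isSet()) {          // Waiter: IsSet check passes ("not set, must wait").
            return null;
        }
        int token = source.version();  // Waiter: captures version 0.
        source.reset();                // Other thread: Reset() runs in the gap, version -> 1.
        ToyWaiter waiter = source.register(token); // Waiter: builds its "ValueTask" with stale token 0.
        source.set();                  // Other thread: Set() signals version 1 only.
        return waiter;                 // This waiter is never completed.
    }
}
```

Nothing in the toy model prevents Reset() from landing between the check and the capture. That gap is the window the description points at.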

This implementation passed my concurrency tests, but it wouldn't hurt for the original reviewers to take another look: @lukebakken @paulomorgado @bollhals

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask on the
mailing list. We're here to help! This is simply a reminder of what we are
going to look for before merging your code.

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

Further Comments

If this is a relatively large or complex change, kick off the discussion by
explaining why you chose the solution you did and what alternatives you
considered, etc.

@danielmarbach requested a review from @lukebakken on June 6, 2025 22:21
@danielmarbach changed the title from "Revert to the original version of AsyncManualResetEvent" to "Improve AsyncManualResetEvent implementation to address races" on Jun 7, 2025
@danielmarbach (Collaborator, Author) commented:

I pushed an implementation based on https://raw.githubusercontent.com/dotnet/runtime/refs/heads/main/src/libraries/System.Net.Quic/src/System/Net/Quic/Internal/ValueTaskSource.cs, which should properly address the races while keeping allocations on par with the current version.

@danielmarbach force-pushed the manual-reset branch 2 times, most recently from 26d1b38 to addb51f, on June 7, 2025 09:55
@danielmarbach (Collaborator, Author) commented:

I reverted the value-task-source-based implementation. The TCS-based version works reliably. For the value-task-source-based one, I tried to implement several "fixes", even with assistance, but could never get it race-free under load.
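For comparison, the TaskCompletionSource-based approach can be sketched as follows. This is a Java analogue of the commonly used TCS-swap pattern, not the code merged in this PR. CompletableFuture stands in for TaskCompletionSource<bool>, and AtomicReference.compareAndSet stands in for Interlocked.CompareExchange on a volatile field. The class and method names mirror the C# ones but are assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Java analogue (assumed names) of a TaskCompletionSource-based AsyncManualResetEvent.
final class AsyncManualResetEvent {
    // The current "TCS": completed means the event is set.
    private final AtomicReference<CompletableFuture<Void>> current =
            new AtomicReference<>(new CompletableFuture<>());

    AsyncManualResetEvent(boolean initialState) {
        if (initialState) {
            current.get().complete(null);
        }
    }

    // A single atomic read: the caller gets a future that is either already done or
    // will be completed by the next set(). There is no separate "check IsSet, then
    // capture a version" step for set()/reset() to slip into.
    // copy() (Java 9+) hands out a dependent future so callers cannot complete or
    // cancel the shared one, much like exposing only tcs.Task.
    CompletableFuture<Void> waitAsync() {
        return current.get().copy();
    }

    boolean isSet() {
        return current.get().isDone();
    }

    // complete() on an already-completed future is a no-op returning false,
    // so concurrent set() calls are harmless.
    void set() {
        current.get().complete(null);
    }

    // Swap in a fresh future only if the current one is completed. If another
    // thread swapped first, the CAS fails and the loop re-reads the new future.
    void reset() {
        while (true) {
            CompletableFuture<Void> f = current.get();
            if (!f.isDone() || current.compareAndSet(f, new CompletableFuture<>())) {
                return;
            }
        }
    }
}
```

The key design point is that a completed future is never mutated back to "unset". reset() replaces it, so a waiter that read the old reference either sees it done or is released by the next set(). One difference from .NET is that CompletableFuture.complete runs already-registered non-async dependent stages on the completing thread. TaskCreationOptions.RunContinuationsAsynchronously avoids the corresponding behavior for a TaskCompletionSource.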

@danielmarbach (Collaborator, Author) commented:

If someone wants to go down to that level of optimization, be my guest, but I have used up the budget I'm willing to spend experimenting and fiddling around with this.

@lukebakken self-assigned this on Jun 8, 2025
@lukebakken added this to the 7.2.0 milestone on Jun 8, 2025
@lukebakken (Collaborator) left a comment:


Thank you!

@lukebakken merged commit a960e25 into rabbitmq:main on Jun 8, 2025 (19 checks passed)
@danielmarbach deleted the manual-reset branch on June 8, 2025 23:30