-
Notifications
You must be signed in to change notification settings - Fork 10.3k
Change how the key ring cache is updated #54675
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I wanted to address two concerns: 1. If there is a cached key ring, but it is expired, the first thread to discover this will _synchronously_ refresh the cache - other threads will continue to use the old value. 2. If a key ring refresh is forced, it will always hit the backing repository, even if several threads want a refresh at the same time. With this change, _all_ key ring updates are computed on a thread-pool thread and callers block exactly when there is no cached key ring for them to fall back on (first run, for example) or if they are forcing a refresh (an in-flight refresh is considered satisfactory). Moving this work to a background thread will give us more room to be generous with retries when (e.g.) Azure KeyVault is unreachable or a file is locked. The old behavior can be restored using the appcontext switch `Microsoft.AspNetCore.DataProtection.KeyManagement.DisableAsyncKeyRingUpdate`. This is a safety valve and not part of configuration - we'll remove the switch and the old code path in the next release if nothing blows up. Micro-benchmarking (on 9.0 on x64) shows that the new version has comparable performance in the common case of finding an unexpired key ring in the cache.
mockCacheableKeyRingProvider.Setup(o => o.GetCacheableKeyRing(updatedKeyRingTime)) | ||
.Returns<DateTimeOffset>(dto => | ||
{ | ||
// at this point we're inside the critical section - spawn the background thread now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test was unsalvageable because it depended upon the lambda being evaluated under a particular lock. MultipleThreadsSeeExpiredCachedValue
attempts to provide analogous coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change seems sound overall! Would definitely want other eyes on all the multi-threaded logic as well.
src/DataProtection/DataProtection/src/KeyManagement/KeyRingProvider.cs
Outdated
Show resolved
Hide resolved
src/DataProtection/DataProtection/src/KeyManagement/KeyRingProvider.cs
Outdated
Show resolved
Hide resolved
Thanks @captainsafia and @BrennanConroy! This was a complicated one. |
if (newCacheableKeyRing is null) | ||
{ | ||
// There will have been a better exception from the winning thread | ||
throw Error.KeyRingProvider_RefreshFailedOnOtherThread(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception must have already happened at this point. Why can't we capture it and throw it from any thread that fails to retrieve the KeyRing due to the exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think it would be helpful to print the underlying exception once per affected task, even if we were to add a note that it was being re-reported from elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think it's more helpful than throwing a generic exception. We don't have to log it each time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a compromise, I've attached existingTask.Exception
as the inner exception.
- Take locks earlier since they're going to be taken anyway - Observe exceptions on background tasks - Rethrow task exceptions in the standard way - Remove redundant Volatiles - Attach an inner exception on losing threads
I wanted to address two concerns:
With this change, all key ring updates are computed on a thread-pool thread and callers block exactly when there is no cached key ring for them to fall back on (first run, for example) or if they are forcing a refresh (an in-flight refresh is considered satisfactory).
Moving this work to a background thread will give us more room to be generous with retries when (e.g.) Azure KeyVault is unreachable or a file is locked.
The old behavior can be restored using the appcontext switch
Microsoft.AspNetCore.DataProtection.KeyManagement.DisableAsyncKeyRingUpdate
. This is a safety valve and not part of configuration - we'll remove the switch and the old code path in the next release if nothing blows up.Micro-benchmarking (on 9.0 on x64) shows that the new version has comparable performance in the common case of finding an unexpired key ring in the cache.