Skip to content

Change how the key ring cache is updated #54675

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 18, 2024
Merged

Conversation

amcasey
Copy link
Member

@amcasey amcasey commented Mar 21, 2024

I wanted to address two concerns:

  1. If there is a cached key ring, but it is expired, the first thread to discover this will synchronously refresh the cache - other threads will continue to use the old value.
  2. If a key ring refresh is forced, it will always hit the backing repository, even if several threads want a refresh at the same time.

With this change, all key ring updates are computed on a thread-pool thread and callers block exactly when there is no cached key ring for them to fall back on (first run, for example) or if they are forcing a refresh (an in-flight refresh is considered satisfactory).

Moving this work to a background thread will give us more room to be generous with retries when (e.g.) Azure KeyVault is unreachable or a file is locked.

The old behavior can be restored using the appcontext switch Microsoft.AspNetCore.DataProtection.KeyManagement.DisableAsyncKeyRingUpdate. This is a safety valve and not part of configuration - we'll remove the switch and the old code path in the next release if nothing blows up.

Micro-benchmarking (on 9.0 on x64) shows that the new version has comparable performance in the common case of finding an unexpired key ring in the cache.

I wanted to address two concerns:

1. If there is a cached key ring, but it is expired, the first thread to discover this will _synchronously_ refresh the cache - other threads will continue to use the old value.
2. If a key ring refresh is forced, it will always hit the backing repository, even if several threads want a refresh at the same time.

With this change, _all_ key ring updates are computed on a thread-pool thread and callers block exactly when there is no cached key ring for them to fall back on (first run, for example) or if they are forcing a refresh (an in-flight refresh is considered satisfactory).

Moving this work to a background thread will give us more room to be generous with retries when (e.g.) Azure KeyVault is unreachable or a file is locked.

The old behavior can be restored using the appcontext switch `Microsoft.AspNetCore.DataProtection.KeyManagement.DisableAsyncKeyRingUpdate`.  This is a safety valve and not part of configuration - we'll remove the switch and the old code path in the next release if nothing blows up.

Micro-benchmarking (on 9.0 on x64) shows that the new version has comparable performance in the common case of finding an unexpired key ring in the cache.
@ghost ghost added the area-dataprotection Includes: DataProtection label Mar 21, 2024
@amcasey amcasey requested a review from captainsafia March 21, 2024 21:55
mockCacheableKeyRingProvider.Setup(o => o.GetCacheableKeyRing(updatedKeyRingTime))
.Returns<DateTimeOffset>(dto =>
{
// at this point we're inside the critical section - spawn the background thread now
Copy link
Member Author

@amcasey amcasey Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was unsalvageable because it depended upon the lambda being evaluated under a particular lock. MultipleThreadsSeeExpiredCachedValue attempts to provide analogous coverage.

@dotnet-policy-service dotnet-policy-service bot added the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Mar 29, 2024
Copy link
Member

@captainsafia captainsafia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change seems sound overall! Would definitely want other eyes on all the multi-threaded logic as well.

@amcasey
Copy link
Member Author

amcasey commented Apr 18, 2024

Thanks @captainsafia and @BrennanConroy! This was a complicated one.

@amcasey amcasey enabled auto-merge (squash) April 18, 2024 21:35
@amcasey amcasey merged commit c7aae8f into dotnet:main Apr 18, 2024
26 checks passed
@dotnet-policy-service dotnet-policy-service bot added this to the 9.0-preview4 milestone Apr 18, 2024
@amcasey amcasey deleted the AsyncRefresh branch April 19, 2024 00:05
if (newCacheableKeyRing is null)
{
// There will have been a better exception from the winning thread
throw Error.KeyRingProvider_RefreshFailedOnOtherThread();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception must have already happened at this point. Why can't we capture it and throw it from any thread that fails to retrieve the KeyRing due to the exception?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think it would be helpful to print the underlying exception once per affected task, even if we were to add a note that it was being re-reported from elsewhere.

Copy link
Member

@halter73 halter73 Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think it's more helpful than throwing a generic exception. We don't have to log it each time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a compromise, I've attached existingTask.Exception as the inner exception.

amcasey added a commit to amcasey/aspnetcore that referenced this pull request Apr 22, 2024
- Take locks earlier since they're going to be taken anyway
- Observe exceptions on background tasks
- Rethrow task exceptions in the standard way
- Remove redundant Volatiles
- Attach an inner exception on losing threads
amcasey added a commit that referenced this pull request Apr 23, 2024
- Take locks earlier since they're going to be taken anyway
- Observe exceptions on background tasks
- Rethrow task exceptions in the standard way
- Remove redundant Volatiles
- Attach an inner exception on losing threads
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-dataprotection Includes: DataProtection pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants