Change how the key ring cache is updated #54675

amcasey · 2024-03-21T21:55:13Z

I wanted to address two concerns:

1. If there is a cached key ring, but it is expired, the first thread to discover this will _synchronously_ refresh the cache - other threads will continue to use the old value.
2. If a key ring refresh is forced, it will always hit the backing repository, even if several threads want a refresh at the same time.

With this change, _all_ key ring updates are computed on a thread-pool thread and callers block exactly when there is no cached key ring for them to fall back on (first run, for example) or if they are forcing a refresh (an in-flight refresh is considered satisfactory).
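The blocking rules above can be sketched roughly as follows. This is a minimal illustration, not the actual `KeyRingProvider` code: `CoalescingCache<T>`, `_load`, and `_isExpired` are hypothetical names, and the sketch assumes a single lock guarding both the cached value and the in-flight task. It also leaves out the observation of exceptions on tasks that nobody waits for.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch of a cache whose refreshes always run on the thread pool.
public sealed class CoalescingCache<T> where T : class
{
    private readonly Func<T> _load;            // stands in for reading the backing repository
    private readonly Func<T, bool> _isExpired; // stands in for the expiration check
    private readonly object _lock = new();
    private T? _cached;
    private Task<T>? _inFlight;

    public CoalescingCache(Func<T> load, Func<T, bool> isExpired)
    {
        _load = load;
        _isExpired = isExpired;
    }

    public T Get(bool forceRefresh = false)
    {
        Task<T> refresh;
        lock (_lock)
        {
            var cached = _cached;
            if (!forceRefresh && cached is not null && !_isExpired(cached))
            {
                return cached; // common case: unexpired value in the cache
            }

            // Reuse an in-flight refresh if there is one; otherwise start one.
            refresh = _inFlight ??= StartRefresh();

            if (!forceRefresh && cached is not null)
            {
                return cached; // expired, but still usable while the refresh runs
            }
        }

        // No fallback value, or a forced refresh: block on the shared task.
        return refresh.GetAwaiter().GetResult();
    }

    // Must be called while holding _lock, so the task body cannot clear
    // _inFlight before the caller has assigned it.
    private Task<T> StartRefresh() => Task.Run(() =>
    {
        try
        {
            var value = _load();
            lock (_lock) { _cached = value; }
            return value;
        }
        finally
        {
            lock (_lock) { _inFlight = null; }
        }
    });
}
```

Because a forced refresh waits on whatever task is already running, several threads that force a refresh at the same time result in a single call to `_load`.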

Moving this work to a background thread will give us more room to be generous with retries when (e.g.) Azure KeyVault is unreachable or a file is locked.

The old behavior can be restored using the appcontext switch `Microsoft.AspNetCore.DataProtection.KeyManagement.DisableAsyncKeyRingUpdate`. This is a safety valve and not part of configuration - we'll remove the switch and the old code path in the next release if nothing blows up.
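For reference, one way to set such a switch is through `AppContext.SetSwitch` at application startup. The snippet below is only a sketch: it assumes the switch is read when the key ring provider is first used, so it is set before any Data Protection services are resolved.

```csharp
using System;

// Opt back in to the old synchronous refresh path.
// Assumption: this runs before Data Protection is first used.
AppContext.SetSwitch(
    "Microsoft.AspNetCore.DataProtection.KeyManagement.DisableAsyncKeyRingUpdate",
    true);
```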

Micro-benchmarking (on 9.0 on x64) shows that the new version has comparable performance in the common case of finding an unexpired key ring in the cache.

src/DataProtection/DataProtection/src/KeyManagement/KeyRingProvider.cs

amcasey · 2024-03-21T22:17:22Z

...tection/test/Microsoft.AspNetCore.DataProtection.Tests/KeyManagement/KeyRingProviderTests.cs

-        mockCacheableKeyRingProvider.Setup(o => o.GetCacheableKeyRing(updatedKeyRingTime))
-            .Returns<DateTimeOffset>(dto =>
-            {
-                // at this point we're inside the critical section - spawn the background thread now


This test was unsalvageable because it depended upon the lambda being evaluated under a particular lock. `MultipleThreadsSeeExpiredCachedValue` attempts to provide analogous coverage.

captainsafia

Change seems sound overall! Would definitely want other eyes on all the multi-threaded logic as well.

src/DataProtection/DataProtection/src/Resources.resx

src/DataProtection/DataProtection/src/KeyManagement/KeyRingProvider.cs

amcasey · 2024-04-18T21:35:40Z

Thanks @captainsafia and @BrennanConroy! This was a complicated one.

src/DataProtection/DataProtection/src/KeyManagement/KeyRingProvider.cs

halter73 · 2024-04-22T18:25:03Z

src/DataProtection/DataProtection/src/KeyManagement/KeyRingProvider.cs

+            if (newCacheableKeyRing is null)
+            {
+                // There will have been a better exception from the winning thread
+                throw Error.KeyRingProvider_RefreshFailedOnOtherThread();


The exception must have already happened at this point. Why can't we capture it and throw it from any thread that fails to retrieve the KeyRing due to the exception?

amcasey · 2024-04-22

I didn't think it would be helpful to print the underlying exception once per affected task, even if we were to add a note that it was being re-reported from elsewhere.

halter73 · 2024-04-22

I still think it's more helpful than throwing a generic exception. We don't have to log it each time.

amcasey · 2024-04-22

As a compromise, I've attached `existingTask.Exception` as the inner exception.
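The compromise might look something like the sketch below. The exception type and message are assumptions made for illustration; the real code goes through `Error.KeyRingProvider_RefreshFailedOnOtherThread`, whose shape is not shown in this thread. `LosingThreadExample` and `CreateRefreshFailedException` are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;

public static class LosingThreadExample
{
    // Hypothetical helper: builds the exception thrown on a thread that was
    // waiting on another thread's refresh when that refresh failed.
    public static Exception CreateRefreshFailedException(Task existingTask)
    {
        // Task.Exception is an AggregateException if the task faulted, otherwise null.
        // Reading it also marks the task's exception as observed.
        return new InvalidOperationException(
            "The key ring refresh failed on another thread.", // assumed message
            existingTask.Exception);
    }
}
```

The losing thread still throws a generic exception, but the winning thread's failure stays reachable through `InnerException` without being logged once per waiter.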

- Take locks earlier since they're going to be taken anyway
- Observe exceptions on background tasks
- Rethrow task exceptions in the standard way
- Remove redundant Volatiles
- Attach an inner exception on losing threads

ghost added the area-dataprotection Includes: DataProtection label Mar 21, 2024

amcasey requested a review from captainsafia March 21, 2024 21:55

amcasey mentioned this pull request Mar 21, 2024

Limit parallelism of forced key ring refreshes #54627

Closed

amcasey mentioned this pull request Mar 23, 2024

Allow retries in DefaultKeyResolver.CanCreateAuthenticatedEncryptor #54711

Merged

dotnet-policy-service bot added the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Mar 29, 2024

captainsafia approved these changes Apr 15, 2024

amcasey added 2 commits April 18, 2024 11:56

Add explanatory comments

37f28f8

Clarify that the background task can't be cancelled

de61ffd

BrennanConroy approved these changes Apr 18, 2024

Clarify comment

fb4f415

amcasey enabled auto-merge (squash) April 18, 2024 21:35

amcasey merged commit c7aae8f into dotnet:main Apr 18, 2024

dotnet-policy-service bot added this to the 9.0-preview4 milestone Apr 18, 2024

amcasey deleted the AsyncRefresh branch April 19, 2024 00:05

halter73 reviewed Apr 22, 2024
