DataProtection: long idle period may desync data protection keys, should auto refresh cache when keyring is empty #25350

bachratyg · 2020-08-28T15:15:53Z

Data Protection usually generates a new key 2 days (hardwired) before the current default key expires, and this key is propagated to other nodes sharing the same keyring through the 1-day (hardwired) cache refresh. However, when there is no valid key, the newly generated key is a prompt key: it takes effect immediately and may not reach other nodes until their next cache refresh, which could take up to 1 day.
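To make the timing concrete, here is a minimal Python sketch of the behavior as I understand it. It is a toy model, not the library's code: the function names are hypothetical, the 2-day and 1-day values are the hardwired intervals mentioned above, and it ignores details such as refresh jitter and clock skew.

```python
from datetime import datetime, timedelta

# Intervals as described in the post (assumed fixed for this model).
PROPAGATION_WINDOW = timedelta(days=2)
CACHE_REFRESH = timedelta(days=1)


def new_key_activation(now, default_key_expiration):
    """Return when a newly generated key would become active, or None if
    no new key is needed yet. Pass None as the expiration to model a
    keyring with no valid key."""
    if default_key_expiration is None or default_key_expiration <= now:
        # No valid key: a prompt key is created and is effective immediately.
        return now
    if default_key_expiration - now <= PROPAGATION_WINDOW:
        # Normal rollover: the successor activates when the current key expires.
        return default_key_expiration
    return None


def worst_case_unseen(created_at, activation):
    """Worst-case time during which a peer node may receive payloads
    protected with a key it has not loaded yet (peer refreshes its cache
    a full CACHE_REFRESH after the key was created)."""
    peer_sees_at = created_at + CACHE_REFRESH
    return max(timedelta(0), peer_sees_at - activation)
```

In this model a key created inside the propagation window is never active before peers have refreshed, while a prompt key leaves a gap of up to one full cache refresh period.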

#3975 addresses this issue by forcing a cache refresh when an unknown key is received, but only for a couple of minutes during startup. This solves the problem e.g. when a new app is deployed for the first time with an empty keyring.

It looks like neither the cache refresh nor the generation of the new key is proactive; both are triggered by a data protection operation (e.g. an authenticated request). Long periods of idle time may therefore have the same result: on the next request a new prompt key is generated that other nodes cannot see. I have no repro, since the hardwired cache expiration and key propagation window make this really hard to test in practice; this is only my understanding of how things work under the hood, based on the source code. Am I missing something?

I have a load-balanced setup where nodes are configured as always running due to business requirements and only ever stop for maintenance. It is improbable, but cannot be ruled out, that the web interface experiences more than 2 days of idle time (e.g. over a long weekend). If that coincides with a key expiration, then the next time activity returns (usually in bursts, and therefore spread across all nodes), the key desync could bring down the whole system.

If my understanding is correct, the forced refresh window should be tied to the same condition that triggers prompt key creation rather than to startup: the detection of an empty keyring (no valid keys).

There are a couple of alternatives:

Configure the propagation interval/cache refresh: these are currently hardwired. If they were configurable, I could find a value for which activity is guaranteed within the propagation interval, so prompt keys would never be created.
Create a scheduled ping to the web interface that triggers a DP operation and a timely key creation: this relies on the operators. One slight misconfiguration and then a big surprise 90 days later.
Create a BackgroundService that triggers GetCurrentKeyRing when the last key is about to expire: this is my preferred solution for the time being, though it still seems rather hackish.
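
The scheduling decision behind the last alternative can be sketched as follows. This is only a language-neutral illustration in Python, not the actual service: `next_wakeup` is a hypothetical helper, the 2-day window is the hardwired value mentioned above, and the call that actually refreshes the keyring is left out.

```python
from datetime import datetime, timedelta

# Propagation window as described in the post (assumed fixed here).
PROPAGATION_WINDOW = timedelta(days=2)


def next_wakeup(now, key_expirations):
    """Return when a background task should next trigger a data protection
    operation so that a successor key is created while the last key is
    still valid, rather than as a prompt key after it has expired."""
    unexpired = [e for e in key_expirations if e > now]
    if not unexpired:
        # No valid key left: nothing to wait for, trigger right away.
        return now
    # Wake up as soon as the last key enters the propagation window.
    wake_at = max(unexpired) - PROPAGATION_WINDOW
    return max(now, wake_at)
```

A background loop would sleep until the returned time, trigger the operation, re-read the key expirations, and compute the next wake-up. Each node would need to run this for the keys to stay in sync without relying on request traffic.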
