
Support multiple Azure Key Vault instances as fallback #1433

@JorTurFer


Describe the solution you'd like
Yesterday there was an incident in the Azure Key Vault service in West Europe (probably maintenance or something similar, because ALL our vaults were affected, regardless of subscription). The health monitors showed something like:
[Screenshot of the Azure Key Vault health monitors]

Although the service outage isn't this driver's responsibility, having a plan B to mitigate it would have been nice. In theory, Azure Key Vault is transparently replicated to the paired region with automatic read-only failover, but that didn't happen.

We deploy to multiple regions to be resilient to regional failures, but the Secrets Store CSI Driver is currently a single point of failure because it doesn't support any kind of fallback at any level.

Given that, I'd like to propose extending the current behavior to support additional Azure Key Vaults as failovers when the primary instance fails.

The current configuration looks like this:

parameters:
    keyvaultName: ......
    tenantId: ......
    useVMManagedIdentity: 'true'
    userAssignedIdentityID: .....
    objects: |
      array:
        - |
          objectName: ...
          objectType: secret

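and it could easily be extended with an array of fallback Key Vaults (or just one 🤷). Because every value under `parameters` must be a string, the fallback list would be encoded the same way as `objects`: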

parameters:
    keyvaultName: ......
    tenantId: ......
    useVMManagedIdentity: 'true'
    userAssignedIdentityID: .....
    fallback: |
      array:
        - |
          keyvaultName: ......
          tenantId: ......
          userAssignedIdentityID: .....
    objects: |
      array:
        - |
          objectName: ...
          objectType: secret
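As a hedged illustration of how the proposed `fallback` entries might be consumed once parsed, the Go sketch below builds the ordered list of vaults to try: the primary first, then each fallback in declaration order. `VaultConfig` and `orderedVaults` are hypothetical names, parsing of the string-encoded parameters is omitted, and letting a fallback inherit an omitted `tenantId` or `userAssignedIdentityID` from the primary is an assumption of this sketch, not part of the proposal.

```go
package main

import "fmt"

// VaultConfig is a hypothetical, already-parsed form of one Key Vault entry.
// Field names mirror the SecretProviderClass parameters in the proposal.
type VaultConfig struct {
	KeyVaultName           string
	TenantID               string
	UserAssignedIdentityID string
}

// orderedVaults returns the primary vault followed by the fallbacks in the
// order they were declared. Assumption for this sketch: a fallback that leaves
// TenantID or UserAssignedIdentityID empty inherits the primary's value.
func orderedVaults(primary VaultConfig, fallbacks []VaultConfig) []VaultConfig {
	out := make([]VaultConfig, 0, 1+len(fallbacks))
	out = append(out, primary)
	for _, fb := range fallbacks { // fb is a copy, so the caller's slice is untouched
		if fb.TenantID == "" {
			fb.TenantID = primary.TenantID
		}
		if fb.UserAssignedIdentityID == "" {
			fb.UserAssignedIdentityID = primary.UserAssignedIdentityID
		}
		out = append(out, fb)
	}
	return out
}

func main() {
	// Hypothetical vault names and IDs, for illustration only.
	primary := VaultConfig{KeyVaultName: "kv-westeurope", TenantID: "tenant-a", UserAssignedIdentityID: "id-a"}
	fallbacks := []VaultConfig{{KeyVaultName: "kv-northeurope"}}
	for i, v := range orderedVaults(primary, fallbacks) {
		fmt.Println(i, v.KeyVaultName, v.TenantID, v.UserAssignedIdentityID)
	}
}
```

Keeping the order explicit means the primary is always tried first and the fallbacks behave predictably, which matters if vaults in different regions are not perfectly in sync.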

The proposed fallback configuration would improve the component's resiliency: if the primary instance returns any error, the driver would fall back to the other Azure Key Vault instances without disrupting the service.

Anything else you would like to add:

Because CSI volumes can't be marked as optional, problems with the upstream service block pod startup (which could have a huge impact on production environments if it happens during peak load). I've reviewed the Secrets Store CSI Driver documentation and haven't found anything that handles these scenarios, but maybe I've missed something.

Environment:

  • Secrets Store CSI Driver version: v1.4.0
  • Azure Key Vault provider version: v1.5.0
  • Kubernetes version: 1.27
  • Cluster type: AKS

Metadata


Labels: enhancement (New feature or request)
