NFS Azure file share snapshots for backup #2471

@cptanalatriste

✅ Checklist

  • I have searched open and closed issues for duplicates.
  • This is a request for a new feature in the Data Safe Haven or an upgrade to an existing feature.
  • The feature is still missing in the latest version.
  • I have read through the documentation.
  • This isn't an open-ended question (open a discussion if it is).

🍓 Suggested change

Sadly, the backup approach currently in the codebase does not work properly (see: #2270), and we were forced to disable it (see: #2466). This is in part because Azure Backup offers very little out-of-the-box support for NFS File Shares and BlockBlobStorage.

However, backup is a critical feature, explicitly mentioned by DSPT.

🚂 How could this be done?

Since January 2024, NFS Azure file shares support snapshots (see: https://techcommunity.microsoft.com/blog/azurestorageblog/announcing-the-general-availability-of-nfs-azure-file-share-snapshots/4038596). Although snapshots are not a traditional backup, they offer some data protection features, like recovering previous versions.

Snapshots can only be taken via the Portal, PowerShell or CLI (see: https://learn.microsoft.com/en-us/azure/storage/files/storage-snapshots-files?tabs=portal). For a speedy implementation that reuses some of the infrastructure already in place, I think we can do the following:

  1. Create another Container App Job in the same Container Apps Environment we're currently using for the DNS sidecar (after renaming it to "Management Jobs" or something similar). The advantage is that the networking is already set up for hosting containers that make Azure CLI requests.
  2. Within that Managed Environment, the new Container App Job would take the snapshots. It would use the same minimal base image as the DNS sidecar job, with Azure CLI installed. Using a system-assigned managed identity, it would periodically take snapshots of the DSH NFS File Shares: home and shared.
  3. As with the DNS sidecar job, we would configure the frequency and timeout in the config file.
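
To make the steps above more concrete, here is a rough sketch of what the job's entrypoint could look like. It is only an illustration, not a proposal for the final code: `SnapshotJobConfig` and all of its field names are hypothetical (they are not part of the current DSH config schema), the default schedule and timeout are made-up values, and the share names `home` and `shared` are assumed to match the actual Azure file share names. It assumes the Azure CLI commands `az login --identity` and `az storage share-rm snapshot` work for our NFS shares with the job's managed identity, which I have not tested.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotJobConfig:
    """Hypothetical settings for the snapshot job (names are illustrative)."""

    resource_group: str
    storage_account: str
    # Assumption: the file shares are literally named "home" and "shared".
    share_names: tuple[str, ...] = ("home", "shared")
    # Assumed defaults; the real values would come from the config file.
    cron_expression: str = "0 2 * * *"
    timeout_seconds: int = 300


def login_command() -> list[str]:
    """Azure CLI login using the container's managed identity."""
    return ["az", "login", "--identity"]


def snapshot_commands(config: SnapshotJobConfig) -> list[list[str]]:
    """Build one snapshot command per file share."""
    return [
        [
            "az", "storage", "share-rm", "snapshot",
            "--resource-group", config.resource_group,
            "--storage-account", config.storage_account,
            "--name", share_name,
        ]
        for share_name in config.share_names
    ]


def run_snapshot_job(config: SnapshotJobConfig, runner=subprocess.run) -> None:
    """Log in, then snapshot every share; fail fast on the first error.

    The timeout here is a per-command guard. The job-level timeout and the
    cron schedule would be set on the Container App Job definition itself.
    """
    for command in [login_command(), *snapshot_commands(config)]:
        runner(command, check=True, timeout=config.timeout_seconds)
```

The `runner` parameter is there only so the command sequence can be checked without calling Azure. In the container it would default to `subprocess.run`.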

Some caveats:

  1. We might need to change the teardown logic, because we cannot delete a file share that has snapshots. Alternatively, we can leave it as-is and require users to delete snapshots manually.
  2. We might need to change our workload profile. At the moment, we're only allowing one instance of the cheapest profile available (4 vCPU, 16 Gi RAM). Its workload would double with an extra job.
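
On the first caveat, one option worth checking is whether teardown could explicitly ask Azure to remove the snapshots together with the share. The sketch below only builds the Azure CLI command. It assumes that `az storage share-rm delete` with `--include snapshots` behaves for NFS shares as the CLI reference describes, which I have not verified. `delete_share_command` is a hypothetical helper, not something in the codebase, and our actual teardown path may not go through the CLI at all.

```python
def delete_share_command(
    resource_group: str, storage_account: str, share_name: str
) -> list[str]:
    """Build a CLI command that deletes a file share and its snapshots.

    Assumption: "--include snapshots" removes the share's snapshots as part
    of the delete; "--yes" skips the interactive confirmation prompt.
    """
    return [
        "az", "storage", "share-rm", "delete",
        "--resource-group", resource_group,
        "--storage-account", storage_account,
        "--name", share_name,
        "--include", "snapshots",
        "--yes",
    ]
```

If this turns out not to work for our shares, the fallback is still the manual deletion described in the first caveat.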

Labels: enhancement (New functionality that should be added to the Safe Haven)
