Updates to EC disaster recovery + related KOTS snapshots docs updates #2916
@@ -1,6 +1,6 @@
import NodeAgentMemLimit from "../partials/snapshots/_node-agent-mem-limit.mdx"
# Troubleshooting Backup and Restore
# Troubleshooting Snapshots
^ Updated various topic titles so that it's easier to see at a glance which ones apply to snapshots and which ones apply to EC DR
@@ -98,74 +93,76 @@ The following Velero fields are supported for full backups, as shown in the prev
<td>(Optional) Specifies the actions to perform at different times during a backup. The only supported hook is executing a command in a container in a pod (uses the pod exec API). Supports <code>pre</code> and <code>post</code> hooks.</td>
</tr>
<tr>
<td><code>resources</code></td>
<td><code>hooks.resources</code></td>
^ updated the table per Ethan's feedback that it was hard to see how the given fields are nested
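For reference, here is a minimal sketch of how `hooks.resources` nests under `spec` in a Velero Backup resource, following the Velero Backup API type. The hook name, label, container name, and script path are hypothetical placeholders chosen for illustration, not values from the Replicated docs.

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: backup
spec:
  hooks:
    # hooks.resources: a list of hook specs, each scoped to a set of pods
    resources:
      - name: example-db-dump        # hypothetical hook name
        includedNamespaces:
          - '*'
        labelSelector:
          matchLabels:
            app: example-db          # hypothetical pod label
        # pre hooks run in the matching pods before they are backed up
        pre:
          - exec:
              container: example-db  # hypothetical container name
              command:
                - /bin/sh
                - -c
                - /scripts/dump.sh   # hypothetical script in the image
              onError: Fail
              timeout: 5m
```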
1. You must specify which Pod volumes you want backed up. This is done with the `backup.velero.io/backup-volumes` annotation. For more information, see [File System Backup](https://velero.io/docs/v1.14/file-system-backup/) in the Velero documentation.
1. In a new release containing your application files, add a Velero Backup resource. In the Backup resource, use namespace-based or label-based selection to indicate the application resources that you want to be included in the backup. For more information, see [Backup API Type](https://velero.io/docs/latest/api-types/backup/) in the Velero documentation.
^ clarify that both namespaces and labels can be used
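A minimal sketch of label-based selection in that Backup resource, assuming the application resources carry a hypothetical `app.kubernetes.io/part-of: example-app` label. With `includedNamespaces` left out, Velero considers all namespaces and keeps only the resources that match the selector.

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: backup
spec:
  # Label-based selection: only resources with this label are backed up
  labelSelector:
    matchLabels:
      app.kubernetes.io/part-of: example-app   # hypothetical label
```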
# the name of the Backup resource that you added
backupName: backup
includedNamespaces:
- '*'
^ I also pulled this Restore resource from what Ethan had provided in the story.
Wasn't sure if the includedNamespaces field could be removed (according to the velero docs: "If unspecified, all namespaces are included.")
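For context, a sketch of the complete Restore resource around that fragment. The `metadata.name` value is an assumption; `backupName` needs to match the `metadata.name` of the Backup resource in the release. Going by the Velero Restore API, omitting `includedNamespaces` should behave the same as `'*'`, which suggests the field is optional here.

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore                # assumed name
spec:
  # the name of the Backup resource that you added
  backupName: backup
  # Optional: if unspecified, Velero includes all namespaces from the backup
  includedNamespaces:
    - '*'
```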
@@ -96,11 +124,17 @@ To enable disaster recovery for a customer:
When your customer installs with Embedded Cluster, Velero will be deployed if the **Allow Disaster Recovery** license field is enabled.
## Configure Backup Storage and Take Backups in the Admin Console
## Take Backups and Restore
^ added this new h2 to nest the procedures for setting up backup storage and restoring from a backup. Felt nice to break up the topic into separate subsections that amount to: "configuring and enabling the feature" and "using the feature"
@@ -328,16 +328,6 @@ const sidebars = {
'vendor/packaging-air-gap-excluding-minio',
],
},
{
Edited the snapshots info in the sidebar so that the vendor info and the end user info are in the same section, which makes it all easier to find
* The disaster recovery feature flag must be enabled for your account. To get access to disaster recovery, reach out to Alex Parker at [alexp@replicated.com](mailto:alexp@replicated.com).
* Embedded Cluster version 1.4.1 or later
* Embedded Cluster version **X.X.X** or later
^ is there a new version that we should put here now that this topic talks about the new "add a Restore resource" method?
1.22.0
* If the `--admin-console-port` flag was used during install to change the port for the Admin Console, note that during a restore, the Admin Console port from the backup is used and cannot be changed. For more information, see [Embedded Cluster Install Command Options](/reference/embedded-cluster-install).
## Configure Disaster Recovery for Your Application
## Configure Disaster Recovery
^ split this section into 2 subsections: configure the velero resources and then enable dr for customers
@@ -34,44 +36,68 @@ Embedded Cluster disaster recovery has the following limitations and known issue
[View a larger version of this image](/images/ec-version-command.png)
* You can only restore from the most recent backup.
* Any Helm extensions included in the `extensions` field of the Embedded Cluster Config are _not_ included in backups. Helm extensions are reinstalled as part of the restore process. To include Helm extensions in backups, configure the Velero Backup resource to include the extensions using namespace-based or label-based selection. For more information, see [Configure the Velero Custom Resources](#config-velero-resources) below.
^ add limitation that extensions aren't included
Not sure if we want to include this part:
To include Helm extensions in backups, configure the Velero Backup resource to include the extensions using namespace-based or label-based selection.
orLabelSelectors:
- matchExpressions:
  # Exclude Replicated resources from the backup
  - { key: kots.io/kotsadm, operator: NotIn, values: ["true"] }
^ this is the example that Ethan had provided in the related eng story
1. You must specify which Pod volumes you want backed up. This is done with the `backup.velero.io/backup-volumes` annotation. For more information, see [File System Backup](https://velero.io/docs/v1.14/file-system-backup/) in the Velero documentation.
:::important
If you use namespace-based selection to include all of your application resources deployed in the `kotsadm` namespace, ensure that you exclude the Replicated resources that are also deployed in the `kotsadm` namespace. Because the Embedded Cluster infrastructure components are always included in backups automatically, this avoids duplication.
:::
^ give people a heads up that they need to exclude replicated resource if they use namespace-based selection.
I put "duplication" as the reasoning here, but there might be a better way to explain it
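A sketch that combines namespace-based selection with the exclusion from the example above, and also covers the Helm extension limitation. `example-extension` is a hypothetical namespace standing in for wherever a Helm extension is installed. Kubernetes `NotIn` selectors also match resources that lack the `kots.io/kotsadm` label entirely, so everything in the listed namespaces except the Replicated resources is kept.

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: backup
spec:
  # Namespace-based selection
  includedNamespaces:
    - kotsadm
    - example-extension        # hypothetical Helm extension namespace
  orLabelSelectors:
    - matchExpressions:
        # Exclude Replicated resources from the backup
        - { key: kots.io/kotsadm, operator: NotIn, values: ["true"] }
```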
Disaster recovery for Embedded Cluster installations is implemented with Velero. For more information about Velero, see the [Velero](https://velero.io/docs/v1.14/) documentation.
The backups that your customers take from the Admin Console will include both the Embedded Cluster infrastructure and the application resources that you specify.
The Embedded Cluster infrastructure that is backed up includes components such as the KOTS Admin Console and the built-in registry that is deployed for air gap installations. No configuration is required to include Embedded Cluster infrastructure in backups. Vendors specify the application resources to include in backups by configuring a Velero Backup resource in the application release.
^ updated the overview to explain what's included in the backups
# About Backing Up and Restoring with Snapshots
# About Backup and Restore with Snapshots
^ note: previously, we had a vendor snapshots overview and an enterprise user snapshots overview as two different topics. I grouped that content together under this single overview topic:
docs/enterprise/snapshots-understanding.mdx
+ docs/vendor/snapshots-overview.md
→ docs/vendor/snapshots-overview.mdx
Everything here was copied and pasted from that existing content
[[redirects]]
from="https://docs.replicated.com/enterprise/snapshots-understanding"
to="https://docs.replicated.com/vendor/snapshots-overview"
^ redirect the deleted topic
1. Configure backups for each volume that requires a backup. By default, no volumes are included in the backup. If any pods mount a volume that should be backed up, you must configure the backup with an annotation listing the specific volumes to include in the backup.
Configure backups for each volume that requires a backup.
In the rewrite, I removed this sentence because it doesn't seem to have meaning. Instead, I jumped right to saying how you need to add the annotation to volumes that must be backed up
```
1. (Optional) Configure the resources annotation in the manifest so that it can be dynamically enabled based on a license field or a config option. For more information, see [Including Optional and Conditional Resources](packaging-include-resources/).
^ This might be something that vendors want to do, but as written, this step fell flat to me. It introduces a separate piece of information that isn't strictly related to the task at hand, which is configuring snapshots. I removed this step and figured vendors could go to the section about conditionally deploying resources when they want to do that
I think that's a good point
<details>
<summary>Why do I need to use the backup annotation?</summary>
<p>By default, no volumes are included in the backup. If any pods mount a volume that should be backed up, you must configure the backup with an annotation listing the specific volumes to include in the backup.</p>
</details>
^ moved the Why? info to a dropdown
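An example of the annotation on a Deployment's Pod template, as a sketch. The value is a comma-separated list of Pod volume names as they appear under `spec.volumes` (not PVC names). All names and the image in this sketch are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                  # hypothetical name
spec:
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
      annotations:
        # Opt the "data" volume in to Velero file system backup
        backup.velero.io/backup-volumes: data
    spec:
      containers:
        - name: example-app
          image: example/app:1.0.0   # hypothetical image
          volumeMounts:
            - name: data
              mountPath: /var/lib/example
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: example-app-data   # hypothetical PVC
```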
1. (kURL Only) If your application supports installation with Replicated kURL, Replicated recommends that you include the kURL Velero add-on so that customers do not have to manually install Velero in the kURL cluster. For more information, see [Creating a kURL Installer](packaging-embedded-kubernetes).
^ changed this step from "Optional" to "kURL Only"
:::note
If you are using multiple applications, repeat this procedure for each application. Every application must have its own Backup resource to be included in a full backup.
:::
^ in the rewrite, I moved this note down to the last step of the procedure
@@ -1,34 +1,41 @@
# Configuring Backups
# Configuring Snapshots
^ rewrote parts of this topic for clarity
```
To configure disaster recovery for your application:
### Configure the Velero Custom Resources {#config-velero-resources}
I noticed the KOTS snapshots docs mention a few things, like adding annotations to volumes you want backed up. I believe a lot of that is still applicable here, and while our advice here is more general ("just create the needed backup and restore resources"), we might want to consider adding those points to help guide people too.
We can merge as is, but we could follow up with Diamon to see if he agrees.
Since he onboarded Progress to DR, it might be good for him to read this stuff to see if he would add any more advice or steps in
Good call, I'll do that. (Merge then ask him to read through the DR steps. Might be easier to review the procedure that way compared to seeing all these inline changes)
Co-authored-by: Alex Parker <7272359+ajp-io@users.noreply.github.com>
Updated the Embedded Cluster DR topic to explain the new steps to add the Backup and Restore resources: https://deploy-preview-2916--replicated-docs.netlify.app/vendor/embedded-disaster-recovery
Updated the table in the Velero Backup resource topic for Snapshots to better show how the fields are nested (per Ethan's request): https://deploy-preview-2916--replicated-docs.netlify.app/reference/custom-resource-backup
Also reorganized the Snapshots docs so all the vendor and end user info for snapshots is grouped together in a single section under KOTS:
^ As part of this reorganization, I also did the following:
* Made sure that the snapshots topics all included "Snapshots" in their titles to avoid confusion with the Embedded Cluster disaster recovery stuff if you are searching the docs
* Condensed the vendor snapshots overview and enterprise user snapshots overview topics into a single topic (we no longer need two separate topics for this now that it's all grouped in the same sidebar section)
Did some rewriting in the existing Configuring Snapshots topic for clarity. Mostly sentence-level changes, as this topic hadn't been touched in a while and the language was a little wordy or confusing in places (more details in the comments on the Files changed tab): https://deploy-preview-2916--replicated-docs.netlify.app/vendor/snapshots-configuring-backups