diff --git a/docs/admin/runai-setup/cluster-setup/cluster-delete.md b/docs/admin/runai-setup/cluster-setup/cluster-delete.md index 9914eae887..672f79a5c3 100644 --- a/docs/admin/runai-setup/cluster-setup/cluster-delete.md +++ b/docs/admin/runai-setup/cluster-setup/cluster-delete.md @@ -1,9 +1,23 @@ # Deleting a Cluster Installation -To delete a Run:ai Cluster installation run the following commands: +To delete a Run:ai Cluster installation while retaining existing running jobs, run the following commands: +=== "Version 2.9 or later" ``` helm uninstall runai-cluster -n runai ``` -The command will **not** delete existing Projects, Departments, or Workloads submitted by users. +=== "Version 2.8" +``` +kubectl delete RunaiConfig runai -n runai +helm uninstall runai-cluster -n runai +``` + +=== "Version 2.7 or earlier" +``` +kubectl patch RunaiConfig runai -n runai -p '{"metadata":{"finalizers":[]}}' --type="merge" +kubectl delete RunaiConfig runai -n runai +helm uninstall runai-cluster runai -n runai +``` + +The commands will **not** delete existing Jobs submitted by users. diff --git a/docs/admin/runai-setup/cluster-setup/cluster-install.md b/docs/admin/runai-setup/cluster-setup/cluster-install.md index 65bd231d1b..9a7fa3aa4d 100644 --- a/docs/admin/runai-setup/cluster-setup/cluster-install.md +++ b/docs/admin/runai-setup/cluster-setup/cluster-install.md @@ -17,15 +17,13 @@ Using the cluster wizard: * Choose a name for your cluster. * Choose the Run:ai version for the cluster. -* (v2.13 only) Choose a target Kubernetes distribution. - +* Choose a target Kubernetes distribution (see [table](cluster-prerequisites.md#kubernetes) for supported distributions). * (SaaS and remote self-hosted cluster only) Enter a URL for the Kubernetes cluster. The URL need only be accessible within the organization's network. For more informtaion see [here](cluster-prerequisites.md#cluster-url). * Press `Continue`. On the next page: -* (SaaS and remote self-hosted cluster only) Install a [trusted certificate](cluster-prerequisites.md#cluster-url) to the domain entered above. - +* (SaaS and remote self-hosted cluster only) Install a trusted certificate to the domain entered above. * Run the [Helm](https://helm.sh/docs/intro/install/) command provided in the wizard. * In case of a failure, see the [Installation troubleshooting guide](../../troubleshooting/troubleshooting.md#installation). diff --git a/docs/admin/runai-setup/cluster-setup/cluster-prerequisites.md b/docs/admin/runai-setup/cluster-setup/cluster-prerequisites.md index 2267294227..c4781c6560 100644 --- a/docs/admin/runai-setup/cluster-setup/cluster-prerequisites.md +++ b/docs/admin/runai-setup/cluster-setup/cluster-prerequisites.md @@ -56,7 +56,10 @@ Following is a Kubernetes support matrix for the latest Run:ai releases:` + +## Upgrade Run:ai cluster + +### Upgrade from version 2.15+ +* In the Run:ai interface, navigate to `Clusters`. +* Select the cluster you want to upgrade. +* Click on `Get Installation instructions`. +* Choose the `Run:ai version` to be installed on the Cluster. +* Press `Continue`. +* Copy the [Helm](https://helm.sh/docs/intro/install/) command provided in the `Installation Instructions` and run it on in the cluster. +* In the case of a failure, refer to the [Installation troubleshooting guide](../../troubleshooting/troubleshooting.md#installation). + +### Upgrade from version 2.9, 2.10, 2.11 or 2.12 +Run: + +``` +helm get values runai-cluster -n runai > old-values.yaml +``` + +1. Review the file `old-values.yaml` and see if there are any changes performed during the last installation. +2. Follow the instructions for [Installing Run:ai](cluster-install.md#install-runai) to download a new values file. +3. Merge the changes from Step 1 into the new values file. +4. Run `helm upgrade` as per the instructions in the link above. + + +!!! Note + To upgrade to a __specific__ version of the Run:ai cluster, add `--version ` to the `helm upgrade` command. You can find the relevant version with `helm search repo` as described above. + +### Upgrade from version 2.7 or 2.8 + +The process of upgrading from 2.7 or 2.8 requires [uninstalling](./cluster-delete.md) and then [installing](./cluster-install.md) again. No data is lost during the process. + +!!! Note + The reason for this process is that Run:ai 2.9 cluster installation no longer installs pre-requisites. As such ownership of dependencies such as Prometheus will be undefined if a `helm upgrade` is run. + +The process: + +* Delete the Run:ai cluster installation according to these [instructions](cluster-delete.md) (do not delete the Run:ai cluster __object__ from the user interface). +* The following commands should be executed __after__ running the helm uninstall command ``` - helm get values runai-cluster -n runai > old-values.yaml - ``` + kubectl -n runai delete all --all + kubectl -n runai delete cm --all + kubectl -n runai delete secret --all + kubectl -n runai delete roles --all + kubectl -n runai delete rolebindings --all + kubectl -n runai delete ingress --all + kubectl -n runai delete servicemonitors --all + kubectl -n runai delete podmonitors --all + kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io -l app=runai + kubectl delete mutatingwebhookconfigurations.admissionregistration.k8s.io -l app=runai + kubectl delete svc -n kube-system runai-cluster-kube-prometh-kubelet + ``` +* Install the mandatory Run:ai [prerequisites](cluster-prerequisites.md): + * If you have previously installed the SaaS version of Run:ai version 2.7 or below, you will need to install both [Ingress Controller](cluster-prerequisites.md#ingress-controller) and [Prometheus](cluster-prerequisites.md#prometheus). + * If you have previously installed the SaaS version of Run:ai version 2.8 or any Self-hosted version of Run:ai, you will need to install [Prometheus](cluster-prerequisites.md#prometheus) only. + + +* Install Run:ai cluster as described [here](cluster-install.md) + +## Verify Successful Installation - * Review the file `old-values.yaml` and see if there are any changes performed during the last installation. - * In the Run:ai interface, navigate to `Clusters`. - * Select the cluster you want to upgrade. - * Click on `Get Installation instructions`. - * Select `Run:ai version: 2.13`. - * Select the `cluster's Kubernetes distribution` and the `Cluster location` - * If the Cluster locaiton is remote to the control plane - Enter a URL for the Kubernetes cluster. - * Press `Continue`. - * Follow the instructions to download a new values file. - * Merge the changes from Step 1 into the new values file. - * Copy the [Helm](https://helm.sh/docs/intro/install/) command provided in the `Installation Instructions` and run it on in the cluster. - -## Verify Successful Upgrade See [Verify your installation](cluster-install.md#verify-your-clusters-health) on how to verify a Run:ai cluster installation diff --git a/docs/admin/runai-setup/self-hosted/k8s/upgrade.md b/docs/admin/runai-setup/self-hosted/k8s/upgrade.md index 6a2b28ab1a..2bae1b4b07 100644 --- a/docs/admin/runai-setup/self-hosted/k8s/upgrade.md +++ b/docs/admin/runai-setup/self-hosted/k8s/upgrade.md @@ -36,7 +36,33 @@ If you are installing an air-gapped version of Run:ai, The Run:ai tar file conta Before proceeding with the upgrade, it's crucial to apply the specific prerequisites associated with your current version of Run:ai and every version in between up to the version you are upgrading to. -### Upgrade from version 2.9 +### Upgrade from version 2.7 or 2.8 + +Before upgrading the control plane, run: + +``` bash +POSTGRES_PV=$(kubectl get pvc pvc-postgresql -n runai-backend -o jsonpath='{.spec.volumeName}') +THANOS_PV=$(kubectl get pvc pvc-thanos-receive -n runai-backend -o jsonpath='{.spec.volumeName}') +kubectl patch pv $POSTGRES_PV $THANOS_PV -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' + +kubectl delete secret -n runai-backend runai-backend-postgresql +kubectl delete sts -n runai-backend keycloak runai-backend-postgresql +``` + +Before version 2.9, the Run:ai installation, by default, included NGINX. It was possible to disable this installation. If NGINX is enabled in your current installation, as per the default, run the following 2 lines: + +``` bash +kubectl delete ValidatingWebhookConfiguration runai-backend-nginx-ingress-admission +kubectl delete ingressclass nginx +``` +(If Run:ai configuration has previously disabled NGINX installation then these lines should not be run). + +Next, install NGINX as described [here](../../cluster-setup/cluster-prerequisites.md#ingress-controller) + +Then create a TLS secret and upgrade the control plane as described in the [control plane installation](backend.md). Before upgrading, find customizations and merge them as discussed below. + + +### Upgrade from version 2.9, 2.10 , or 2.11 Two significant changes to the control-plane installation have happened with version 2.12: _PVC ownership_ and _installation customization_. @@ -55,6 +81,7 @@ To remove the ownership in an older installation, run: kubectl patch pvc -n runai-backend pvc-thanos-receive -p '{"metadata": {"annotations":{"helm.sh/resource-policy": "keep"}}}' kubectl patch pvc -n runai-backend pvc-postgresql -p '{"metadata": {"annotations":{"helm.sh/resource-policy": "keep"}}}' ``` + #### Ingress Delete the ingress object which will be recreated by the control plane upgrade @@ -71,7 +98,9 @@ The Run:ai control-plane installation has been rewritten and is no longer using ## Upgrade Control Plane + ### Upgrade from version 2.13, or later + === "Connected" ``` bash @@ -84,7 +113,9 @@ The Run:ai control-plane installation has been rewritten and is no longer using helm get values runai-backend -n runai-backend > runai_control_plane_values.yaml helm upgrade runai-backend control-plane-.tgz -n runai-backend -f runai_control_plane_values.yaml --reset-then-reuse-values ``` -### Upgrade from version 2.9 + +### Upgrade from version 2.7, 2.8, 2.9, or 2.11 + * Create a `tls secret` as described in the [control plane installation](backend.md). * Upgrade the control plane as described in the [control plane installation](backend.md). During the upgrade, you must tell the installation __not__ to create the two PVCs: @@ -111,4 +142,4 @@ The Run:ai control-plane installation has been rewritten and is no longer using ## Upgrade Cluster -To upgrade the cluster follow the instructions [here](../../cluster-setup/cluster-upgrade.md). \ No newline at end of file +To upgrade the cluster follow the instructions [here](../../cluster-setup/cluster-upgrade.md). diff --git a/docs/admin/runai-setup/self-hosted/ocp/upgrade.md b/docs/admin/runai-setup/self-hosted/ocp/upgrade.md index 35d8b89575..f12e8a9564 100644 --- a/docs/admin/runai-setup/self-hosted/ocp/upgrade.md +++ b/docs/admin/runai-setup/self-hosted/ocp/upgrade.md @@ -2,12 +2,15 @@ title: Upgrade self-hosted OpenShift installation --- # Upgrade Run:ai + + !!! Important Run:ai data is stored in Kubernetes persistent volumes (PVs). Prior to Run:ai 2.12, PVs are owned by the Run:ai installation. Thus, uninstalling the `runai-backend` helm chart may delete all of your data. From version 2.12 forward, PVs are owned the customer and are independent of the Run:ai installation. As such, they are subject to storage class [reclaim](https://kubernetes.io/docs/concepts/storage/storage-classes/#reclaim-policy){target=_blank} policy. ## Preparations + ### Helm Run:ai requires [Helm](https://helm.sh/){target=_blank} 3.14 or later. Before you continue, validate your installed helm client version. @@ -15,6 +18,7 @@ To install or upgrade Helm, see [Installing Helm](https://helm.sh/docs/intro/ins If you are installing an air-gapped version of Run:ai, The Run:ai tar file contains the helm binary. ### Software files + === "Connected" Run the helm command below: @@ -28,8 +32,11 @@ If you are installing an air-gapped version of Run:ai, The Run:ai tar file conta * Upload the images as described [here](preparations.md#software-artifacts). ## Before upgrade + Before proceeding with the upgrade, it's crucial to apply the specific prerequisites associated with your current version of Run:ai and every version in between up to the version you are upgrading to. +### Upgrade from version 2.7 or 2.8 + Before upgrading the control plane, run: ``` bash @@ -42,10 +49,11 @@ kubectl delete sts -n runai-backend keycloak runai-backend-postgresql Then upgrade the control plane as described [below](#upgrade-control-plane). Before upgrading, find customizations and merge them as discussed below. -### Upgrade from version 2.9 -Two significant changes to the control-plane installation have happened with version 2.12: _PVC ownership_ and _installation customization_. +### Upgrade from version 2.9, 2.10 or 2.11 +Two significant changes to the control-plane installation have happened with version 2.12: _PVC ownership_ and _installation customization_. #### PVC ownership + Run:ai will no longer directly create the PVCs that store Run:ai data (metrics and database). Instead, going forward, * Run:ai requires a Kubernetes storage class to be installed. @@ -60,13 +68,17 @@ kubectl patch pvc -n runai-backend pvc-postgresql -p '{"metadata": {"annotation ``` #### Installation customization + The Run:ai control-plane installation has been rewritten and is no longer using a _backend values file_. Instead, to customize the installation use standard `--set` flags. If you have previously customized the installation, you must now extract these customizations and add them as `--set` flag to the helm installation: * Find previous customizations to the control plane if such exist. Run:ai provides a utility for that here `https://raw.githubusercontent.com/run-ai/docs/v2.13/install/backend/cp-helm-vals-diff.sh`. For information on how to use this utility please contact Run:ai customer support. * Search for the customizations you found in the [optional configurations](./backend.md#additional-runai-configurations-optional) table and add them in the new format. + ## Upgrade Control Plane + ### Upgrade from version 2.13, or later + === "Connected" ``` bash @@ -79,7 +91,8 @@ The Run:ai control-plane installation has been rewritten and is no longer using helm get values runai-backend -n runai-backend > runai_control_plane_values.yaml helm upgrade runai-backend control-plane-.tgz -n runai-backend -f runai_control_plane_values.yaml --reset-then-reuse-values ``` -### Upgrade from version 2.9 + +### Upgrade from version 2.7, 2.8, 2.9, or 2.11 === "Connected" @@ -111,4 +124,5 @@ The Run:ai control-plane installation has been rewritten and is no longer using 1. The subdomain configured for the OpenShift cluster. ## Upgrade Cluster -To upgrade the cluster follow the instructions [here](../../cluster-setup/cluster-upgrade.md). \ No newline at end of file + +To upgrade the cluster follow the instructions [here](../../cluster-setup/cluster-upgrade.md).