Skip to content

Commit 2e65a45

Browse files
Merge pull request #709 from jasonnovichRunAI/v2.17
V2.17
2 parents fd1439b + 848b047 commit 2e65a45

File tree

7 files changed

+224
-201
lines changed

7 files changed

+224
-201
lines changed
Lines changed: 18 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -1,20 +1,14 @@
11

22
# Install the Run:ai Control Plane
33

4+
## Prerequisites and preperations
45

5-
## Domain certificate
6+
Make sure you have followed the Control Plane [prerequisites](./prerequisites.md) and [preperations](./preperations.md).
67

7-
You must provide the [domain's](prerequisites.md#domain-name) private key and crt as a Kubernetes secret in the `runai-backend` namespace. Run:
8-
9-
```
10-
kubectl create secret tls runai-backend-tls -n runai-backend \
11-
--cert /path/to/fullchain.pem --key /path/to/private.pem
12-
```
138
## Install the Control Plane
149

1510
Run the helm command below:
1611

17-
1812
=== "Connected"
1913
``` bash
2014
helm repo add runai-backend https://runai.jfrog.io/artifactory/cp-charts-prod
@@ -31,24 +25,22 @@ Run the helm command below:
3125
=== "Airgapped"
3226
``` bash
3327
helm upgrade -i runai-backend control-plane-<VERSION>.tgz \ # (1)
34-
--set global.domain=<DOMAIN> # (2)
35-
-n runai-backend -f custom-env.yaml # (3)
28+
--set global.domain=<DOMAIN> \ # (2)
29+
--set global.customCA.enabled=true \ # (3)
30+
-n runai-backend -f custom-env.yaml # (4)
3631
```
3732

3833
1. Replace `<VERSION>` with the Run:ai control plane version.
3934
2. Domain name described [here](prerequisites.md#domain-name).
40-
3. `custom-env.yaml` should have been created by the _prepare installation_ script in the previous section.
35+
3. See the Local Certificate Authority instructions below
36+
4. `custom-env.yaml` should have been created by the _prepare installation_ script in the previous section.
4137

4238
!!! Tip
4339
Use the `--dry-run` flag to gain an understanding of what is being installed before the actual installation.
4440

45-
## (Air-gapped only) Local Certificate Authority
46-
47-
Perform the instructions for [local certificate authority](../../config/org-cert.md).
4841

4942

50-
## (Optional) Additional Configurations
51-
43+
### Additional configurations (optional)
5244
There may be cases where you need to set additional properties as follows:
5345

5446
| Key | Change | Description |
@@ -63,29 +55,29 @@ There may be cases where you need to set additional properties as follows:
6355
| `grafana.adminUser` | Grafana username | Override the Run:ai default user name for accessing Grafana |
6456
| `grafana.adminPassword` | Grafana password | Override the Run:ai default password for accessing Grafana |
6557
| `thanos.receive.persistence.storageClass` and `postgresql.primary.persistence.storageClass` | Storage class | The installation to work with a specific storage class rather than the default one |
66-
| `global.imagePullSecrets:` <br> &ensp; `- name: <secret-name>` | Docker secret | Provide credentials for accessing the organization's docker registry. This is required for air-gapped environments |
6758
| `<component>` <br> &ensp;`resources:` <br> &emsp; `limits:` <br> &emsp; &ensp; `cpu: 500m` <br> &emsp; &ensp; `memory: 512Mi` <br> &emsp; `requests:` <br> &emsp; &ensp; `cpu: 250m` <br> &emsp; &ensp; `memory: 256Mi` | Pod request and limits | `<component>` may be anyone of the following: `backend`, `frontend`, `assetsService`, `identityManager`, `tenantsManager`, `keycloakx`, `grafana`, `authorization`, `orgUnitService`,`policyService` |
6859
|<div style="width:200px"></div>| | |
6960

70-
71-
72-
7361
Use the `--set` syntax in the helm command above.
7462

75-
### Connect to Run:ai User Interface
63+
## Next Steps
7664

77-
Go to: `runai.<company-name>`. Log in using the default credentials: User: `test@run.ai`, Password: `Abcd!234`. Go to the Users area and change the password.
65+
### Connect to Run:ai User interface
7866

67+
Go to: `runai.<domain>`. Log in using the default credentials: User: `test@run.ai`, Password: `Abcd!234`. Go to the Users area and change the password.
7968

80-
## (Optional) Enable "Forgot password"
69+
### Enable Forgot Password (optional)
8170

82-
To support the Forgot password functionality, follow the steps below.
71+
To support the *Forgot password* functionality, follow the steps below.
8372

84-
* Go to `runai.<company-name>/auth` and Log in.
73+
* Go to `runai.<domain>/auth` and Log in.
8574
* Under `Realm settings`, select the `Login` tab and enable the `Forgot password` feature.
8675
* Under the `Email` tab, define an SMTP server, as explained [here](https://www.keycloak.org/docs/latest/server_admin/#_email){target=_blank}
8776

88-
## Next Steps
8977

78+
### Install Run:ai Cluster
9079
Continue with installing a [Run:ai Cluster](cluster.md).
9180

81+
82+
83+
Lines changed: 67 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -1,38 +1,44 @@
11
---
2-
title: Self-Hosted Installation over Kubernetes - Preparations
2+
title: Self Hosted installation over Kubernetes - preparations
33
---
4+
# Preparing for a Run:ai Kubernetes installation
45

5-
## Prerequisites
6+
The following section provides IT with the information needed to prepare for a Run:ai installation.
67

7-
See the Prerequisites section [above](prerequisites.md).
8+
## Prerequisites
89

10+
Follow the prerequisites as explained in [Self-Hosted installation over Kubernetes](prerequisites.md).
911

10-
## Prepare Installation Artifacts
11-
12-
### Run:ai Software Files
13-
14-
SSH into a node with `kubectl` access to the cluster and `Docker` installed.
15-
12+
## Software artifacts
1613

1714
=== "Connected"
15+
You should receive a file: `runai-gcr-secret.yaml` from Run:ai Customer Support. The file provides access to the Run:ai Container registry.
16+
17+
SSH into a node with `kubectl` access to the cluster and `Docker` installed.
1818
Run the following to enable image download from the Run:ai Container Registry on Google cloud:
1919

2020
``` bash
2121
kubectl create namespace runai-backend
22-
kubectl apply -f runai-gcr-secret.yaml
22+
kubectl apply -f runai-reg-creds.yaml
2323
```
2424

25-
=== "Airgapped"
25+
=== "Airgapped"
26+
You should receive a single file `runai-air-gapped-<version>.tar.gz` from Run:ai customer support
27+
28+
SSH into a node with `kubectl` access to the cluster and `Docker` installed.
29+
30+
Run:ai assumes the existence of a Docker registry for images. Most likely installed within the organization. The installation requires the network address and port for the registry (referenced below as `<REGISTRY_URL>`).
31+
2632
To extract Run:ai files, replace `<VERSION>` in the command below and run:
2733

2834
``` bash
29-
tar xvf runai-air-gapped-<version>.tar.gz
35+
tar xvf runai-air-gapped-<VERSION>.tar.gz
3036
cd deploy
3137

3238
kubectl create namespace runai-backend
3339
```
3440

35-
__Upload images__
41+
**Upload images**
3642

3743
Upload images to a local Docker Registry. Set the Docker Registry address in the form of `NAME:PORT` (do not add `https`):
3844

@@ -50,25 +56,67 @@ SSH into a node with `kubectl` access to the cluster and `Docker` installed.
5056

5157
The script should create a file named `custom-env.yaml` which will be used by the control-plane installation.
5258

59+
### Private Docker Registry (optional)
60+
61+
To access the organization's docker registry it is required to set the registry's credentials (imagePullSecret)
62+
63+
Create the secret named `runai-reg-creds` based on your existing credentials. For more information, see [Allowing pods to reference images from other secured registries](https://docs.openshift.com/container-platform/latest/openshift_images/managing_images/using-image-pull-secrets.html#images-allow-pods-to-reference-images-from-secure-registries_using-image-pull-secrets){target=_blank}.
64+
65+
## Configure your environment
66+
67+
### Domain Certificate
68+
69+
The Run:ai control plane requires a domain name (FQDN). You must supply a domain name as well as a trusted certificate for that domain.
70+
71+
* When installing the first Run:ai cluster on the same Kubernetes cluster as the control plane, the Run:ai cluster URL will be the same as the control-plane URL.
72+
* When installing the Run:ai cluster on a separate Kubernetes cluster, follow the Run:ai [domain name](../../cluster-setup/cluster-prerequisites.md#cluster-url) requirements.
73+
* If your network is air-gapped, you will need to provide the Run:ai control-plane and cluster with information about the [local certificate authority](../../config/org-cert.md).
74+
75+
You must provide the domain's private key and crt as a Kubernetes secret in the `runai-backend` namespace. Run:
5376

54-
## (Optional) Mark Run:ai System Workers
77+
```
78+
kubectl create secret tls runai-backend-tls -n runai-backend \
79+
--cert /path/to/fullchain.pem --key /path/to/private.pem
80+
```
81+
### Local Certificate Authority (air-gapped only)
82+
83+
In air-gapped environments, you must prepare the public key of your local certificate authority as described [here](../../config/org-cert.md). It will need to be installed in Kubernetes for the installation to succeed.
84+
85+
### Mark Run:ai system workers (optional)
5586

56-
You can __optionally__ set the Run:ai control plane to run on specific nodes. Kubernetes will attempt to schedule Run:ai pods to these nodes. If lacking resources, the Run:ai nodes will move to another, non-labeled node.
87+
You can **optionally** set the Run:ai control plane to run on specific nodes. Kubernetes will attempt to schedule Run:ai pods to these nodes. If lacking resources, the Run:ai nodes will move to another, non-labeled node.
5788

5889
To set system worker nodes run:
5990

6091
```
6192
kubectl label node <NODE-NAME> node-role.kubernetes.io/runai-system=true
6293
```
63-
94+
6495
!!! Warning
6596
Do not select the Kubernetes master as a `runai-system` node. This may cause Kubernetes to stop working (specifically if Kubernetes API Server is configured on 443 instead of the default 6443).
6697

67-
## Additional Permissions
98+
## Additional permissions
99+
100+
As part of the installation, you will be required to install the [Run:ai Control Plane](backend.md) and [Cluster](cluster.md) Helm [Charts](https://helm.sh/){target=_blank}. The Helm Charts require Kubernetes administrator permissions. You can review the exact permissions provided by using the `--dry-run` on both helm charts.
101+
102+
## Validate Prerequisites
103+
104+
Once you believe that the Run:ai prerequisites and preperations are met, we highly recommend installing and running the Run:ai [pre-install diagnostics script](https://github.com/run-ai/preinstall-diagnostics){target=_blank}. The tool:
105+
106+
* Tests the below requirements as well as additional failure points related to Kubernetes, NVIDIA, storage, and networking.
107+
* Looks at additional components installed and analyze their relevance to a successful Run:ai installation.
108+
109+
To use the script [download](https://github.com/run-ai/preinstall-diagnostics/releases){target=_blank} the latest version of the script and run:
110+
111+
```
112+
chmod +x preinstall-diagnostics-<platform>
113+
./preinstall-diagnostics-<platform> --domain <dns-entry>
114+
```
68115

69-
As part of the installation, you will be required to install the [Run:ai Control Plane](backend.md) and [Cluster](cluster.md) Helm [Charts](https://helm.sh/){target=_blank}. The Helm Charts require Kubernetes administrator permissions. You can review the exact permissions provided by using the `--dry-run` on both helm charts.
116+
If the script fails, or if the script succeeds but the Kubernetes system contains components other than Run:ai, locate the file `runai-preinstall-diagnostics.txt` in the current directory and send it to Run:ai technical support.
70117

118+
For more information on the script including additional command-line flags, see [here](https://github.com/run-ai/preinstall-diagnostics){target=_blank}.
71119

72-
## Next Steps
120+
## Next steps
73121

74122
Continue with installing the [Run:ai Control Plane](backend.md).
Lines changed: 31 additions & 71 deletions
Original file line numberDiff line numberDiff line change
@@ -1,35 +1,41 @@
1-
-title: Self-Hosted installation over Kubernetes - Prerequisites
2-
---
1+
# Self-Hosted installation over Kubernetes - Prerequisites
32

43
Before proceeding with this document, please review the [installation types](../../installation-types.md) documentation to understand the difference between _air-gapped_ and _connected_ installations.
54

6-
7-
## Control-plane and clusters
5+
## Run:ai Components
86

97
As part of the installation process you will install:
108

119
* A control-plane managing cluster
12-
* One or more Run:ai clusters
10+
* One or more clusters
1311

1412
Both the control plane and clusters require Kubernetes. Typically the control plane and first cluster are installed on the same Kubernetes cluster but this is not a must.
1513

16-
## Hardware Requirements
14+
!!! Important
15+
In OpenShift environments, adding a cluster connecting to a __remote__ control plane currently requires the assistance of customer support.
1716

18-
See Cluster prerequisites [hardware](../../cluster-setup/cluster-prerequisites.md#hardware-requirements) requirements.
17+
## Installer machine
1918

20-
In addition, the control plane installation of Run:ai requires the configuration of Kubernetes Persistent Volumes of a total size of 110GB.
19+
The machine running the installation script (typically the Kubernetes master) must have:
20+
21+
* At least 50GB of free space.
22+
* Docker installed.
23+
24+
25+
### Helm
2126

22-
## Run:ai Software
27+
Run:ai requires [Helm](https://helm.sh/){target=_blank} 3.10 or later. To install Helm, see [Installing Helm](https://helm.sh/docs/intro/install/){target=_blank}. If you are installing an air-gapped version of Run:ai, The Run:ai tar file contains the helm binary.
2328

24-
=== "Connected"
25-
You should receive a file: `runai-gcr-secret.yaml` from Run:ai Customer Support. The file provides access to the Run:ai Container registry.
29+
## Cluster hardware requirements
2630

27-
=== "Airgapped"
28-
You should receive a single file `runai-air-gapped-<version>.tar.gz` from Run:ai customer support
31+
See Cluster prerequisites [hardware](../../cluster-setup/cluster-prerequisites.md#hardware-requirements) requirements.
2932

30-
## Run:ai Software Prerequisites
33+
In addition, the control plane installation of Run:ai requires the configuration of Kubernetes Persistent Volumes of a total size of 110GB.
3134

32-
### Operating System
35+
36+
## Run:ai software requirements
37+
38+
### Operating system
3339

3440
See Run:ai Cluster prerequisites [operating system](../../cluster-setup/cluster-prerequisites.md#operating-system) requirements.
3541

@@ -53,77 +59,31 @@ The Run:ai control-plane requires a __default storage class__ to create persiste
5359
kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
5460
```
5561

62+
## Install prerequisites
5663

57-
### (Air-gapped only) Local Certificate Authority
58-
59-
In Air-gapped environments, you must prepare the public key of your local certificate authority as described [here](../../config/org-cert.md). It will need to be installed in Kubernetes for the installation to succeed.
64+
### Ingress Controller
6065

66+
The Run:ai control plane installation assumes an existing installation of NGINX as the ingress controller. You can follow the Run:ai _Cluster_ prerequisites [ingress controller](../../cluster-setup/cluster-prerequisites.md#ingress-controller) installation.
6167

62-
### NVIDIA Prerequisites
68+
### NVIDIA GPU Operator
6369

6470
See Run:ai Cluster prerequisites [NVIDIA](../../cluster-setup/cluster-prerequisites.md#nvidia) requirements.
6571

6672
The Run:ai control plane, when installed without a Run:ai cluster, does not require the NVIDIA prerequisites.
6773

68-
### Prometheus Prerequisites
74+
### Prometheus
6975

7076
See Run:ai Cluster prerequisites [Prometheus](../../cluster-setup/cluster-prerequisites.md#prometheus) requirements.
7177

7278
The Run:ai control plane, when installed without a Run:ai cluster, does not require the Prometheus prerequisites.
7379

74-
### (Optional) Inference Prerequisites
80+
81+
### Inference (optional)
7582

7683
See Run:ai Cluster prerequisites [Inference](../../cluster-setup/cluster-prerequisites.md#inference) requirements.
7784

7885
The Run:ai control plane, when installed without a Run:ai cluster, does not require the Inference prerequisites.
7986

80-
### Helm
81-
82-
Run:ai requires [Helm](https://helm.sh/){target=_blank} 3.10 or later. To install Helm, see [https://helm.sh/docs/intro/install/](https://helm.sh/docs/intro/install/){target=_blank}. If you are installing an air-gapped version of Run:ai, The Run:ai tar file contains the helm binary.
83-
84-
85-
## Network Requirements
86-
87-
### Ingress Controller
88-
89-
The Run:ai control plane installation assumes an existing installation of NGINX as the ingress controller. You can follow the Run:ai _Cluster_ prerequisites [ingress controller](../../cluster-setup/cluster-prerequisites.md#ingress-controller) installation.
90-
91-
### Domain name
92-
93-
The Run:ai control plane requires a domain name (FQDN). You must supply a domain name as well as a trusted certificate for that domain.
94-
95-
* When installing the first Run:ai cluster on the same Kubernetes cluster as the control plane, the Run:ai cluster URL will be the same as the control-plane URL.
96-
* When installing the Run:ai cluster on a separate Kubernetes cluster, follow the Run:ai [domain name](../../cluster-setup/cluster-prerequisites.md#cluster-url) requirements.
97-
* If your network is air-gapped, you will need to provide the Run:ai control-plane and cluster with information about the [local certificate authority](../../config/org-cert.md).
98-
99-
## Installer Machine
100-
101-
The machine running the installation script (typically the Kubernetes master) must have:
102-
103-
* At least 50GB of free space.
104-
* Docker installed.
105-
106-
## Other
107-
108-
* (Airgapped installation only) __Private Docker Registry__. Run:ai assumes the existence of a Docker registry for images. Most likely installed within the organization. The installation requires the network address and port for the registry (referenced below as `<REGISTRY_URL>`).
109-
* (Optional) __SAML Integration__ as described under [single sign-on](../../authentication/sso.md).
110-
111-
112-
## Pre-install Script
113-
114-
Once you believe that the Run:ai prerequisites are met, we highly recommend installing and running the Run:ai [pre-install diagnostics script](https://github.com/run-ai/preinstall-diagnostics){target=_blank}. The tool:
115-
116-
* Tests the below requirements as well as additional failure points related to Kubernetes, NVIDIA, storage, and networking.
117-
* Looks at additional components installed and analyze their relevance to a successful Run:ai installation.
118-
119-
To use the script [download](https://github.com/run-ai/preinstall-diagnostics/releases){target=_blank} the latest version of the script and run:
120-
121-
```
122-
chmod +x preinstall-diagnostics-<platform>
123-
./preinstall-diagnostics-<platform> --domain <dns-entry>
124-
```
125-
126-
If the script fails, or if the script succeeds but the Kubernetes system contains components other than Run:ai, locate the file `runai-preinstall-diagnostics.txt` in the current directory and send it to Run:ai technical support.
127-
128-
For more information on the script including additional command-line flags, see [here](https://github.com/run-ai/preinstall-diagnostics){target=_blank}.
129-
87+
## Next steps
88+
Continue to [Preparing for a Run:ai Kubernetes Installation
89+
](./preparations.md).

docs/admin/runai-setup/self-hosted/k8s/upgrade.md

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,16 @@ title: Upgrade self-hosted Kubernetes installation
2020

2121
## Upgrade Control Plane
2222

23+
### Upgrade from Version 2.9, 2.13, 2.15 or 2.16
24+
25+
Before upgrading the control plane, run:
26+
27+
``` bash
28+
POSTGRES_PV=$(kubectl get pvc pvc-postgresql -n runai-backend -o jsonpath='{.spec.volumeName}')
29+
THANOS_PV=$(kubectl get pvc pvc-thanos-receive -n runai-backend -o jsonpath='{.spec.volumeName}')
30+
kubectl patch pv $POSTGRES_PV $THANOS_PV -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
31+
```
32+
2333
### Upgrade from Version 2.7 or 2.8
2434

2535
Before upgrading the control plane, run:

0 commit comments

Comments
 (0)