From 309126b32dccd99e5e8257cb300d2b838c3498b6 Mon Sep 17 00:00:00 2001
From: ozRunAI <138565115+ozRunAI@users.noreply.github.com>
Date: Wed, 26 Feb 2025 17:39:47 +0200
Subject: [PATCH 01/15] a

---
 docs/admin/config/advanced-cluster-config.md  | 15 ++++++++++++++
 docs/admin/config/org-cert.md                 | 20 +++++++++++++++++--
 .../cluster-setup/cluster-prerequisites.md    |  4 ++++
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/docs/admin/config/advanced-cluster-config.md b/docs/admin/config/advanced-cluster-config.md
index 9cd58b5e59..2c19c8a744 100644
--- a/docs/admin/config/advanced-cluster-config.md
+++ b/docs/admin/config/advanced-cluster-config.md
@@ -52,6 +52,21 @@ The following configurations allow you to enable or disable features, control pe
 | spec.runai-scheduler.args.verbosity (int) | Configures the level of detail in the logs generated by the scheduler service | 4 |
 | pod-grouper.args.gangScheduleArgoWorkflow (boolean) | Groups all pods of a single ArgoWorkflow workload into a single Pod-Group for gang scheduling. | true |
 
+### S3 and Git sidecar images
+For airgapped environment when [Working with a Local Certificate Authority](./org-cert.md) it is required to replace the default sidecar images used for S3 and Git integrations edit the following configurations:
+
+```
+workload-controller:
+  s3FileSystemImage:
+    name: goofys
+    repository: gcr.io/run-ai-prod
+    tag: master
+  gitSyncImage:
+    name: git-sync
+    repository: egistry.k8s.io
+    tag: v4.4.0
+```
+
 
 ### Run:ai Managed Nodes
 
diff --git a/docs/admin/config/org-cert.md b/docs/admin/config/org-cert.md
index dea2cac6a5..c8ad6174e1 100644
--- a/docs/admin/config/org-cert.md
+++ b/docs/admin/config/org-cert.md
@@ -9,7 +9,7 @@ In the context of Run:ai, the cluster and control-plane need to be aware of this
 
 You will need to have the public key of the local certificate authority.
 
-## Control-Plane Installation
+## Control-Plane
 
 * Create the `runai-backend` namespace if it does not exist.
 * Add the public key to the `runai-backend` namespace:
@@ -21,7 +21,7 @@ kubectl -n runai-backend create secret generic runai-ca-cert \
 * As part of the installation instructions, you need to create a secret for [runai-backend-tls](../runai-setup/self-hosted/k8s/preparations.md#domain-certificate). Use the local certificate authority instead.
 * Install the control plane, add the following flag to the helm command `--set global.customCA.enabled=true`
 
-## Cluster Installation
+## Cluster
 
 * Create the `runai` namespace if it does not exist.
 * Add the public key to the `runai` namespace:
@@ -37,5 +37,21 @@ kubectl -n openshift-monitoring create secret generic runai-ca-cert \
 
 * Install the Run:ai operator, add the following flag to the helm command `--set global.customCA.enabled=true`
 
+### Git and S3
 
+Run:ai enables AI practitioners to integrate with S3 or Git as data sources.
+When using a custom CA, sidecar containers used for S3 or Git integrations do not automatically inherit the CA configured at the cluster level. This requires manually building a custom container for each integration based on the default Run:ai image while incorporating the local CA certificates.
+1. Use the Dockerfile below to build images for the S3 / Git integrations
+```
+#FROM gcr.io/run-ai-prod/goofys:master # S3
+#FROM registry.k8s.io/git-sync/git-sync:v4.4.0 # Git
+USER root
+ADD anchors/ /usr/local/share/ca-certificates/
+RUN chmod 644 -R /usr/local/share/ca-certificates/ && update-ca-certificates
+WORKDIR /
+ENTRYPOINT ["sh"]
+CMD ["/usr/bin/run.sh"]
+```
+2. Push the images to your local registry
+2. Edit the cluster configuration's for images used by Run:ai following the [S3 and Git sidecar images](./advanced-cluster-config.md#s3-and-git-sidecar-images) instructions.
 
diff --git a/docs/admin/runai-setup/cluster-setup/cluster-prerequisites.md b/docs/admin/runai-setup/cluster-setup/cluster-prerequisites.md
index 2e5d3f6301..476c13696e 100644
--- a/docs/admin/runai-setup/cluster-setup/cluster-prerequisites.md
+++ b/docs/admin/runai-setup/cluster-setup/cluster-prerequisites.md
@@ -50,6 +50,10 @@ The following software requirements must be fulfilled on the Kubernetes cluster.
 Run:ai cluster on Oracle Kubernetes Engine (OKE) supports only Ubuntu.
 * Internal tests are being performed on **Ubuntu 22.04** and **CoreOS** for OpenShift.
 
+### Container runtime
+
+Kubernetes must be configured with the Docker container runtime. Other container runtimes, such as containerd or CRI-O, are not supported.
+
 ### Kubernetes distribution
 
 Run:ai Cluster requires Kubernetes. The following Kubernetes distributions are supported:

From dac9e9f9b33819acf1345ac0baee9afad4fdf659 Mon Sep 17 00:00:00 2001
From: ozRunAI <138565115+ozRunAI@users.noreply.github.com>
Date: Wed, 26 Feb 2025 17:42:22 +0200
Subject: [PATCH 02/15] a

---
 docs/admin/config/advanced-cluster-config.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/docs/admin/config/advanced-cluster-config.md b/docs/admin/config/advanced-cluster-config.md
index 2c19c8a744..9e0502101d 100644
--- a/docs/admin/config/advanced-cluster-config.md
+++ b/docs/admin/config/advanced-cluster-config.md
@@ -53,8 +53,7 @@ The following configurations allow you to enable or disable features, control pe
 | pod-grouper.args.gangScheduleArgoWorkflow (boolean) | Groups all pods of a single ArgoWorkflow workload into a single Pod-Group for gang scheduling. | true |
 
 ### S3 and Git sidecar images
-For airgapped environment when [Working with a Local Certificate Authority](./org-cert.md) it is required to replace the default sidecar images used for S3 and Git integrations edit the following configurations:
-
+For air-gapped environments, when [Working with a Local Certificate Authority](./org-cert.md), you must replace the default sidecar images in order to use the Git and S3 data source integrations. Use the following configurations:
 ```
 workload-controller:
   s3FileSystemImage:

From 7b12b4286be55eb8316a6dd34cb6de57faf38e29 Mon Sep 17 00:00:00 2001
From: ozRunAI <138565115+ozRunAI@users.noreply.github.com>
Date: Wed, 26 Feb 2025 17:43:18 +0200
Subject: [PATCH 03/15] a

---
 docs/admin/runai-setup/cluster-setup/cluster-prerequisites.md | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/docs/admin/runai-setup/cluster-setup/cluster-prerequisites.md b/docs/admin/runai-setup/cluster-setup/cluster-prerequisites.md
index 476c13696e..2e5d3f6301 100644
--- a/docs/admin/runai-setup/cluster-setup/cluster-prerequisites.md
+++ b/docs/admin/runai-setup/cluster-setup/cluster-prerequisites.md
@@ -50,10 +50,6 @@ The following software requirements must be fulfilled on the Kubernetes cluster.
 Run:ai cluster on Oracle Kubernetes Engine (OKE) supports only Ubuntu.
 * Internal tests are being performed on **Ubuntu 22.04** and **CoreOS** for OpenShift.
 
-### Container runtime
-
-Kubernetes must be configured with the Docker container runtime. Other container runtimes, such as containerd or CRI-O, are not supported.
-
 ### Kubernetes distribution
 
 Run:ai Cluster requires Kubernetes. The following Kubernetes distributions are supported:

From 044ff2b0dfe3a034478e44250914993db5faa8c1 Mon Sep 17 00:00:00 2001
From: ozRunAI <138565115+ozRunAI@users.noreply.github.com>
Date: Wed, 26 Feb 2025 17:43:19 +0200
Subject: [PATCH 04/15] a


From 77abe224a66453d320eb392eacc91bc3c607df17 Mon Sep 17 00:00:00 2001
From: ozRunAI <138565115+ozRunAI@users.noreply.github.com>
Date: Wed, 26 Feb 2025 17:50:47 +0200
Subject: [PATCH 05/15] a

---
 docs/admin/config/org-cert.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/docs/admin/config/org-cert.md b/docs/admin/config/org-cert.md
index c8ad6174e1..008b09e173 100644
--- a/docs/admin/config/org-cert.md
+++ b/docs/admin/config/org-cert.md
@@ -41,7 +41,7 @@ kubectl -n openshift-monitoring create secret generic runai-ca-cert \
 
 Run:ai enables AI practitioners to integrate with S3 or Git as data sources.
 When using a custom CA, sidecar containers used for S3 or Git integrations do not automatically inherit the CA configured at the cluster level. This requires manually building a custom container for each integration based on the default Run:ai image while incorporating the local CA certificates.
-1. Use the Dockerfile below to build images for the S3 / Git integrations
+1. [build tag and publish](https://docs.docker.com/get-started/docker-concepts/building-images/build-tag-and-publish-an-image/) the images for the S3 / Git integrations:
 ```
 #FROM gcr.io/run-ai-prod/goofys:master # S3
 #FROM registry.k8s.io/git-sync/git-sync:v4.4.0 # Git
@@ -52,6 +52,5 @@ WORKDIR /
 ENTRYPOINT ["sh"]
 CMD ["/usr/bin/run.sh"]
 ```
-2. Push the images to your local registry
 2. Edit the cluster configuration's for images used by Run:ai following the [S3 and Git sidecar images](./advanced-cluster-config.md#s3-and-git-sidecar-images) instructions.
 
From 52bf9affba341dd3e00e6fe5f34444fe6664b6fb Mon Sep 17 00:00:00 2001
From: ozRunAI <138565115+ozRunAI@users.noreply.github.com>
Date: Wed, 26 Feb 2025 17:51:14 +0200
Subject: [PATCH 06/15] a

---
 docs/admin/config/org-cert.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/admin/config/org-cert.md b/docs/admin/config/org-cert.md
index 008b09e173..f6419ef5e5 100644
--- a/docs/admin/config/org-cert.md
+++ b/docs/admin/config/org-cert.md
@@ -41,7 +41,7 @@ kubectl -n openshift-monitoring create secret generic runai-ca-cert \
 
 Run:ai enables AI practitioners to integrate with S3 or Git as data sources.
 When using a custom CA, sidecar containers used for S3 or Git integrations do not automatically inherit the CA configured at the cluster level. This requires manually building a custom container for each integration based on the default Run:ai image while incorporating the local CA certificates.
-1. [build tag and publish](https://docs.docker.com/get-started/docker-concepts/building-images/build-tag-and-publish-an-image/) the images for the S3 / Git integrations:
+1. [build tag and publish](https://docs.docker.com/get-started/docker-concepts/building-images/build-tag-and-publish-an-image/) the images for the S3 / Git integrations using the following Dockerfile:
 ```
 #FROM gcr.io/run-ai-prod/goofys:master # S3
 #FROM registry.k8s.io/git-sync/git-sync:v4.4.0 # Git

From 6c9ad6281397920b180ebb6aa2f1715248936909 Mon Sep 17 00:00:00 2001
From: ozRunAI <138565115+ozRunAI@users.noreply.github.com>
Date: Wed, 26 Feb 2025 17:52:43 +0200
Subject: [PATCH 07/15] a

---
 docs/admin/config/advanced-cluster-config.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/admin/config/advanced-cluster-config.md b/docs/admin/config/advanced-cluster-config.md
index 9e0502101d..e7f8145d51 100644
--- a/docs/admin/config/advanced-cluster-config.md
+++ b/docs/admin/config/advanced-cluster-config.md
@@ -53,7 +53,7 @@ The following configurations allow you to enable or disable features, control pe
 | pod-grouper.args.gangScheduleArgoWorkflow (boolean) | Groups all pods of a single ArgoWorkflow workload into a single Pod-Group for gang scheduling. | true |
 
 ### S3 and Git sidecar images
-For air-gapped environments, when [Working with a Local Certificate Authority](./org-cert.md), you must replace the default sidecar images in order to use the Git and S3 data source integrations. Use the following configurations:
+For air-gapped environments, when [Working with a Local Certificate Authority](./org-cert.md), it is required to replace the default sidecar images in order to use the Git and S3 data source integrations. Use the following configurations:
 ```
 workload-controller:
   s3FileSystemImage:

From 71726b1e6d32e5a84952af01d5d56fcdbed4f811 Mon Sep 17 00:00:00 2001
From: ozRunAI <138565115+ozRunAI@users.noreply.github.com>
Date: Wed, 26 Feb 2025 17:53:09 +0200
Subject: [PATCH 08/15] fix

---
 docs/admin/config/advanced-cluster-config.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/admin/config/advanced-cluster-config.md b/docs/admin/config/advanced-cluster-config.md
index e7f8145d51..a5dd35ff22 100644
--- a/docs/admin/config/advanced-cluster-config.md
+++ b/docs/admin/config/advanced-cluster-config.md
@@ -62,7 +62,7 @@ workload-controller:
     tag: master
   gitSyncImage:
     name: git-sync
-    repository: egistry.k8s.io
+    repository: registry.k8s.io
     tag: v4.4.0
 ```
 

From 7cf7412e9a3a1acad88c68f57786ebd505ea08f1 Mon Sep 17 00:00:00 2001
From: ozRunAI <138565115+ozRunAI@users.noreply.github.com>
Date: Wed, 26 Feb 2025 17:53:20 +0200
Subject: [PATCH 09/15] a

---
 docs/admin/config/advanced-cluster-config.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/admin/config/advanced-cluster-config.md b/docs/admin/config/advanced-cluster-config.md
index a5dd35ff22..db89ff126f 100644
--- a/docs/admin/config/advanced-cluster-config.md
+++ b/docs/admin/config/advanced-cluster-config.md
@@ -66,7 +66,6 @@ workload-controller:
     tag: v4.4.0
 ```
 
-
 ### Run:ai Managed Nodes
 
 To include or exclude specific nodes from running workloads within a cluster managed by Run:ai, use the `nodeSelectorTerms` flag. For additional details, see [Kubernetes nodeSelector](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector).

From 9428ae29a18e7c764b539d8afd39543886ee292e Mon Sep 17 00:00:00 2001
From: ozRunAI <138565115+ozRunAI@users.noreply.github.com>
Date: Thu, 27 Feb 2025 10:35:49 +0200
Subject: [PATCH 10/15] Update docs/admin/config/advanced-cluster-config.md

Co-authored-by: camielRunai <148046035+camielRunai@users.noreply.github.com>
---
 docs/admin/config/advanced-cluster-config.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/admin/config/advanced-cluster-config.md b/docs/admin/config/advanced-cluster-config.md
index db89ff126f..9b5538de25 100644
--- a/docs/admin/config/advanced-cluster-config.md
+++ b/docs/admin/config/advanced-cluster-config.md
@@ -58,11 +58,11 @@ For air-gapped environments, when [Working with a Local Certificate Authority](.
 workload-controller:
   s3FileSystemImage:
     name: goofys
-    repository: gcr.io/run-ai-prod
-    tag: master
+    registry: runai.jfrog.io/op-containers-prod
+    tag: 3.12.24
   gitSyncImage:
     name: git-sync
-    repository: registry.k8s.io
+    registry: registry.k8s.io
     tag: v4.4.0
 ```
 

From 89610ca64f7de09f4aede4bf4e00931e9d2b11d9 Mon Sep 17 00:00:00 2001
From: ozRunAI <138565115+ozRunAI@users.noreply.github.com>
Date: Thu, 27 Feb 2025 10:44:33 +0200
Subject: [PATCH 11/15] a

---
 docs/admin/config/advanced-cluster-config.md | 2 +-
 docs/admin/config/org-cert.md                | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/admin/config/advanced-cluster-config.md b/docs/admin/config/advanced-cluster-config.md
index 9b5538de25..9bba00baa2 100644
--- a/docs/admin/config/advanced-cluster-config.md
+++ b/docs/admin/config/advanced-cluster-config.md
@@ -53,7 +53,7 @@ The following configurations allow you to enable or disable features, control pe
 | pod-grouper.args.gangScheduleArgoWorkflow (boolean) | Groups all pods of a single ArgoWorkflow workload into a single Pod-Group for gang scheduling. | true |
 
 ### S3 and Git sidecar images
-For air-gapped environments, when [Working with a Local Certificate Authority](./org-cert.md), it is required to replace the default sidecar images in order to use the Git and S3 data source integrations. Use the following configurations:
+For air-gapped environments, when [working with a Local Certificate Authority](./org-cert.md), it is required to replace the default sidecar images in order to use the Git and S3 data source integrations. Use the following configurations:
 ```
 workload-controller:
   s3FileSystemImage:
diff --git a/docs/admin/config/org-cert.md b/docs/admin/config/org-cert.md
index f6419ef5e5..78cc1eff6c 100644
--- a/docs/admin/config/org-cert.md
+++ b/docs/admin/config/org-cert.md
@@ -41,7 +41,7 @@ kubectl -n openshift-monitoring create secret generic runai-ca-cert \
 
 Run:ai enables AI practitioners to integrate with S3 or Git as data sources.
 When using a custom CA, sidecar containers used for S3 or Git integrations do not automatically inherit the CA configured at the cluster level. This requires manually building a custom container for each integration based on the default Run:ai image while incorporating the local CA certificates.
-1. [build tag and publish](https://docs.docker.com/get-started/docker-concepts/building-images/build-tag-and-publish-an-image/) the images for the S3 / Git integrations using the following Dockerfile:
+1. [Build tag and publish](https://docs.docker.com/get-started/docker-concepts/building-images/build-tag-and-publish-an-image/) the images for the S3 / Git integrations using the following Dockerfile:
 ```
 #FROM gcr.io/run-ai-prod/goofys:master # S3
 #FROM registry.k8s.io/git-sync/git-sync:v4.4.0 # Git
@@ -52,5 +52,5 @@ WORKDIR /
 ENTRYPOINT ["sh"]
 CMD ["/usr/bin/run.sh"]
 ```
-2. Edit the cluster configuration's for images used by Run:ai following the [S3 and Git sidecar images](./advanced-cluster-config.md#s3-and-git-sidecar-images) instructions.
+2. Edit the cluster configurations for images used by Run:ai following the [S3 and Git sidecar images](./advanced-cluster-config.md#s3-and-git-sidecar-images) instructions.
 

From c8b9dfcd01c7735cfd44f2d63e315c4ee6895ef6 Mon Sep 17 00:00:00 2001
From: ozRunAI <138565115+ozRunAI@users.noreply.github.com>
Date: Thu, 27 Feb 2025 10:45:59 +0200
Subject: [PATCH 12/15] a

---
 docs/admin/config/org-cert.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/admin/config/org-cert.md b/docs/admin/config/org-cert.md
index 78cc1eff6c..581baa59bc 100644
--- a/docs/admin/config/org-cert.md
+++ b/docs/admin/config/org-cert.md
@@ -9,7 +9,7 @@ In the context of Run:ai, the cluster and control-plane need to be aware of this
 
 You will need to have the public key of the local certificate authority.
 
-## Control-Plane
+## Control-Plane Installation
 
 * Create the `runai-backend` namespace if it does not exist.
 * Add the public key to the `runai-backend` namespace:
@@ -21,7 +21,7 @@ kubectl -n runai-backend create secret generic runai-ca-cert \
 * As part of the installation instructions, you need to create a secret for [runai-backend-tls](../runai-setup/self-hosted/k8s/preparations.md#domain-certificate). Use the local certificate authority instead.
 * Install the control plane, add the following flag to the helm command `--set global.customCA.enabled=true`
 
-## Cluster
+## Cluster Installation
 
 * Create the `runai` namespace if it does not exist.
 * Add the public key to the `runai` namespace:

From 9360ce8e412812bf47fd676b1c764efc17f39adc Mon Sep 17 00:00:00 2001
From: ozRunAI <138565115+ozRunAI@users.noreply.github.com>
Date: Thu, 27 Feb 2025 10:59:47 +0200
Subject: [PATCH 13/15] a

---
 docs/admin/config/org-cert.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/admin/config/org-cert.md b/docs/admin/config/org-cert.md
index 581baa59bc..1a2313e517 100644
--- a/docs/admin/config/org-cert.md
+++ b/docs/admin/config/org-cert.md
@@ -46,7 +46,7 @@ When using a custom CA, sidecar containers used for S3 or Git integrations do no
 #FROM gcr.io/run-ai-prod/goofys:master # S3
 #FROM registry.k8s.io/git-sync/git-sync:v4.4.0 # Git
 USER root
-ADD anchors/ /usr/local/share/ca-certificates/
+ADD /usr/local/share/ca-certificates/ # example: anchors/
 RUN chmod 644 -R /usr/local/share/ca-certificates/ && update-ca-certificates
 WORKDIR /
 ENTRYPOINT ["sh"]

From 265daedfe4d7e09a4051d85fd6cd8b327a910e81 Mon Sep 17 00:00:00 2001
From: ozRunAI <138565115+ozRunAI@users.noreply.github.com>
Date: Thu, 27 Feb 2025 14:27:00 +0200
Subject: [PATCH 14/15] a

---
 docs/admin/config/advanced-cluster-config.md | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/docs/admin/config/advanced-cluster-config.md b/docs/admin/config/advanced-cluster-config.md
index 9bba00baa2..23f84c202f 100644
--- a/docs/admin/config/advanced-cluster-config.md
+++ b/docs/admin/config/advanced-cluster-config.md
@@ -55,15 +55,16 @@ The following configurations allow you to enable or disable features, control pe
 ### S3 and Git sidecar images
 For air-gapped environments, when [working with a Local Certificate Authority](./org-cert.md), it is required to replace the default sidecar images in order to use the Git and S3 data source integrations. Use the following configurations:
 ```
-workload-controller:
-  s3FileSystemImage:
-    name: goofys
-    registry: runai.jfrog.io/op-containers-prod
-    tag: 3.12.24
-  gitSyncImage:
-    name: git-sync
-    registry: registry.k8s.io
-    tag: v4.4.0
+spec:
+  workload-controller:
+    s3FileSystemImage:
+      name: goofys
+      registry: runai.jfrog.io/op-containers-prod
+      tag: 3.12.24
+    gitSyncImage:
+      name: git-sync
+      registry: registry.k8s.io
+      tag: v4.4.0
 ```
 
 ### Run:ai Managed Nodes

From a9feb8d253ca256b36bcecf72f403766cbf81cfe Mon Sep 17 00:00:00 2001
From: ozRunAI <138565115+ozRunAI@users.noreply.github.com>
Date: Thu, 27 Feb 2025 14:27:40 +0200
Subject: [PATCH 15/15] a

---
 docs/admin/config/advanced-cluster-config.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/admin/config/advanced-cluster-config.md b/docs/admin/config/advanced-cluster-config.md
index 23f84c202f..2e63d62c29 100644
--- a/docs/admin/config/advanced-cluster-config.md
+++ b/docs/admin/config/advanced-cluster-config.md
@@ -54,7 +54,8 @@ The following configurations allow you to enable or disable features, control pe
 
 ### S3 and Git sidecar images
 For air-gapped environments, when [working with a Local Certificate Authority](./org-cert.md), it is required to replace the default sidecar images in order to use the Git and S3 data source integrations. Use the following configurations:
-```
+
+``` yaml
 spec:
   workload-controller:
     s3FileSystemImage:
@@ -79,7 +80,7 @@ Label the nodes using the below:
 
 The below example shows how to include NVIDIA GPUs only and exclude all other GPU types in a cluster with mixed nodes, based on product type GPU label:
 
-``` bash
+``` yaml
 spec:
   global:
     managedNodes: