From f3b590af35d7830b96368a5800747c33696c8f16 Mon Sep 17 00:00:00 2001 From: jasonnovichRunAI <124490127+jasonnovichRunAI@users.noreply.github.com> Date: Mon, 22 Jul 2024 11:46:48 +0300 Subject: [PATCH 01/10] Update whats-new-2-18.md --- docs/home/whats-new-2-18.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/home/whats-new-2-18.md b/docs/home/whats-new-2-18.md index ec555741a0..b8da997a35 100644 --- a/docs/home/whats-new-2-18.md +++ b/docs/home/whats-new-2-18.md @@ -26,7 +26,7 @@ date: 2024-June-14 * Added latency metric for autoscaling. This feature allows automatic scale-up/down the number of replicas of a Run:ai inference workload based on the threshold set by the ML Engineer. This ensures that response time is kept under the target SLA. -* Improved autoscaling for inference models by taking out ChatBot UI from models images. By moving ChatBot UI to predefined *Environments*, autoscaling is more accurate by taking into account all types of requests (API, and ChatBot UI). Adding a ChatBot UI environment preset by Run:ai allows AI practitioners to easily connect them to workloads. +* Improved autoscaling for inference models by taking out ChatBot UI from models images. By moving ChatBot UI to predefined *Environments*, autoscaling is more accurate by taking into account all types of requests (API, and ChatBot UI). Adding a ChatBot UI environment preset by Run:ai allows AI practitioners to easily connect them to workloads. * Added more precision to trigger auto-scaling to zero. Now users can configure a precise consecutive idle threshold custom setting to trigger Run:ai inference workloads to scale-to-zero. 
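The "consecutive idle threshold" scale-to-zero behaviour documented in patch 01 can be sketched roughly as follows. This is an illustration only; the function name and the idea of sampling per-interval request counts are assumptions for clarity, not Run:ai internals:

```python
# Illustrative sketch of a consecutive-idle-threshold scale-to-zero rule.
# All names here are assumptions for illustration, not the Run:ai API.

def should_scale_to_zero(request_counts, idle_threshold):
    """Return True once the workload has seen `idle_threshold` consecutive
    sampling intervals with zero requests; any traffic resets the counter."""
    consecutive_idle = 0
    for count in request_counts:
        if count == 0:
            consecutive_idle += 1
            if consecutive_idle >= idle_threshold:
                return True
        else:
            consecutive_idle = 0
    return False
```

The point of the consecutive requirement is that a single quiet interval (e.g. `[0, 0, 7, 0, 0]` with a threshold of 3) does not tear down replicas; only a sustained idle run does.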
From fd48a07b81898e28332f10eab65f552aa250b814 Mon Sep 17 00:00:00 2001 From: JamieWeider72 <147967555+JamieWeider72@users.noreply.github.com> Date: Tue, 23 Jul 2024 13:14:29 +0300 Subject: [PATCH 02/10] Update hotfixes-2-16.md --- docs/home/changelog/hotfixes-2-16.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/docs/home/changelog/hotfixes-2-16.md b/docs/home/changelog/hotfixes-2-16.md index eb6b8d392d..a07194f9e3 100644 --- a/docs/home/changelog/hotfixes-2-16.md +++ b/docs/home/changelog/hotfixes-2-16.md @@ -8,6 +8,12 @@ date: 2024-Feb-26 The following is a list of the known and fixed issues for Run:ai V2.16. +## Version 2.16.57 + +| Internal ID | Description | +|--|--| +| RUN-20388 | Fixed an issue where cluster-sync caused a memory leak. | + ## Version 2.16.25 | Internal ID | Description | From 58f088d94915ecfa3b8f552f55751fd2b26ebca5 Mon Sep 17 00:00:00 2001 From: Yaron Date: Tue, 23 Jul 2024 14:48:27 +0300 Subject: [PATCH 03/10] policies-example --- docs/admin/workloads/policies/README.md | 93 ++++++++++++------------- 1 file changed, 45 insertions(+), 48 deletions(-) diff --git a/docs/admin/workloads/policies/README.md b/docs/admin/workloads/policies/README.md index f76540e3ad..bdb3a90afb 100644 --- a/docs/admin/workloads/policies/README.md +++ b/docs/admin/workloads/policies/README.md @@ -8,16 +8,16 @@ date: 2023-Dec-12 ## Introduction -*Policies* allow administrators to impose restrictions and set default values for researcher workloads. Restrictions and default values can be placed on CPUs, GPUs, and other resources or entities. Enabling the *New Policy Manager* provides information about resources that are non-compliant to applied policies. Resources that are non-compliant will appear greyed out. To see how a resource is not compliant, press on the clipboard icon in the upper right hand corner of the resource. +*Policies* allow administrators to impose restrictions and set default values for researcher workloads. 
Restrictions and default values can be placed on CPUs, GPUs, and other resources or entities. Enabling the *New Policy Manager* provides information about resources that are non-compliant with applied policies. Resources that are non-compliant will appear greyed out. To see how a resource is not compliant, click the clipboard icon in the upper right-hand corner of the resource. !!! Note - Policies from Run:ai versions 2.15 or lower will still work after enabling the *New Policy Manager*. However, showing non-compliant policy rules will not be available. For more information about policies for version 2.15 or lower, see [What are Policies](policies.md#what-are-policies). + Policies from Run:ai versions 2.17 or lower will still work after enabling the New Policy Manager. For more information about policies for version 2.17 or lower, see [What are Policies](policies.md#what-are-policies). For example, an administrator can create and apply a policy that will restrict researchers from requesting more than 2 GPUs, or less than 1GB of memory per type of workload. Another example is an administrator who wants to set different amounts of CPU, GPUs and memory for different kinds of workloads. A training workload can have a default of 1 GB of memory, or an interactive workload can have a default amount of GPUs. -Policies are created for each Run:ai project (Kubernetes namespace). When a policy is created in the `runai` namespace, it will take effect when there is no project-specific policy for the workloads of the same kind. +Policies are created for each Run:ai project (Kubernetes namespace). When a policy is created in the `runai` namespace, it will take effect when there is no project-specific policy for workloads of the same kind. In interactive workloads or workspaces, applied policies will only allow researchers access to resources that are permitted in the policy. This can include compute resources as well as node pools and node pool priority. 
@@ -47,7 +47,7 @@ A policy configured to a specific scope, is applied to all elements in that scop ### Policy Editor UI -Policies are added to the system using the policy editor and are written in YAML format. YAML™ is a human-friendly, cross language, Unicode based data serialization language designed around the common native data types of dynamic programming languages. It is useful for programming needs ranging from configuration files to internet messaging to object persistence to data auditing and visualization. For more information, see [YAML.org](https://yaml.org/){target=_blank}. +Policies are added to the system using the policy editor and are written in YAML format. YAML™ is a human-friendly, cross-language, Unicode-based data serialization language designed around the common native data types of dynamic programming languages. It is useful for programming needs ranging from configuration files to internet messaging to object persistence to data auditing and visualization. For more information, see [YAML.org](https://yaml.org/){target=_blank}. 
### Policy API @@ -59,50 +59,47 @@ The following is an example of a workspace policy you can apply in your platform ```YAML defaults: - environment: - allowPrivilegeEscalation: false - createHomeDir: true - environmentVariables: - - name: MY_ENV - value: my_value - workspace: - allowOverQuota: true + createHomeDir: true + environmentVariables: + instances: + - name: MY_ENV + value: my_value + security: + allowPrivilegeEscalation: false rules: - compute: - cpuCoreLimit: - min: 0 - max: 9 - required: true - gpuPortionRequest: - min: 0 - max: 10 + imagePullPolicy: + required: true + options: + - value: Always + displayed: Always + - value: Never + displayed: Never + createHomeDir: + canEdit: false + security: + runAsUid: + min: 1 + max: 32700 + allowPrivilegeEscalation: + canEdit: false + compute: + cpuCoreLimit: + required: true + min: 0 + max: 9 + gpuPortionRequest: + min: 0 + max: 10 + storage: + nfs: + instances: + canAdd: false s3: - url: - options: - - displayed: "https://www.google.com" - value: "https://www.google.com" - - displayed: "https://www.yahoo.com" - value: "https://www.yahoo.com" - environment: - imagePullPolicy: - options: - - displayed: "Always" - value: "Always" - - displayed: "Never" - value: "Never" - required: true - runAsUid: - min: 1 - max: 32700 - createHomeDir: - canEdit: false - allowPrivilegeEscalation: - canEdit: false - workspace: - allowOverQuota: - canEdit: false - imposedAssets: - dataSources: - nfs: - canAdd: false + attributes: + url: + options: + - value: https://www.google.com + displayed: https://www.google.com + - value: https://www.yahoo.com + displayed: https://www.yahoo.com ``` From 4d421432c688c6ba6021d00f51ed3ec2867e72fa Mon Sep 17 00:00:00 2001 From: Haim Levy <39706566+haimlevy2006@users.noreply.github.com> Date: Tue, 23 Jul 2024 21:15:14 +0300 Subject: [PATCH 04/10] Update automated-publish-docs.yaml --- .github/workflows/automated-publish-docs.yaml | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git 
a/.github/workflows/automated-publish-docs.yaml b/.github/workflows/automated-publish-docs.yaml index 97c38ef47d..a1fac25875 100644 --- a/.github/workflows/automated-publish-docs.yaml +++ b/.github/workflows/automated-publish-docs.yaml @@ -97,4 +97,5 @@ jobs: SLACK_MESSAGE_ON_SUCCESS: "Docs were updated successfully for version ${{ needs.env.outputs.TITLE }}" SLACK_MESSAGE_ON_FAILURE: "Docs update FAILED for version ${{ needs.env.outputs.TITLE }}" MSG_MINIMAL: true - SLACK_FOOTER: "" \ No newline at end of file + SLACK_FOOTER: "" + From 81ca00ea8762bd5968bf33e3692dc469c303d773 Mon Sep 17 00:00:00 2001 From: Haim Levy <39706566+haimlevy2006@users.noreply.github.com> Date: Tue, 23 Jul 2024 21:49:43 +0300 Subject: [PATCH 05/10] Update automated-publish-docs.yaml --- .github/workflows/automated-publish-docs.yaml | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/.github/workflows/automated-publish-docs.yaml b/.github/workflows/automated-publish-docs.yaml index a1fac25875..429aa60a2a 100644 --- a/.github/workflows/automated-publish-docs.yaml +++ b/.github/workflows/automated-publish-docs.yaml @@ -19,6 +19,8 @@ jobs: steps: - name: Checkout code uses: actions/checkout@v4 + with: + fetch-depth: 0 - name: Get all v*.* branches id: calculate-env @@ -48,7 +50,6 @@ jobs: uses: actions/checkout@v4 with: ref: ${{ needs.env.outputs.CURRENT_BRANCH }} - fetch-depth: 0 - name: setup python uses: actions/setup-python@v5 From 26a385423580b555ba4e13df70a90feb2777d161 Mon Sep 17 00:00:00 2001 From: Haim Levy <39706566+haimlevy2006@users.noreply.github.com> Date: Tue, 23 Jul 2024 22:10:00 +0300 Subject: [PATCH 06/10] Update automated-publish-docs.yaml --- .github/workflows/automated-publish-docs.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/automated-publish-docs.yaml b/.github/workflows/automated-publish-docs.yaml index 429aa60a2a..4823a8d9ac 100644 --- a/.github/workflows/automated-publish-docs.yaml +++ 
b/.github/workflows/automated-publish-docs.yaml @@ -25,7 +25,7 @@ jobs: - name: Get all v*.* branches id: calculate-env run: | - BRANCHES=$(git branch --list --all | grep -v master | grep 'origin/v*.*' | sed -n -E 's:.*/(v[0-9]+\.[0-9]+).*:\1:p' | sort -Vu) + BRANCHES=$(git branch -r | grep -E '^ *origin/v[0-9]{1,2}\.[0-9]{1,2}$' | sort -Vu | sed 's/origin\///g' | sed 's/ //g') NEWEST_VERSION=$(printf '%s\n' "${BRANCHES[@]}" | sort -V | tail -n 1) CURRENT_BRANCH=${GITHUB_REF#refs/heads/} ALIAS=$CURRENT_BRANCH-alias From a2fa4e0b3cef2625eeac9c82da381a4d10bee1b3 Mon Sep 17 00:00:00 2001 From: Haim Levy <39706566+haimlevy2006@users.noreply.github.com> Date: Tue, 23 Jul 2024 22:12:08 +0300 Subject: [PATCH 07/10] Update automated-publish-docs.yaml From 3573c58db48d3b44ccbce06cc4cbc56da508b087 Mon Sep 17 00:00:00 2001 From: JamieWeider72 <147967555+JamieWeider72@users.noreply.github.com> Date: Thu, 25 Jul 2024 14:59:14 +0300 Subject: [PATCH 08/10] Update whats-new-2-18.md --- docs/home/whats-new-2-18.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/docs/home/whats-new-2-18.md b/docs/home/whats-new-2-18.md index b8da997a35..6ab6126302 100644 --- a/docs/home/whats-new-2-18.md +++ b/docs/home/whats-new-2-18.md @@ -22,19 +22,19 @@ date: 2024-June-14 * Added new *Data sources* of type *Secret* to workload form. *Data sources* of type *Secret* are used to hide 3rd party access credentials when submitting workloads. For more information, see [Submitting Workloads](../admin/workloads/submitting-workloads.md#how-to-submit-a-workload). -* Added new graphs for *Inference* workloads. The new graphs provide more information for *Inference* workloads to help analyze performance of the workloads. New graphs include Latency, Throughput, and number of replicas. For more information, see [Workloads View](../admin/workloads/README.md#workloads-view). +* Added new graphs for *Inference* workloads. 
The new graphs provide more information for *Inference* workloads to help analyze workload performance. New graphs include Latency, Throughput, and number of replicas. For more information, see [Workloads View](../admin/workloads/README.md#workloads-view). (Requires minimum cluster version v2.18). -* Added latency metric for autoscaling. This feature allows automatic scale-up/down the number of replicas of a Run:ai inference workload based on the threshold set by the ML Engineer. This ensures that response time is kept under the target SLA. +* Added latency metric for autoscaling. This feature automatically scales the number of replicas of a Run:ai inference workload up or down, based on the threshold set by the ML Engineer. This ensures that response time is kept under the target SLA. (Requires minimum cluster version v2.18). * Improved autoscaling for inference models by taking out ChatBot UI from models images. By moving ChatBot UI to predefined *Environments*, autoscaling is more accurate by taking into account all types of requests (API, and ChatBot UI). Adding a ChatBot UI environment preset by Run:ai allows AI practitioners to easily connect them to workloads. -* Added more precision to trigger auto-scaling to zero. Now users can configure a precise consecutive idle threshold custom setting to trigger Run:ai inference workloads to scale-to-zero. +* Added more precision to trigger auto-scaling to zero. Users can now configure a precise consecutive-idle-threshold custom setting that triggers Run:ai inference workloads to scale to zero. (Requires minimum cluster version v2.18). * Added Hugging Face catalog integration of community models. Run:ai has added Hugging Face integration directly to the inference workload form, providing the ability to select models (vLLM models) from Hugging Face. This allows organizations to quickly experiment with the latest open source community language models. 
For more information on how Hugging Face is integrated, see [Hugging Face](../admin/workloads/submitting-workloads.md). -* Improved access permissions to external tools. This improvement now allows more granular control over which personas can access external tools (external URLs) such as Jupyter Notebooks, Chatbot UI, and others. For configuration information, see [Submitting workloads](../admin/workloads/submitting-workloads.md). +* Improved access permissions to external tools. This improvement now allows more granular control over which personas can access external tools (external URLs) such as Jupyter Notebooks, Chatbot UI, and others. For configuration information, see [Submitting workloads](../admin/workloads/submitting-workloads.md). (Requires minimum cluster version v2.18). -* Added a new API for submitting Run:ai inference workloads. This API allows users to easily submit inference workloads. This new API provides a consistent user experience for workload submission which maintains data integrity across all the user interfaces in the Run:ai platform. +* Added a new API for submitting Run:ai inference workloads. This API allows users to easily submit inference workloads and provides a consistent user experience for workload submission, maintaining data integrity across all the user interfaces in the Run:ai platform. (Requires minimum cluster version v2.18). #### Command Line Interface @@ -47,11 +47,11 @@ date: 2024-June-14 * Improved usability and performance This is an early access feature available for customers to use; however be aware that there may be functional gaps versus the legacy CLI. - For more information about installing and using the Improved CLI, see [Improved CLI](../Researcher/cli-reference/new-cli/runai.md). + For more information about installing and using the Improved CLI, see [Improved CLI](../Researcher/cli-reference/new-cli/runai.md). (Requires minimum cluster version v2.18). 
#### GPU memory swap -* Added new GPU to CPU memory swap. To ensure efficient usage of an organization’s resources, Run:ai provides multiple features on multiple layers to help administrators and practitioners maximize their existing GPUs resource utilization. Run:ai’s GPU memory swap feature helps administrators and AI practitioners to further increase the utilization of existing GPU HW by improving GPU sharing between AI initiatives and stakeholders. This is done by expending the GPU physical memory to the CPU memory which is typically an order of magnitude larger than that of the GPU. For more information see, [GPU Memory Swap](../Researcher/scheduling/gpu-memory-swap.md). +* Added new GPU to CPU memory swap. To ensure efficient usage of an organization’s resources, Run:ai provides multiple features on multiple layers to help administrators and practitioners maximize their existing GPU resource utilization. Run:ai’s GPU memory swap feature helps administrators and AI practitioners further increase the utilization of existing GPU hardware by improving GPU sharing between AI initiatives and stakeholders. This is done by extending the GPU physical memory to the CPU memory, which is typically an order of magnitude larger than that of the GPU. For more information, see [GPU Memory Swap](../Researcher/scheduling/gpu-memory-swap.md). (Requires minimum cluster version v2.18). #### YAML Workload Reference table @@ -75,13 +75,13 @@ date: 2024-June-14 * Shared between multiple scopes—unlike other Run:ai data sources, data volumes can be shared across projects, departments, or clusters. This promotes data reuse and collaboration within your organization. * Coupled to workloads in the submission process—similar to other Run:ai data sources, Data volumes can be easily attached to AI workloads during submission, specifying the data path within the workload environment. - For more information, see [Data Volumes](../developer/admin-rest-api/data-volumes.md). 
+ For more information, see [Data Volumes](../developer/admin-rest-api/data-volumes.md). (Requires minimum cluster version v2.18). * Added new data source of type *Secret*. Run:ai now allows you to configure a *Credential* as a data source. A *Data source* of type *Secret* is best used in workloads so that access to 3rd party interfaces and storage used in containers, keep access credentials hidden. For more information, see [Secrets as a data source](../Researcher/user-interface/workspaces/create/create-ds.md/#create-a-secret-as-data-source). #### Credentials -* Added new *Generic secret* to *Credentials*. *Credentials* had been used only for access to data sources (S3, Git, etc.). However, AI practitioners need to use secrets to access sensitive data (interacting with 3rd party APIs, or other services) without having to put their credentials in their source code. *Generic secrets* leverage multiple key value pairs which helps reduce the number of Kubernetes resources and simplifies resource management by reducing the overhead associated with maintaining multiple Secrets. *Generic secrets* are best used as a data source of type *Secret* so that they can be used in containers to keep access credentials hidden. +* Added new *Generic secret* to *Credentials*. *Credentials* had been used only for access to data sources (S3, Git, etc.). However, AI practitioners need to use secrets to access sensitive data (interacting with 3rd party APIs, or other services) without having to put their credentials in their source code. *Generic secrets* leverage multiple key-value pairs, which helps reduce the number of Kubernetes resources and simplifies resource management by reducing the overhead associated with maintaining multiple Secrets. *Generic secrets* are best used as a data source of type *Secret* so that they can be used in containers to keep access credentials hidden. (Requires minimum cluster version v2.18). 
#### Single Sign On From 6f9dcd9defb6c2d7016f23e0a3230094126913ab Mon Sep 17 00:00:00 2001 From: JamieWeider72 <147967555+JamieWeider72@users.noreply.github.com> Date: Tue, 30 Jul 2024 13:39:00 +0300 Subject: [PATCH 09/10] RUN-19295 added policy for workloads in api --- docs/home/whats-new-2-18.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/home/whats-new-2-18.md b/docs/home/whats-new-2-18.md index 6a4c840d3c..26dc05a307 100644 --- a/docs/home/whats-new-2-18.md +++ b/docs/home/whats-new-2-18.md @@ -101,7 +101,7 @@ date: 2024-June-14 #### Policy for distributed and inference workloads in the API -Added a new API for creating distributed training workload policies and inference workload policies. These new policies in the API allow to set defaults, enforce rules and impose setup on distributed training and inference workloads. For distributed policies, worker and master may require different rules due to their different specifications. The new capability is currently available via API only. Documentation on submitting policies to follow shortly. +* Added a new API for creating distributed training workload policies and inference workload policies. These new policies make it possible to set defaults, enforce rules, and impose setup on distributed training and inference workloads. For distributed policies, worker and master may require different rules due to their different specifications. The new capability is currently available via API only. Documentation on submitting policies will follow shortly. 
## Deprecation Notifications From 77aaaedb9f3e525fbdde3aded0cbc5564ce6f971 Mon Sep 17 00:00:00 2001 From: JamieWeider72 <147967555+JamieWeider72@users.noreply.github.com> Date: Tue, 30 Jul 2024 13:43:55 +0300 Subject: [PATCH 10/10] RUN-19295-additional changes --- docs/home/whats-new-2-18.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/home/whats-new-2-18.md b/docs/home/whats-new-2-18.md index 427ed90289..017808cf18 100644 --- a/docs/home/whats-new-2-18.md +++ b/docs/home/whats-new-2-18.md @@ -85,7 +85,7 @@ date: 2024-June-14 #### Single Sign On -* Added support for Single Sign On using OpenShift v4 (OIDC based). When using OpenShift, you must first define OAuthClient which interacts with OpenShift's OAuth server to authenticate users and request access tokens. For more information, see [Single Sign-On](../admin/runai-setup/authentication/sso/). +* Added support for Single Sign On using OpenShift v4 (OIDC based). When using OpenShift, you must first define an OAuthClient, which interacts with OpenShift's OAuth server to authenticate users and request access tokens. For more information, see [Single Sign-On](../admin/runai-setup/authentication/sso/). * Added OIDC scopes to authentication requests. OIDC Scopes are used to specify what access privileges are being requested for access tokens. The scopes associated with the access tokens determine what resource are available when they are used to access OAuth 2.0 protected endpoints. Protected endpoints may perform different actions and return different information based on the scope values and other parameters used when requesting the presented access token.
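The OIDC scopes described in patch 10 travel as a single space-delimited `scope` parameter in the OAuth 2.0 authorization request. A minimal sketch of how such a request URL is assembled — the endpoint, client ID, redirect URI, and scope values below are placeholders, not Run:ai or OpenShift specifics:

```python
from urllib.parse import urlencode

def build_authorization_url(authorize_endpoint, client_id, redirect_uri, scopes):
    """Assemble an OAuth 2.0 / OIDC authorization request URL; the list of
    scopes is joined into the single space-delimited `scope` parameter."""
    query = urlencode({
        "response_type": "code",       # authorization-code flow
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),     # e.g. "openid profile email"
    })
    return f"{authorize_endpoint}?{query}"
```

For example, requesting the `openid`, `profile`, and `email` scopes against a placeholder identity provider yields a URL whose `scope` parameter encodes all three, and the token the provider eventually issues is limited to the privileges those scopes name.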