Run 19295 addition #885

Closed · wants to merge 14 commits

8 changes: 5 additions & 3 deletions .github/workflows/automated-publish-docs.yaml
```diff
@@ -19,11 +19,13 @@ jobs:
     steps:
       - name: Checkout code
         uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
 
       - name: Get all v*.* branches
         id: calculate-env
         run: |
-          BRANCHES=$(git branch --list --all | grep -v master | grep 'origin/v*.*' | sed -n -E 's:.*/(v[0-9]+\.[0-9]+).*:\1:p' | sort -Vu)
+          BRANCHES=$(git branch -r | grep -E '^ *origin/v[0-9]{1,2}\.[0-9]{1,2}$' | sort -Vu | sed 's/origin\///g' | sed 's/ //g')
           NEWEST_VERSION=$(printf '%s\n' "${BRANCHES[@]}" | sort -V | tail -n 1)
           CURRENT_BRANCH=${GITHUB_REF#refs/heads/}
           ALIAS=$CURRENT_BRANCH-alias
@@ -48,7 +50,6 @@ jobs:
         uses: actions/checkout@v4
         with:
           ref: ${{ needs.env.outputs.CURRENT_BRANCH }}
-          fetch-depth: 0
 
       - name: setup python
         uses: actions/setup-python@v5
@@ -97,4 +98,5 @@ jobs:
           SLACK_MESSAGE_ON_SUCCESS: "Docs were updated successfully for version ${{ needs.env.outputs.TITLE }}"
           SLACK_MESSAGE_ON_FAILURE: "Docs update FAILED for version ${{ needs.env.outputs.TITLE }}"
           MSG_MINIMAL: true
-          SLACK_FOOTER: ""
\ No newline at end of file
+          SLACK_FOOTER: ""
```
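
Two things change in this workflow: `fetch-depth: 0` moves up to the first checkout step, so the full history and every remote branch are already fetched by the time `git branch -r` runs, and the branch filter swaps a loose glob for an anchored regex that accepts only remotes named exactly like `v2.18`. Below is a quick local sanity check of the new pipeline; the sample branch names are invented for illustration:

```bash
#!/usr/bin/env bash
# Simulated `git branch -r` output, including names the old loose
# pattern (grep 'origin/v*.*') could have let through by mistake.
BRANCHES=$(printf '%s\n' \
  '  origin/HEAD -> origin/master' \
  '  origin/master' \
  '  origin/v2.17' \
  '  origin/v2.18' \
  '  origin/v2.18-hotfix' |
  grep -E '^ *origin/v[0-9]{1,2}\.[0-9]{1,2}$' |
  sort -Vu | sed 's/origin\///g' | sed 's/ //g')

echo "$BRANCHES"      # v2.17 and v2.18, one per line

# Newest version, computed exactly as the workflow does it:
NEWEST_VERSION=$(printf '%s\n' "${BRANCHES[@]}" | sort -V | tail -n 1)
echo "$NEWEST_VERSION"   # v2.18
```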

93 changes: 45 additions & 48 deletions docs/admin/workloads/policies/README.md
```diff
@@ -8,16 +8,16 @@ date: 2023-Dec-12
 
 ## Introduction
 
-*Policies* allow administrators to impose restrictions and set default values for researcher workloads. Restrictions and default values can be placed on CPUs, GPUs, and other resources or entities. Enabling the *New Policy Manager* provides information about resources that are non-compliant to applied policies. Resources that are non-compliant will appear greyed out. To see how a resource is not compliant, press on the clipboard icon in the upper right hand corner of the resource.
+*Policies* allow administrators to impose restrictions and set default values for researcher workloads. Restrictions and default values can be placed on CPUs, GPUs, and other resources or entities. Enabling the *New Policy Manager* provides information about resources that are non-compliant to applied policies. Resources that are non-compliant will appear greyed out. To see how a resource is not compliant, press on the clipboard icon in the upper right-hand corner of the resource.
 
 !!! Note
-    Policies from Run:ai versions 2.15 or lower will still work after enabling the *New Policy Manager*. However, showing non-compliant policy rules will not be available. For more information about policies for version 2.15 or lower, see [What are Policies](policies.md#what-are-policies).
+    Policies from Run:ai versions 2.17 or lower will still work after enabling the New Policy Manager. For more information about policies for version 2.17 or lower, see [What are Policies](policies.md#what-are-policies).
 
 For example, an administrator can create and apply a policy that will restrict researchers from requesting more than 2 GPUs, or less than 1GB of memory per type of workload.
 
 Another example is an administrator who wants to set different amounts of CPU, GPUs and memory for different kinds of workloads. A training workload can have a default of 1 GB of memory, or an interactive workload can have a default amount of GPUs.
 
-Policies are created for each Run:ai project (Kubernetes namespace). When a policy is created in the `runai` namespace, it will take effect when there is no project-specific policy for the workloads of the same kind.
+Policies are created for each Run:ai project (Kubernetes namespace). When a policy is created in the `runai` namespace, it will take effect when there is no project-specific policy for workloads of the same kind.
 
 In interactive workloads or workspaces, applied policies will only allow researchers access to resources that are permitted in the policy. This can include compute resources as well as node pools and node pool priority.
 
@@ -47,7 +47,7 @@ A policy configured to a specific scope, is applied to all elements in that scope
 
 ### Policy Editor UI
 
-Policies are added to the system using the policy editor and are written in YAML format. YAML™ is a human-friendly, cross language, Unicode based data serialization language designed around the common native data types of dynamic programming languages. It is useful for programming needs ranging from configuration files to internet messaging to object persistence to data auditing and visualization. For more information, see [YAML.org](https://yaml.org/){target=_blank}.
+Policies are added to the system using the policy editor and are written in YAML format. YAML™ is a human-friendly, cross-language, Unicode-based data serialization language designed around the common native data types of dynamic programming languages. It is useful for programming needs ranging from configuration files to internet messaging to object persistence to data auditing and visualization. For more information, see [YAML.org](https://yaml.org/){target=_blank}.
 
 ### Policy API
```

```diff
@@ -59,50 +59,47 @@
 The following is an example of a workspace policy you can apply in your platform:
 
 defaults:
-  environment:
-    allowPrivilegeEscalation: false
-    createHomeDir: true
-    environmentVariables:
-      - name: MY_ENV
-        value: my_value
-  workspace:
-    allowOverQuota: true
+  createHomeDir: true
+  environmentVariables:
+    instances:
+      - name: MY_ENV
+        value: my_value
+  security:
+    allowPrivilegeEscalation: false
 rules:
-  compute:
-    cpuCoreLimit:
-      min: 0
-      max: 9
-      required: true
-    gpuPortionRequest:
-      min: 0
-      max: 10
-  imagePullPolicy:
-    required: true
-    options:
-      - value: Always
-        displayed: Always
-      - value: Never
-        displayed: Never
-  createHomeDir:
-    canEdit: false
-  security:
-    runAsUid:
-      min: 1
-      max: 32700
-  allowPrivilegeEscalation:
-    canEdit: false
-imposedAssets:
-  dataSources:
-    nfs:
-      canAdd: false
-  attributes:
-    url:
-      options:
-        - value: https://www.google.com
-          displayed: https://www.google.com
-        - value: https://www.yahoo.com
-          displayed: https://www.yahoo.com
+  compute:
+    cpuCoreLimit:
+      required: true
+      min: 0
+      max: 9
+    gpuPortionRequest:
+      min: 0
+      max: 10
+  storage:
+    nfs:
+      instances:
+        canAdd: false
+    s3:
+      url:
+        options:
+          - displayed: "https://www.google.com"
+            value: "https://www.google.com"
+          - displayed: "https://www.yahoo.com"
+            value: "https://www.yahoo.com"
+  environment:
+    imagePullPolicy:
+      options:
+        - displayed: "Always"
+          value: "Always"
+        - displayed: "Never"
+          value: "Never"
+      required: true
+    runAsUid:
+      min: 1
+      max: 32700
+    createHomeDir:
+      canEdit: false
+    allowPrivilegeEscalation:
+      canEdit: false
+  workspace:
+    allowOverQuota:
+      canEdit: false
```
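
The example's restructuring mirrors the schema change in this release: defaults and rules are now grouped by field category (`compute`, `storage`, `environment`, `workspace`) instead of the older flat layout with a separate `imposedAssets` section. For applying such a policy through the Policy API rather than the editor, a call could look roughly like the sketch below; the control-plane URL, endpoint path, and token variable are illustrative assumptions, not the documented API surface:

```bash
# Sketch only: the URL and endpoint path are assumptions; check the
# Run:ai Policy API reference for the real route and payload schema.
# The YAML-to-JSON step assumes PyYAML is installed.
CTRL_URL="https://my-runai.example.com"   # hypothetical control-plane URL
python -c 'import sys, json, yaml; print(json.dumps(yaml.safe_load(sys.stdin)))' \
  < workspace-policy.yaml |
curl -sS -X PUT "$CTRL_URL/api/v1/policy/workspaces" \
  -H "Authorization: Bearer $RUNAI_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d @-
```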
8 changes: 4 additions & 4 deletions docs/home/whats-new-2-18.md
```diff
@@ -22,11 +22,11 @@ date: 2024-June-14
 
 * <!-- RUN-16917/RUN-19363 move to top Expose secrets in workload submission -->Added new *Data sources* of type *Secret* to workload form. *Data sources* of type *Secret* are used to hide 3rd party access credentials when submitting workloads. For more information, see [Submitting Workloads](../admin/workloads/submitting-workloads.md#how-to-submit-a-workload).
 
-* <!-- RUN-16830/RUN-16831 - Graphs & special metrics for inference -->Added new graphs for *Inference* workloads. The new graphs provide more information for *Inference* workloads to help analyze performance of the workloads. New graphs include Latency, Throughput, and number of replicas. For more information, see [Workloads View](../admin/workloads/README.md#workloads-view) (Requires minimum cluster version v2.18).
+* <!-- RUN-16830/RUN-16831 - Graphs & special metrics for inference -->Added new graphs for *Inference* workloads. The new graphs provide more information for *Inference* workloads to help analyze performance of the workloads. New graphs include Latency, Throughput, and number of replicas. For more information, see [Workloads View](../admin/workloads/README.md#workloads-view). (Requires minimum cluster version v2.18).
 
 * <!-- TODO add link to doc when ready - get approval for text RUN-16805/RUN-17416 - Provide latency-based metric for autoscaling for requests -->Added latency metric for autoscaling. This feature allows automatic scale-up/down the number of replicas of a Run:ai inference workload based on the threshold set by the ML Engineer. This ensures that response time is kept under the target SLA. (Requires minimum cluster version v2.18).
 
-* <!-- TODO Add to inference doc models explanation after autoscaling. RUN-16872/RUN-18526 Separating ChatUi from model in favor of coherent autoscaling -->Improved autoscaling for inference models by taking out ChatBot UI from models images. By moving ChatBot UI to predefined *Environments*, autoscaling is more accurate by taking into account all types of requests (API, and ChatBot UI). Adding a ChatBot UI environment preset by Run:ai allows AI practitioners to easily connect them to workloads.
+* <!-- Add to inference doc models explanation after autoscaling. RUN-16872/RUN-18526 Separating ChatUi from model in favor of coherent autoscaling -->Improved autoscaling for inference models by taking out ChatBot UI from models images. By moving ChatBot UI to predefined *Environments*, autoscaling is more accurate by taking into account all types of requests (API, and ChatBot UI). Adding a ChatBot UI environment preset by Run:ai allows AI practitioners to easily connect them to workloads.
 
 * <!-- RUN-16832/ RUN-16833 - Custom value for auto-scale to zero-->Added more precision to trigger auto-scaling to zero. Now users can configure a precise consecutive idle threshold custom setting to trigger Run:ai inference workloads to scale-to-zero. (Requires minimum cluster version v2.18).
 
@@ -85,7 +85,7 @@ date: 2024-June-14
 
 #### Single Sign On
 
-* <!-- TODO Change ticket numbers and description RUN-16859/RUN-16860-->Added support for Single Sign On using OpenShift v4 (OIDC based). When using OpenShift, you must first define OAuthClient which interacts with OpenShift's OAuth server to authenticate users and request access tokens. For more information, see [Single Sign-On](../admin/runai-setup/authentication/sso/).
+* Added support for Single Sign On using OpenShift v4 (OIDC based). When using OpenShift, you must first define OAuthClient which interacts with OpenShift's OAuth server to authenticate users and request access tokens. For more information, see [Single Sign-On](../admin/runai-setup/authentication/sso/).
 
 * <!-- RUN-16788/RUN-16866 - OIDC Scopes -->Added OIDC scopes to authentication requests. OIDC Scopes are used to specify what access privileges are being requested for access tokens. The scopes associated with the access tokens determine what resource are available when they are used to access OAuth 2.0 protected endpoints. Protected endpoints may perform different actions and return different information based on the scope values and other parameters used when requesting the presented access token. For more information, see [UI configuration](../admin/runai-setup/authentication/sso/#step-1-ui-configuration).
 
@@ -101,7 +101,7 @@ date: 2024-June-14
 
 #### Policy for distributed and inference workloads in the API
 
-Added a new API for creating distributed training workload policies and inference workload policies. These new policies in the API allow to set defaults, enforce rules and impose setup on distributed training and inference workloads. For distributed policies, worker and master may require different rules due to their different specifications. The new capability is currently available via API only. Documentation on submitting policies to follow shortly.
+* Added a new API for creating distributed training workload policies and inference workload policies. These new policies in the API allow to set defaults, enforce rules and impose setup on distributed training and inference workloads. For distributed policies, worker and master may require different rules due to their different specifications. The new capability is currently available via API only. Documentation on submitting policies to follow shortly.
 
 ## Deprecation Notifications
```
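
The last hunk above announces the distributed and inference policy API ahead of its documentation, so only the shape can be sketched. Purely as an illustration of the note that worker and master may carry different rules, a payload might separate the two rule sets as below; the endpoint and field names are guesses, not the published API:

```bash
# Illustration only: endpoint and payload fields are assumptions based
# on the release note above, not the documented Run:ai API.
curl -sS -X PUT "$CTRL_URL/api/v1/policy/distributed" \
  -H "Authorization: Bearer $RUNAI_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "rules": {
          "master": {"compute": {"gpuPortionRequest": {"max": 1}}},
          "worker": {"compute": {"gpuPortionRequest": {"max": 8}}}
        }
      }'
```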