
Commit ab0328e

Merge pull request #897 from run-ai/v2.18
Existing PVC
2 parents ba3b875 + e16e216 commit ab0328e

7 files changed: +145 −58 lines

.github/workflows/automated-publish-docs.yaml

Lines changed: 3 additions & 3 deletions
@@ -25,7 +25,7 @@ jobs:
       - name: Get all v*.* branches
         id: calculate-env
         run: |
-          BRANCHES=$(git branch --list --all | grep -v master | grep 'origin/v*.*' | sed -n -E 's:.*/(v[0-9]+\.[0-9]+).*:\1:p' | sort -Vu)
+          BRANCHES=$(git branch -r | grep -E '^ *origin/v[0-9]{1,2}\.[0-9]{1,2}$' | sort -Vu | sed 's/origin\///g' | sed 's/ //g')
           NEWEST_VERSION=$(printf '%s\n' "${BRANCHES[@]}" | sort -V | tail -n 1)
           CURRENT_BRANCH=${GITHUB_REF#refs/heads/}
           ALIAS=$CURRENT_BRANCH-alias
@@ -50,7 +50,6 @@ jobs:
         uses: actions/checkout@v4
         with:
           ref: ${{ needs.env.outputs.CURRENT_BRANCH }}
-          fetch-depth: 0
 
       - name: setup python
         uses: actions/setup-python@v5
@@ -99,4 +98,5 @@ jobs:
           SLACK_MESSAGE_ON_SUCCESS: "Docs were updated successfully for version ${{ needs.env.outputs.TITLE }}. PR Link: ${{ github.event.pull_request.html_url }}"
           SLACK_MESSAGE_ON_FAILURE: "Docs update FAILED for version ${{ needs.env.outputs.TITLE }}. PR Link: ${{ github.event.pull_request.html_url }}"
           MSG_MINIMAL: true
-          SLACK_FOOTER: ""
+          SLACK_FOOTER: ""
+
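
For context, the revised `BRANCHES` pipeline above lists only remote branches named exactly `vX.Y` and strips the `origin/` prefix. The sketch below shows roughly how it behaves; the branch names in the comments are illustrative assumptions, not taken from this repository.

```bash
# Assume the remote has branches such as origin/v2.16, origin/v2.17 and origin/v2.18.
BRANCHES=$(git branch -r | grep -E '^ *origin/v[0-9]{1,2}\.[0-9]{1,2}$' | sort -Vu | sed 's/origin\///g' | sed 's/ //g')
echo "$BRANCHES"        # prints v2.16, v2.17, v2.18 -- one branch per line; master and alias branches excluded

# The workflow then picks the newest version with a version-aware sort.
NEWEST_VERSION=$(printf '%s\n' "${BRANCHES[@]}" | sort -V | tail -n 1)
echo "$NEWEST_VERSION"  # v2.18
```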

docs/Researcher/user-interface/workspaces/blocks/Existing PVC.md

Lines changed: 90 additions & 0 deletions
Large diffs are not rendered by default.

docs/admin/runai-setup/self-hosted/k8s/backend.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ Run the helm command below:
 ``` bash
 helm repo add runai-backend https://runai.jfrog.io/artifactory/cp-charts-prod
 helm repo update
-helm upgrade -i runai-backend -n runai-backend runai-backend/control-plane --version "~2.17.0" \
+helm upgrade -i runai-backend -n runai-backend runai-backend/control-plane --version "~2.18.0" \
     --set global.domain=<DOMAIN> # (1)
 ```
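
As a usage note, the bumped `--version "~2.18.0"` constraint pins the control plane chart to the latest 2.18.x patch release. The following sketch shows one way to check what that constraint resolves to before upgrading; it assumes the `runai-backend` repo has already been added as in the snippet above.

```bash
# List the control-plane chart versions available in the already-added repo.
helm repo update
helm search repo runai-backend/control-plane --versions | head

# Optionally, dry-run the upgrade to see which 2.18.x patch the "~2.18.0" constraint selects.
helm upgrade -i runai-backend -n runai-backend runai-backend/control-plane \
  --version "~2.18.0" --set global.domain=<DOMAIN> --dry-run
```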

docs/admin/runai-setup/self-hosted/ocp/backend.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ Run the helm command below:
 ``` bash
 helm repo add runai-backend https://runai.jfrog.io/artifactory/cp-charts-prod
 helm repo update
-helm upgrade -i runai-backend -n runai-backend runai-backend/control-plane --version "~2.17.0" \
+helm upgrade -i runai-backend -n runai-backend runai-backend/control-plane --version "~2.18.0" \
     --set global.domain=runai.apps.<OPENSHIFT-CLUSTER-DOMAIN> \ # (1)
     --set global.config.kubernetesDistribution=openshift
 ```
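
If the `<OPENSHIFT-CLUSTER-DOMAIN>` placeholder is not known in advance, it can usually be read from the cluster's ingress configuration. A minimal sketch, assuming `oc` is logged in to the cluster with permission to read the `ingresses.config.openshift.io` resource:

```bash
# .spec.domain is typically of the form apps.<cluster-domain>, e.g. apps.mycluster.example.com.
APPS_DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
echo "global.domain would be: runai.${APPS_DOMAIN}"
```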

docs/admin/workloads/README.md

Lines changed: 2 additions & 2 deletions
@@ -124,9 +124,9 @@ To get the full experience of Run:ai’s environment and platform use the follow
 * [Distributed training](../../Researcher/user-interface/trainings.md#trainings)
 * Deployments.
 
-## Supported integrations
+## Workload-related Integrations
 
-To assist you with other platforms, and other types of workloads use the integrations listed below.
+To assist you with other platforms, and other types of workloads use the integrations listed below. These integrations are not regularly tested by Run:ai and are hence provided on an as-is basis. The links below point to the Run:ai customer portal.
 
 1. [Airflow](https://runai.my.site.com/community/s/article/How-to-integrate-Run-ai-with-Apache-Airflow){target=_blank}
 2. [MLflow](https://runai.my.site.com/community/s/article/How-to-integrate-Run-ai-with-MLflow){target=_blank}

docs/admin/workloads/policies/README.md

Lines changed: 45 additions & 48 deletions
@@ -8,16 +8,16 @@ date: 2023-Dec-12
 
 ## Introduction
 
-*Policies* allow administrators to impose restrictions and set default values for researcher workloads. Restrictions and default values can be placed on CPUs, GPUs, and other resources or entities. Enabling the *New Policy Manager* provides information about resources that are non-compliant to applied policies. Resources that are non-compliant will appear greyed out. To see how a resource is not compliant, press on the clipboard icon in the upper right hand corner of the resource.
+*Policies* allow administrators to impose restrictions and set default values for researcher workloads. Restrictions and default values can be placed on CPUs, GPUs, and other resources or entities. Enabling the *New Policy Manager* provides information about resources that are non-compliant to applied policies. Resources that are non-compliant will appear greyed out. To see how a resource is not compliant, press on the clipboard icon in the upper right-hand corner of the resource.
 
 !!! Note
-    Policies from Run:ai versions 2.15 or lower will still work after enabling the *New Policy Manager*. However, showing non-compliant policy rules will not be available. For more information about policies for version 2.15 or lower, see [What are Policies](policies.md#what-are-policies).
+    Policies from Run:ai versions 2.17 or lower will still work after enabling the New Policy Manager. For more information about policies for version 2.17 or lower, see [What are Policies](policies.md#what-are-policies).
 
 For example, an administrator can create and apply a policy that will restrict researchers from requesting more than 2 GPUs, or less than 1GB of memory per type of workload.
 
 Another example is an administrator who wants to set different amounts of CPU, GPUs and memory for different kinds of workloads. A training workload can have a default of 1 GB of memory, or an interactive workload can have a default amount of GPUs.
 
-Policies are created for each Run:ai project (Kubernetes namespace). When a policy is created in the `runai` namespace, it will take effect when there is no project-specific policy for the workloads of the same kind.
+Policies are created for each Run:ai project (Kubernetes namespace). When a policy is created in the `runai` namespace, it will take effect when there is no project-specific policy for workloads of the same kind.
 
 In interactive workloads or workspaces, applied policies will only allow researchers access to resources that are permitted in the policy. This can include compute resources as well as node pools and node pool priority.
 
@@ -47,7 +47,7 @@ A policy configured to a specific scope, is applied to all elements in that scop
 
 ### Policy Editor UI
 
-Policies are added to the system using the policy editor and are written in YAML format. YAML™ is a human-friendly, cross language, Unicode based data serialization language designed around the common native data types of dynamic programming languages. It is useful for programming needs ranging from configuration files to internet messaging to object persistence to data auditing and visualization. For more information, see [YAML.org](https://yaml.org/){target=_blank}.
+Policies are added to the system using the policy editor and are written in YAML format. YAML™ is a human-friendly, cross-language, Unicode-based data serialization language designed around the common native data types of dynamic programming languages. It is useful for programming needs ranging from configuration files to internet messaging to object persistence to data auditing and visualization. For more information, see [YAML.org](https://yaml.org/){target=_blank}.
 
 ### Policy API
 
@@ -59,50 +59,47 @@ The following is an example of a workspace policy you can apply in your platform
 
 ```YAML
 defaults:
-  environment:
-    allowPrivilegeEscalation: false
-    createHomeDir: true
-    environmentVariables:
-      - name: MY_ENV
-        value: my_value
-  workspace:
-    allowOverQuota: true
+  createHomeDir: true
+  environmentVariables:
+    instances:
+      - name: MY_ENV
+        value: my_value
+  security:
+    allowPrivilegeEscalation: false
 rules:
-  compute:
-    cpuCoreLimit:
-      min: 0
-      max: 9
-      required: true
-    gpuPortionRequest:
-      min: 0
-      max: 10
+  imagePullPolicy:
+    required: true
+    options:
+      - value: Always
+        displayed: Always
+      - value: Never
+        displayed: Never
+  createHomeDir:
+    canEdit: false
+  security:
+    runAsUid:
+      min: 1
+      max: 32700
+    allowPrivilegeEscalation:
+      canEdit: false
+  compute:
+    cpuCoreLimit:
+      required: true
+      min: 0
+      max: 9
+    gpuPortionRequest:
+      min: 0
+      max: 10
+  storage:
+    nfs:
+      instances:
+        canAdd: false
    s3:
-      url:
-        options:
-          - displayed: "https://www.google.com"
-            value: "https://www.google.com"
-          - displayed: "https://www.yahoo.com"
-            value: "https://www.yahoo.com"
-  environment:
-    imagePullPolicy:
-      options:
-        - displayed: "Always"
-          value: "Always"
-        - displayed: "Never"
-          value: "Never"
-      required: true
-    runAsUid:
-      min: 1
-      max: 32700
-    createHomeDir:
-      canEdit: false
-    allowPrivilegeEscalation:
-      canEdit: false
-  workspace:
-    allowOverQuota:
-      canEdit: false
-imposedAssets:
-  dataSources:
-    nfs:
-      canAdd: false
+      attributes:
+        url:
+          options:
+            - value: https://www.google.com
+              displayed: https://www.google.com
+            - value: https://www.yahoo.com
+              displayed: https://www.yahoo.com
 ```
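
Because the policy above is pasted into the policy editor as YAML, a quick local syntax check can catch indentation mistakes before submitting it. A minimal sketch, assuming the example is saved as `workspace-policy.yaml` (a hypothetical file name) and Python with PyYAML is available:

```bash
# Parse the file and print its top-level keys; a parse error means the YAML is malformed.
python3 -c 'import yaml; doc = yaml.safe_load(open("workspace-policy.yaml")); print(list(doc))'
# Expected output for the example above: ['defaults', 'rules']
```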

docs/home/whats-new-2-18.md

Lines changed: 3 additions & 3 deletions
@@ -22,11 +22,11 @@ date: 2024-June-14
 
 * <!-- RUN-16917/RUN-19363 move to top Expose secrets in workload submission -->Added new *Data sources* of type *Secret* to workload form. *Data sources* of type *Secret* are used to hide 3rd party access credentials when submitting workloads. For more information, see [Submitting Workloads](../admin/workloads/submitting-workloads.md#how-to-submit-a-workload).
 
-* <!-- RUN-16830/RUN-16831 - Graphs & special metrics for inference -->Added new graphs for *Inference* workloads. The new graphs provide more information for *Inference* workloads to help analyze performance of the workloads. New graphs include Latency, Throughput, and number of replicas. For more information, see [Workloads View](../admin/workloads/README.md#workloads-view) (Requires minimum cluster version v2.18).
+* <!-- RUN-16830/RUN-16831 - Graphs & special metrics for inference -->Added new graphs for *Inference* workloads. The new graphs provide more information for *Inference* workloads to help analyze performance of the workloads. New graphs include Latency, Throughput, and number of replicas. For more information, see [Workloads View](../admin/workloads/README.md#workloads-view). (Requires minimum cluster version v2.18).
 
 * <!-- TODO add link to doc when ready - get approval for text RUN-16805/RUN-17416 - Provide latency-based metric for autoscaling for requests -->Added latency metric for autoscaling. This feature allows automatic scale-up/down the number of replicas of a Run:ai inference workload based on the threshold set by the ML Engineer. This ensures that response time is kept under the target SLA. (Requires minimum cluster version v2.18).
 
-* <!-- TODO Add to inference doc models explanation after autoscaling. RUN-16872/RUN-18526 Separating ChatUi from model in favor of coherent autoscaling -->Improved autoscaling for inference models by taking out ChatBot UI from models images. By moving ChatBot UI to predefined *Environments*, autoscaling is more accurate by taking into account all types of requests (API, and ChatBot UI). Adding a ChatBot UI environment preset by Run:ai allows AI practitioners to easily connect them to workloads.
+* <!-- Add to inference doc models explanation after autoscaling. RUN-16872/RUN-18526 Separating ChatUi from model in favor of coherent autoscaling -->Improved autoscaling for inference models by taking out ChatBot UI from models images. By moving ChatBot UI to predefined *Environments*, autoscaling is more accurate by taking into account all types of requests (API, and ChatBot UI). Adding a ChatBot UI environment preset by Run:ai allows AI practitioners to easily connect them to workloads.
 
 * <!-- RUN-16832/ RUN-16833 - Custom value for auto-scale to zero-->Added more precision to trigger auto-scaling to zero. Now users can configure a precise consecutive idle threshold custom setting to trigger Run:ai inference workloads to scale-to-zero. (Requires minimum cluster version v2.18).
 
@@ -101,7 +101,7 @@ date: 2024-June-14
 
 #### Policy for distributed and inference workloads in the API
 
-Added a new API for creating distributed training workload policies and inference workload policies. These new policies in the API allow to set defaults, enforce rules and impose setup on distributed training and inference workloads. For distributed policies, worker and master may require different rules due to their different specifications. The new capability is currently available via API only. Documentation on submitting policies to follow shortly.
+* Added a new API for creating distributed training workload policies and inference workload policies. These new policies in the API allow to set defaults, enforce rules and impose setup on distributed training and inference workloads. For distributed policies, worker and master may require different rules due to their different specifications. The new capability is currently available via API only. Documentation on submitting policies to follow shortly.
 
 #### Policy for distributed and inference workloads in the API

Comments (0)