Commit 9d50e02

Merge branch 'run-ai:master' into RUN-19606-cluster-config
2 parents 4fd7993 + d6a7005 commit 9d50e02

File tree: 7 files changed, +208 −9 lines changed
Lines changed: 25 additions & 0 deletions

@@ -0,0 +1,25 @@
---
title: Researcher Email Notifications
summary: This article describes researcher notifications and how to configure them.
authors:
- Jason Novich
- Shiri Arad
date: 2024-Jul-4
---

## Importance of Email Notifications for Data Scientists

Managing numerous data science workloads requires monitoring various stages, including submission, scheduling, initialization, execution, and completion. Additionally, handling suspensions and failures is crucial for ensuring timely workload completion. Email notifications address this need by sending alerts for critical workload life cycle changes. This empowers data scientists to take necessary actions and prevent delays.

Once the system administrator configures email notifications, users receive notifications when their jobs transition from one status to another. In addition, users get warning notifications before a workload is terminated due to project-defined timeouts. Details included in the email are:

* Workload type
* Project and cluster information
* Event timestamp

To configure the types of email notifications you can receive:

1. Log in to your account.
2. Press the user icon, then select *Settings*.
3. Under *Email notifications*, in the *Send me an email about my workloads when* section, select the relevant workload statuses.
4. When complete, press *Save*.

docs/admin/overview-administrator.md

Lines changed: 1 addition & 2 deletions

@@ -1,6 +1,6 @@
 # Overview: Administrator Documentation

-The role of Administrators is to set up Run:ai and perform day-to-day monitoring and maintenance. 
+The role of Administrators is to set up Run:ai and perform day-to-day monitoring and maintenance.

 As part of the Administrator documentation you will find:

@@ -9,4 +9,3 @@ As part of the Administrator documentation you will find:
 * How to configure __Workloads__ and Workload __Policies__.
 * Setting and maintaining the cluster via the __Run:ai User Interface__.
 * __Troubleshooting__ Run:ai and understanding cluster health.
-* __Integrations__ of Run:ai with a variety of other systems.
Lines changed: 47 additions & 0 deletions

@@ -0,0 +1,47 @@
---
title: Notifications
summary: This article describes the notifications that are available to the Run:ai platform, and how to configure them.
authors:
- Jason Novich
- Shiri Arad
date: 2024-Jul-4
---

## Email Notifications for Data Scientists

Managing numerous data science workloads requires monitoring various stages, including submission, scheduling, initialization, execution, and completion. Additionally, handling suspensions and failures is crucial for ensuring timely workload completion. Email notifications address this need by sending alerts for critical workload life cycle changes. This empowers data scientists to take necessary actions and prevent delays.

### Setting Up Email Notifications

!!! Important
    The system administrator needs to enable and set up email notifications so that users are kept informed about different system statuses.

To enable email notifications for the system:

1. Press *Tools & Settings*, then select *Notifications*.

    !!! Note
        For SaaS deployments, use the *Enable email notifications* toggle.

2. In the *SMTP Host* field, enter the SMTP server address, and in the *SMTP port* field, the port number.
3. Select an *Authentication type*:

    1. **Plain**—enter a username and password to be used for authentication.
    2. **Login**—enter a username and password to be used for authentication.

4. Enter the *From email address* and the *Display name*.
5. Press *Verify* to ensure that the email configuration is working.
6. Press *Save* when complete.

## System Notifications

Administrators can set system-wide notifications to alert all users of important information. System notifications allow administrators to update users about events that may be occurring within the Run:ai platform. The system notification appears at each login, or, for users who are already logged in, after the message has changed.

To configure system notifications:

1. Press *Tools & Settings*, then select *Notifications*.
2. In the *System notification* pane, press *+MESSAGE*.
3. Enter your message in the text box. Use the formatting toolbar to add special formats to your message text.
4. Enable the "Don't show this again" option to allow users to opt out of seeing the message multiple times.
5. When complete, press *Save & Publish*.

docs/admin/workloads/submitting-workloads.md

Lines changed: 5 additions & 5 deletions

@@ -27,7 +27,7 @@ To submit a workload using the UI:
 3. Enter a claim size, and select the units.
 4. Select a *Volume system* mode from the dropdown.
 5. Enter the *Container path* for the volume target location.
-6. Select a *Volume persistency.
+6. Select a *Volume persistency*.

 7. In the *Data sources* pane, select a data source. If you need a new data source, press *add a new data source*. For more information, see [Creating a new data source](../../Researcher/user-interface/workspaces/create/create-ds.md). When complete, press *Create Data Source*.

@@ -67,13 +67,13 @@ To submit a workload using the UI:
 3. Enter a claim size, and select the units.
 4. Select a *Volume system* mode from the dropdown.
 5. Enter the *Container path* for the volume target location.
-6. Select a *Volume persistency.
+6. Select a *Volume persistency*.

 8. (Optional) In the *Data sources* pane, select a data source. If you need a new data source, press *add a new data source*. For more information, see [Creating a new data source](../../Researcher/user-interface/workspaces/create/create-ds.md). When complete, press *Create Data Source*.

 !!! Note
     * Data sources that have private credentials, which have the status of *issues found*, will be greyed out.
-    * * Data sources can now include *Secrets*.
+    * Data sources can now include *Secrets*.

 9. (Optional) In the *General* pane, add special settings for your training:

@@ -95,13 +95,13 @@ To submit a workload using the UI:
 3. Enter a claim size, and select the units.
 4. Select a *Volume system* mode from the dropdown.
 5. Enter the *Container path* for the volume target location.
-6. Select a *Volume persistency.
+6. Select a *Volume persistency*.

 4. (Optional) In the *Data sources* pane, select a data source. If you need a new data source, press *add a new data source*. For more information, see [Creating a new data source](../../Researcher/user-interface/workspaces/create/create-ds.md). When complete, press *Create Data Source*.

 !!! Note
     * Data sources that have private credentials, which have the status of *issues found*, will be greyed out.
-    * * Data sources can now include *Secrets*.
+    * Data sources can now include *Secrets*.

 5. (Optional) In the *General* pane, add special settings for your training:

docs/developer/cluster-api/submit-yaml.md

Lines changed: 1 addition & 1 deletion

@@ -3,7 +3,7 @@

 You can use YAML to submit Workloads directly to Run:ai. Below are examples of how to create training, interactive and inference workloads via YAML.

-For details on YAML parameters, see [YAML Reference](Workload-YAML-Reference-v1.pdf)
+For details on YAML parameters, see the [YAML Reference](Workload-YAML-Reference-v1.pdf).

 ## Submit Workload Example
docs/home/whats-new-2-18.md

Lines changed: 123 additions & 0 deletions

@@ -0,0 +1,123 @@
---
title: Version 2.18
summary: This article describes new features and functionality in the version.
authors:
- Jamie Weider
- Jason Novich
date: 2024-June-14
---

## Release Content - June 30, 2024

* [Deprecation notifications](#deprecation-notifications)
* [Breaking changes](#breaking-changes)

### Researcher

#### Jobs, Workloads, and Workspaces

* <!-- Run-14732/Run-14733 Add backoff limit to workspace & standard training -->Added backoff limit functionality to Training and Workspace workloads in the UI. The backoff limit is the maximum number of retry attempts for failed workloads. After reaching the limit, the workload's status changes to `Failed`.

* <!-- RUN-18944/RUN-18945 Changing "Auto-deletion" default and presentation of the default value in the UI -->Updated the *Auto-deletion time* default value from **never** to **30 days**. The *Auto-deletion time* determines when a Run:ai workload that has reached a completed or failed status is automatically deleted (including its logs). This change only affects new or cloned workloads.

* <!-- RUN-16917/RUN-19363 move to top Expose secrets in workload submission -->Added a new *Data source* of type *Secret* to the workload form. *Data sources* of type *Secret* are used to hide 3rd party access credentials when submitting workloads. For more information, see [Submitting Workloads](../admin/workloads/submitting-workloads.md#how-to-submit-a-workload).

* <!-- RUN-16830/RUN-16831 - Graphs & special metrics for inference -->Added new graphs for *Inference* workloads. The new graphs provide more information for *Inference* workloads to help analyze their performance. For more information, see [Workloads View](../admin/workloads/README.md#workloads-view).

* <!-- TODO add link to doc when ready - get approval for text RUN-16805/RUN-17416 - Provide latency-based metric for autoscaling for requests -->Added a latency metric for autoscaling. This feature is used to set a target threshold for the response time of requests, and adjusts the number of replicas to keep the response time below that threshold.

* <!-- TODO Add to inference doc models explanation after autoscaling. RUN-16872/RUN-18526 Separating ChatUi from model in favor of coherent autoscaling -->Improved autoscaling for ChatUi models. Run:ai has improved autoscaling performance with ChatUi models by adding them to *Environments*. ChatUi is an addition to inference workloads and is not mandatory for all types of workloads.

<!-- TODO add this as a section to the "models catalog" doc - wait for release from Lior RUN-16806/RUN-16807 - Hugging face integration Added Hugging Face catalog integration in inference workloads. Run:ai has added Hugging Face integration directly to the inference workload form, providing the ability to add models and data sets directly from the Hugging Face catalog. Hugging Face is a ML platform that helps users build, deploy and train machine learning models. It provides the infrastructure to demo, run and deploy artificial intelligence (AI) in live applications. Users can also browse through models and data sets that other people have uploaded. For more information on how Hugging Face is integrated, see [Hugging Face](link to hugging face in the models doc). -->
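The backoff limit described above can also be expressed when submitting a training workload via YAML. This is a hedged sketch only: it assumes a `backoffLimit` field in the Run:ai `v2alpha1` training CRD (verify against the YAML Reference), and the names and image are placeholders:

```yaml
# Training workload with a retry budget (assumption: backoffLimit is a
# supported spec field in the run.ai/v2alpha1 TrainingWorkload CRD).
apiVersion: run.ai/v2alpha1
kind: TrainingWorkload
metadata:
  name: train-with-retries   # placeholder name
  namespace: runai-team-a    # placeholder project namespace
spec:
  name:
    value: train-with-retries
  image:
    value: gcr.io/run-ai-demo/quickstart   # placeholder image
  backoffLimit:
    value: 3   # assumption: retry failed pods up to 3 times, then mark the workload Failed
```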
#### Command Line Interface

* <!-- RUN-14715/RUN-16337 - CLI V2 -->Added an improved, researcher-focused Command Line Interface (CLI). The improved CLI brings usability enhancements for researchers, which include:

    * Support for multiple clusters
    * Self-upgrade
    * Interactive mode
    * Alignment of the CLI to be data consistent with the UI and API
    * Improved usability and performance

    This is an early access feature available for customers to use; however, be aware that there may be functional gaps versus the legacy CLI.
    For more information about installing and using the improved CLI, see [Improved CLI](../Researcher/cli-reference/new-cli/runai.md).

#### GPU memory swap

* <!-- TODO verify link to doc post merge to page RUN-12615/RUN-12616 -->Added new GPU-to-CPU memory swap. To ensure efficient usage of an organization’s resources, Run:ai provides multiple features on multiple layers to help administrators and practitioners maximize their existing GPU resource utilization. Run:ai’s GPU memory swap feature helps administrators and AI practitioners further increase the utilization of existing GPU hardware by improving GPU sharing between AI initiatives and stakeholders. This is done by extending the GPU physical memory to the CPU memory, which is typically an order of magnitude larger than that of the GPU. For more information, see [GPU Memory Swap](../Researcher/scheduling/gpu-memory-swap.md).

#### YAML Workload Reference table

* <!-- RUN-17487/RUN-17656 -->Added a new YAML reference document that contains the value types and workload YAML references. Each table contains the field name, its description and the supported Run:ai workload types. The YAML field details contain information on the value type and currently available example workload snippets. For more information, see the [YAML Reference](../developer/cluster-api/submit-yaml.md) PDF.

### Run:ai Administrator

#### Data Sources

* <!-- RUN-16758/RUN-18432 - Data volumes -->Added new *Data Volumes* feature. Data Volumes are snapshots of datasets stored in Kubernetes Persistent Volume Claims (PVCs). They act as a central repository for training data, and offer several key benefits:

    * Managed with dedicated permissions&mdash;Data Admins, a new role within Run:ai, have exclusive control over data volume creation, data population, and sharing.
    * Shared between multiple scopes&mdash;unlike other Run:ai data sources, data volumes can be shared across projects, departments, or clusters. This promotes data reuse and collaboration within your organization.
    * Coupled to workloads in the submission process&mdash;similar to other Run:ai data sources, data volumes can be easily attached to AI workloads during submission, specifying the data path within the workload environment.

    For more information, see [Data Volumes](../developer/admin-rest-api/data-volumes.md).

* <!-- RUN-16917/RUN-19363 Expose secrets in workload submission -->Added a new data source of type *Secret*. Run:ai now allows you to configure a *Credential* (Secret) as a data source. A *Data source* of type *Secret* is best used in workloads so that access credentials for 3rd party interfaces and storage used in containers are kept hidden. For more information, see [Secrets as a data source](../Researcher/user-interface/workspaces/create/create-ds.md#create-a-secret-as-data-source).

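Since Data Volumes are stored in standard Kubernetes PVCs, the following is a minimal sketch of the kind of claim such a dataset snapshot lives in. The name, namespace, storage class, and size are placeholders, not values prescribed by Run:ai:

```yaml
# Generic Kubernetes PVC sketch backing a dataset snapshot.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: training-dataset       # placeholder name
  namespace: runai-team-a      # placeholder namespace
spec:
  accessModes:
    - ReadOnlyMany             # shared, read-only access suits an immutable dataset snapshot
  storageClassName: standard   # placeholder storage class
  resources:
    requests:
      storage: 100Gi           # placeholder size
```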
#### Credentials

* <!-- RUN-16917/RUN-19363 Expose secrets in workload submission -->Added new *Generic secret* to *Credentials*. *Credentials* had previously been used only for access to data sources (S3, Git, etc.). However, AI practitioners need to use secrets to access sensitive data (interacting with 3rd party APIs, or other services) without having to put their credentials in their source code. *Generic secrets* are best used as a data source of type *Secret* so that they can be used in containers to keep access credentials hidden. For configuration information, see [Generic secret](../admin/admin-ui-setup/credentials-setup.md#generic-secret).

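For background, a generic secret of this kind corresponds to a standard Kubernetes `Opaque` Secret. A minimal sketch follows; the name, namespace, and key/value pair are placeholders, not a Run:ai-prescribed layout:

```yaml
# Standard Kubernetes Opaque secret sketch for a 3rd party access token.
apiVersion: v1
kind: Secret
metadata:
  name: third-party-api-token   # placeholder name
  namespace: runai-team-a       # placeholder namespace
type: Opaque
stringData:
  API_TOKEN: "<token-value>"    # placeholder key and value; mounted or injected into the container
```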
#### SSO

* <!-- RUN-16859/RUN-16860-->Added support for SSO using OpenShift v4 (OIDC based). When using OpenShift, you must first define an OAuthClient, which interacts with OpenShift's OAuth server to authenticate users and request access tokens. For more information, see [Single Sign-On](../admin/runai-setup/authentication/sso/).

* <!-- RUN-16788/RUN-16866 - OIDC Scopes -->Added OIDC scopes to authentication requests. OIDC scopes are used to specify what access privileges are being requested for access tokens. The scopes associated with the access tokens determine what resources are available when they are used to access OAuth 2.0 protected endpoints. Protected endpoints may perform different actions and return different information based on the scope values and other parameters used when requesting the presented access token. For more information, see [UI configuration](../admin/runai-setup/authentication/sso/#step-1-ui-configuration).

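The OAuthClient mentioned above is an OpenShift `oauth.openshift.io/v1` resource. A hedged sketch is shown below; the client name, secret, and redirect URI are hypothetical placeholders (consult the Single Sign-On guide for the values Run:ai expects):

```yaml
# OpenShift OAuthClient sketch (oauth.openshift.io/v1 is a real OpenShift API;
# the specific values here are illustrative only).
apiVersion: oauth.openshift.io/v1
kind: OAuthClient
metadata:
  name: runai                               # hypothetical client name
secret: "<client-secret>"                   # placeholder secret shared with the OIDC relying party
redirectURIs:
  - "https://<tenant>.run.ai/oidc/callback" # hypothetical redirect URI
grantMethod: auto                           # approve token grants without prompting the user
```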
#### Ownership protection

* <!-- RUN-19098/RUN-19557 Need to add link -->Added new ownership protection feature. Run:ai *Ownership Protection* ensures that only authorized users can delete or modify workloads. This feature is designed to safeguard important jobs and configurations from accidental or unauthorized modifications by users who did not originally create the workload. For configuration information, see your Run:ai representative.

#### System notifications

* <!-- RUN-12796/ RUN-20001 - Notifications infrastructure at the Control Plane -->Added new system notifications feature. Email notifications send alerts for critical workload life cycle changes, empowering data scientists to take necessary actions and prevent delays.

    * System administrators will need to configure the email notifications. For more information, see [System notifications](../admin/runai-setup/notifications/notifications.md).
    * AI practitioners will need to set up the types of notifications they want to receive. For more information, see [Email notifications](../Researcher/best-practices/researcher-notifications.md).

## Deprecation Notifications

Deprecation notifications allow you to plan for future changes in the Run:ai Platform.

### Feature deprecations

Deprecated features will be available for **two** versions ahead of the notification. For questions, see your Run:ai representative.

<!-- * Command Line Interface (CLI)&mdash;from cluster version 2.18 and higher, the *Legacy CLI* is deprecated. The *Legacy CLI* is still available for use on clusters that are 2.18 or higher, but it is recommended that you use the new *Improved CLI*. -->

### API support and endpoint deprecations

The endpoints and parameters specified in the API reference are the ones that are officially supported by Run:ai. For more information about Run:ai's API support policy and deprecation process, see [Developer overview](../developer/overview-developer.md#api-support).

#### Deprecated APIs and API fields

##### Departments API

| Deprecated | Replacement |
| --- | --- |
| /v1/k8s/clusters/{clusterId}/departments | /api/v1/org-unit/departments |
| /v1/k8s/clusters/{clusterId}/departments/{department-id} | /api/v1/org-unit/departments/{departmentId} |
| /v1/k8s/clusters/{clusterId}/departments/{department-id} | /api/v1/org-unit/departments/{departmentId} + PUT/PATCH /api/v1/org-unit/departments/{departmentId}/resources |

##### Projects API

| Deprecated | Replacement |
| --- | --- |
| /v1/k8s/clusters/{clusterId}/projects | /api/v1/org-unit/projects |
| /v1/k8s/clusters/{clusterId}/projects/{id} | /api/v1/org-unit/projects/{projectId} |
| /v1/k8s/clusters/{clusterId}/projects/{id} | /api/v1/org-unit/projects/{projectId} + /api/v1/org-unit/projects/{projectId}/resources |

## Breaking changes

Breaking changes notifications allow you to plan around potential changes that may interfere with your current workflow when interfacing with the Run:ai Platform.
