The notifications service listens to events on the Kubernetes cluster and sends notifications of those events via email. The service can be configured to send the notifications to one or more preconfigured email addresses, or to the email address of the user that submitted the workload.
4
+
5
+
Note: In order to send notifications dynamically to the user who submitted the workload, the user should be logged in to the Run:ai UI or CLI.
6
+
7
+
The service can also be configured using a regular expression to send notifications only for specific namespaces on the cluster. This enables notification only for specific Run:ai projects. The default configuration sends notifications for all the namespaces starting with `runai-`.
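As a rough illustration of the namespace filter described above, the following sketch checks namespace names against the default `runai-.*` pattern. The helper name `namespace_matches` is hypothetical, and the use of a full-string match is an assumption; the service itself may anchor the pattern differently.

```python
import re

# Default pattern quoted in the text above.
DEFAULT_NAMESPACE_PATTERN = r"runai-.*"


def namespace_matches(namespace: str, pattern: str = DEFAULT_NAMESPACE_PATTERN) -> bool:
    """Return True when the whole namespace name matches the pattern.

    Assumption: the pattern must match the entire name (re.fullmatch).
    """
    return re.fullmatch(pattern, namespace) is not None


if __name__ == "__main__":
    for ns in ("runai-team-a", "kube-system", "my-runai-x"):
        print(ns, namespace_matches(ns))
```

Under this assumption, a pattern such as `runai-(team-a|team-b)` would restrict notifications to two specific projects.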
8
+
9
+
## Prerequisites
10
+
11
+
1. The service should be installed on each cluster used with Run:ai. The installation is done separately from the Run:ai cluster installation, using a new Helm chart.
12
+
2. As a part of the installation, the customer should provide their SMTP server address as well as credentials for it.
13
+
14
+
## Available notifications
15
+
16
+
Configure the notifications service to send events using the relevant `kind` and event `reason`.
17
+
The following Run:ai notifications are available:
18
+
19
+
|Event|Kind|Reason|Description|Additional info|
20
+
|:----|:----|:----|:----|:----|
21
+
|Pod scheduled|`Pod`|`Scheduled`|a pod was scheduled on a node|Pod, Job, Project, Namespace, User|
22
+
|Pod evicted|`PodGroup`|`Evict`|a pod was evicted to make room for another pod with higher priority, or to reclaim resources that belong to another project or department|Pod, Job, Project, Namespace, User|
23
+
|Pod unschedulable|`Pod`|`Unschedulable`|a pod was determined as unschedulable and couldn't be scheduled on any node in the cluster| Pod, Job, Project, Namespace, User|
24
+
|Failed scheduling pod|`Pod`|`FailedScheduling`|binding a pod to a node failed| Pod, Job, Project, Namespace, User|
25
+
26
+
!!! Tip
27
+
You can configure the notifications service to send event messages about additional Kubernetes events using the relevant `kind` and event `reason`.
28
+
<!--
29
+
The following table shows the expected messages for each event:
30
+
31
+
|Event| Message |
32
+
|--|--|
33
+
| Pod scheduled | Successfully assigned `namespace`/`pod` to `node`.|
34
+
| Pod evicted | Examples of messages explaining why the pod was evicted: <br /><br />Eviction due to priority within same namespace:<br /> Job `namespace`/`pod` was preempted by a job `namespace`/`pod` which has higher priority.<br /><br />Eviction due to reclaim from queue which is over-quota:<br />Job `namespace`/`pod` was reclaimed by job `namespace`/`podGroup`. The reclaimed project uses `x` GPUs with a quota of `y` GPUs. <br /><br />Eviction for consolidation:<br /> Pod `namespace`/`pod` was removed for bin packing. |
35
+
| Pod unschedulable |Message explaining different reasons for scheduler not being able to schedule on different nodes. <br /> (for example "All nodes are unavailable: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: test}. 2 node(s) didn't have enough resource: GPUs. 2 node(s) didn't have enough resource: MilliCPUs.")|
36
+
| Failed scheduling pod | The error returned from Kubernetes API server, which usually indicates an error in the scheduler or in the cluster. |
37
+
-->
38
+
39
+
## Installation
40
+
41
+
Install the notification service using the following commands:
42
+
43
+
1. Set the Helm repo to point to the notification service package using the following command:
The notification service is configured using a `configmap` file. The following is an example of a `configmap` file. Each of the tables below references a section in the `configmap` file.
67
+
68
+
<!-- Need to better understand this.
69
+
!!! Note:
70
+
You can change the service configuration values after deployment. Edit the config map and then rerun the `helm install` command above with the `-f` flag.
71
+
-->
72
+
73
+
### `service` configuration
74
+
75
+
This section defines the number of events that will be sent by the service. Use the following table to configure options in the `service` section of the `configmap` file.
76
+
77
+
|Component|Field|Description|Default|
78
+
|:----|:----|:----|:----|
79
+
|`service`|`service.concurrent_limit`|maximum number of events the service will handle in parallel|50|
80
+
|`service`|`service.cached_events`|queue size for events before blocking the listener|1000|
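To make the two `service` settings more concrete, here is a minimal sketch of how a bounded buffer and a fixed worker pool interact. It is not the service's implementation: `process_events` and `handle` are hypothetical names, and the sketch only assumes that `cached_events` behaves like a bounded queue and `concurrent_limit` like a worker count.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

CONCURRENT_LIMIT = 50  # mirrors the service.concurrent_limit default
CACHED_EVENTS = 1000   # mirrors the service.cached_events default


def process_events(events, handle,
                   concurrent_limit=CONCURRENT_LIMIT,
                   cached_events=CACHED_EVENTS):
    """Push events through a bounded queue and handle them with a fixed pool."""
    buffer = queue.Queue(maxsize=cached_events)  # put() blocks once the queue is full
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            event = buffer.get()
            if event is None:  # sentinel: no more events for this worker
                return
            outcome = handle(event)
            with lock:
                results.append(outcome)

    with ThreadPoolExecutor(max_workers=concurrent_limit) as pool:
        for _ in range(concurrent_limit):
            pool.submit(worker)
        for event in events:
            buffer.put(event)  # the "listener" waits here while the buffer is full
        for _ in range(concurrent_limit):
            buffer.put(None)   # one sentinel per worker
    return results
```

In this model, a small `cached_events` value slows the producer down when workers fall behind, while a larger `concurrent_limit` drains the buffer faster at the cost of more parallel work. The order of `results` is not guaranteed.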
81
+
82
+
### `listener` configuration
83
+
84
+
This section defines the objects and events that the service will listen to and send as notifications. Use the following table to configure options in the `listener` section of the `configmap` file.
85
+
86
+
| Component | Field | Description | Default |
87
+
| --- | --- | --- | --- |
88
+
|`kubelistener`|`listener.relevant_objects`| white list of Kubernetes components for notifications | `relevant_objects:` <br> `kind: Pod` <br> `reasons: Unschedulable, Scheduled` <br><br> `kind: PodGroup` <br> `reasons: Evict`|
89
+
|`kubelistener`|`listener.relevant_namespaces`| white list of namespaces that the service should listen to for events (regex) |`runai-.*`|
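The filtering described by the `listener` settings can be sketched as a simple predicate over an event's `kind`, `reason`, and namespace. The data structure and the function name `is_relevant` are hypothetical; only the default values are taken from the table above, and the full-string regex match is an assumption.

```python
import re

# Hypothetical in-memory form of the listener defaults from the table above.
RELEVANT_OBJECTS = {
    "Pod": {"Unschedulable", "Scheduled"},
    "PodGroup": {"Evict"},
}
RELEVANT_NAMESPACES = re.compile(r"runai-.*")


def is_relevant(kind: str, reason: str, namespace: str) -> bool:
    """Return True when the event passes both the kind/reason and namespace filters."""
    reasons = RELEVANT_OBJECTS.get(kind, set())
    return reason in reasons and RELEVANT_NAMESPACES.fullmatch(namespace) is not None


if __name__ == "__main__":
    print(is_relevant("Pod", "Scheduled", "runai-team-a"))
    print(is_relevant("Pod", "FailedScheduling", "runai-team-a"))
    print(is_relevant("PodGroup", "Evict", "default"))
```

Adding another entry to `RELEVANT_OBJECTS`, for example a `FailedScheduling` reason under `Pod`, corresponds to extending `listener.relevant_objects` with an additional `kind` and `reason`.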
90
+
91
+
### `enrich` configuration
92
+
93
+
!!! Note
94
+
This section of the `configmap` is for internal use only. Keep the default values.
|`Email`|`notify.email.user` (M)|SMTP server user login|user|
113
+
|`Email`|`notify.email.password` (M)|SMTP server user's password |password|
114
+
|`Email`|`notify.email.direct_notifications` (together with Recipients)|when set to true, email notifications will be sent dynamically to the user who submitted the workload|false|
115
+
|`Email`|`notify.email.recipients` (together with Direct Notifications)|additional email address recipients list for all the events - broadcast|Empty list|
116
+
117
+
**(M)** = mandatory to include in the `configmap` file.
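One possible reading of how `notify.email.direct_notifications` and `notify.email.recipients` combine is sketched below. The function `resolve_recipients` is hypothetical, and the union behavior (broadcast list plus the submitting user when direct notifications are enabled) is an assumption based on the table descriptions, not confirmed service behavior.

```python
from typing import List, Optional


def resolve_recipients(direct_notifications: bool,
                       recipients: List[str],
                       submitter_email: Optional[str] = None) -> List[str]:
    """Return the addresses that would receive a notification for one event.

    Assumption: the broadcast list always applies, and the submitter is added
    only when direct notifications are on and the submitter's email is known.
    """
    resolved = list(recipients)
    if direct_notifications and submitter_email and submitter_email not in resolved:
        resolved.append(submitter_email)
    return resolved


if __name__ == "__main__":
    print(resolve_recipients(True, ["ops@example.com"], "alice@example.com"))
    print(resolve_recipients(False, ["ops@example.com"], "alice@example.com"))
```

Note that under this reading, a submitter who is not logged in to the Run:ai UI or CLI has no known email address, so only the broadcast list receives the message.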
118
+
119
+
### Example `configmap` file
120
+
121
+
The following file is an example of a configmap file for the notification service.
docs/admin/runai-setup/authentication/sso.md
+17 -26 (17 additions, 26 deletions)
@@ -2,7 +2,7 @@
2
2
3
3
Single Sign-On (SSO) is an authentication scheme that allows a user to log in with a single ID to other, independent, software systems. SSO solves security issues involving multiple user/password data entries, multiple compliance schemes, etc.
4
4
5
-
Run:ai supports SSO using the [SAML 2.0](https://en.wikipedia.org/wiki/Security_Assertion_Markup_Language){target=_blank} protocol and Open ID Connect (OIDC).
5
+
Run:ai supports SSO using the [SAML 2.0](https://en.wikipedia.org/wiki/Security_Assertion_Markup_Language){target=_blank} protocol and OpenID Connect ([OIDC](https://openid.net/developers/how-connect-works/){target=_blank}).
6
6
7
7
!!! Caution
8
8
Single sign-on is only available with SaaS installations where the tenant was created after January 2022, or with any self-hosted installation of release 2.0.58 or later. If you are using single sign-on with older versions of Run:ai, please contact Run:ai customer support.
@@ -13,8 +13,7 @@ Run:ai supports SSO using the [SAML 2.0](https://en.wikipedia.org/wiki/Security_
13
13
14
14
## SAML Prerequisites
15
15
16
-
***XML Metadata**—you must have an *XML Metadata file* retrieved from your IdP. Upload the file to a web server such that you will have a URL to the file. The URL must have the *XML* file extension. For example, to connect using Google, you must create a custom SAML App [here](https://admin.google.com/ac/apps/unified){target=_blank}, download the Metadata file, and upload it to a web server.
17
-
***Organization Name**—you must have a Run:ai*Organization Name*. This is the name that appears on the top right of the Run:ai user interface.
16
+
**XML Metadata**—you must have an *XML Metadata file* retrieved from your IdP. Upload the file to a web server such that you will have a URL to the file. The URL must have the *XML* file extension. For example, to connect using Google, you must create a custom SAML App [here](https://admin.google.com/ac/apps/unified){target=_blank}, download the Metadata file, and upload it to a web server.
18
17
19
18
## OIDC Prerequisites
20
19
@@ -26,15 +25,15 @@ Run:ai supports SSO using the [SAML 2.0](https://en.wikipedia.org/wiki/Security_
26
25
27
26
You can configure your IdP to map several IdP attributes:
28
27
29
-
| IdP attribute | Run:ai required name | Description |
28
+
| IdP attribute |Default Run:ai name | Description |
30
29
|--|--|--|
31
-
| User email | email |**(Mandatory)**`e-mail` is the user identifier with Run:ai. |
32
-
| User role groups | GROUPS | (Optional) If exists, allows assigning Run:ai role groups via the IdP. The IdP attribute must be of a type of list of strings. See more below |
33
-
| Linux User ID | UID (configurable)| (Optional) If exists in IdP, allows Researcher containers to start with the Linux User `UID`. Used to map access to network resources such as file systems to users. The IdP attribute must be of integer type. |
34
-
| Linux Group ID | GID (configurable)| (Optional) If exists in IdP, allows Researcher containers to start with the Linux Group `GID`. The IdP attribute must be of integer type. |
35
-
| Linux Supplementary Groups | SUPPLEMENTARYGROUPS (configurable)| (Optional) If exists in IdP, allows Researcher containers to start with the relevant Linux supplementary groups. The IdP attribute must be of a type of list of integers. |
36
-
| User first name | firstName (configurable)| (Optional) Used as the first name showing in the Run:ai user interface. |
37
-
| User last name | lastName (configurable)| (Optional) Used as the last name showing in the Run:ai user interface |
30
+
| User email | email (cannot be changed) | **(Mandatory)** `e-mail` is the user identifier with Run:ai. |
31
+
| User role groups | GROUPS | (Optional) If it exists, allows assigning Run:ai role groups via the IdP. The IdP attribute must be a list of strings. See more below. |
32
+
| Linux User ID | UID | (Optional) If exists in IdP, allows Researcher containers to start with the Linux User `UID`. Used to map access to network resources such as file systems to users. The IdP attribute must be of integer type. |
33
+
| Linux Group ID | GID | (Optional) If exists in IdP, allows Researcher containers to start with the Linux Group `GID`. The IdP attribute must be of integer type. |
34
+
| Linux Supplementary Groups | SUPPLEMENTARYGROUPS | (Optional) If exists in IdP, allows Researcher containers to start with the relevant Linux supplementary groups. The IdP attribute must be of a type of list of integers. |
35
+
| User first name | firstName | (Optional) Used as the first name showing in the Run:ai user interface. |
36
+
| User last name | lastName | (Optional) Used as the last name showing in the Run:ai user interface. |
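The type requirements in the attribute table can be checked before configuring the IdP. The sketch below is illustrative only: `validate_idp_attributes` is a hypothetical helper, the attribute names are the Run:ai defaults from the table, and the input is assumed to be a plain dictionary of attributes as your IdP would send them.

```python
def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_idp_attributes(attrs: dict) -> list:
    """Return a list of problems found in the mapped IdP attributes."""
    errors = []

    email = attrs.get("email")
    if not isinstance(email, str) or not email:
        errors.append("email is mandatory and must be a non-empty string")

    groups = attrs.get("GROUPS")
    if groups is not None and not (
        isinstance(groups, list) and all(isinstance(g, str) for g in groups)
    ):
        errors.append("GROUPS must be a list of strings")

    for name in ("UID", "GID"):
        value = attrs.get(name)
        if value is not None and not _is_int(value):
            errors.append(f"{name} must be an integer")

    supp = attrs.get("SUPPLEMENTARYGROUPS")
    if supp is not None and not (
        isinstance(supp, list) and all(_is_int(g) for g in supp)
    ):
        errors.append("SUPPLEMENTARYGROUPS must be a list of integers")

    return errors


if __name__ == "__main__":
    print(validate_idp_attributes({"email": "alice@example.com", "UID": 1000}))
    print(validate_idp_attributes({"UID": "1000", "GROUPS": "admins"}))
```

An empty list means the attributes satisfy the documented types; optional attributes that are absent are not reported.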
38
37
39
38
### Example attribute mapping for Google Suite
40
39
@@ -54,12 +53,9 @@ You can configure your IdP to map several IdP attributes:
54
53
For `Saml 2`:
55
54
56
55
1. In the `Metadata XML Url` field, enter the URL to the XML Metadata file.
57
-
2. In the `GID` field, enter the GID.
58
-
3. In the `GROUPS` field, enter the groups.
59
-
4. In the `SUPPLEMENTARYGROUPS` field, enter the supplementary groups.
60
-
5. In the `UID` field, enter the UID.
61
-
6. In the `Logout uri` field, enter the desired URL logout page. If left empty, you will be redirected to the Run:ai portal.
62
-
7. Press `Save`.
56
+
2. Find your identity provider's attribute names for `GID`, `GROUPS`, `SUPPLEMENTARYGROUPS` and `UID`. If they are not in line with the Run:ai defaults described in the table above, you can change them here.
57
+
3. In the `Logout uri` field, enter the desired URL logout page. If left empty, you will be redirected to the Run:ai portal.
58
+
4. Press `Save`.
63
59
64
60
For `Open ID Connect`:
65
61
@@ -68,12 +64,9 @@ For `Open ID Connect`:
68
64
1. In the `Discovery Document URL` field, enter the URL to the discovery document.
69
65
2. In the `Client ID` field, enter the client ID.
70
66
3. In the `Client Secret` field, enter the client secret.
71
-
4. In the `GID` field, enter the GID.
72
-
5. In the `GROUPS` field, enter the groups.
73
-
6. In the `SUPPLEMENTARYGROUPS` field, enter the supplementary groups.
74
-
7. In the `UID` field, enter the UID.
75
-
8. In the `Logout uri` field, enter the desired URL logout page. If left empty, you will be redirected to the Run:ai portal.
76
-
9. Press `Save`.
67
+
4. Find your identity provider's attribute names for `GID`, `GROUPS`, `SUPPLEMENTARYGROUPS` and `UID`. If they are not in line with the Run:ai defaults described in the table above, you can change them here.
68
+
5. In the `Logout uri` field, enter the desired URL logout page. If left empty, you will be redirected to the Run:ai portal.
69
+
6. Press `Save`.
77
70
78
71
Once you press `Save` you will receive a `Redirect URI` and an `Entity ID`. Both values must be set on the IdP side.
79
72
@@ -86,12 +79,10 @@ Test Connectivity to Administration User Interface:
86
79
87
80
* Using an incognito browser tab, open the Run:ai user interface.
88
81
* Select the `Login with SSO` button.
89
-
* Provide the `Organization name` obtained above.
90
82
* You will be redirected to the IdP login page. Use the previously entered *Administrator email* to log in.
91
83
92
84
### Troubleshooting
93
-
94
-
The SSO log in can be separated into two parts:
85
+
The SSO login can be separated into two parts:
95
86
96
87
1. Run:ai redirects to the IdP (for example, Google) for login using a *SAML Request*.
97
88
2. Upon successful login, IdP redirects back to Run:ai with a *SAML Response*.
docs/admin/runai-setup/self-hosted/k8s/prerequisites.md
+11 -1 (11 additions, 1 deletion)
@@ -41,7 +41,17 @@ See Run:ai Cluster prerequisites [Kubernetes](../../cluster-setup/cluster-prereq
41
41
42
42
The Run:ai control plane operating system prerequisites are identical.
43
43
44
-
The Run:ai control-plane requires a default storage class to create persistent volume claims for Run:ai storage. The storage class, as per Kubernetes standards, controls the reclaim behavior: whether the Run:ai persistent data is saved or deleted when the Run:ai control plane is deleted.
44
+
The Run:ai control-plane requires a __default storage class__ to create persistent volume claims for Run:ai storage. The storage class, as per Kubernetes standards, controls the reclaim behavior: whether the Run:ai persistent data is saved or deleted when the Run:ai control plane is deleted.
45
+
46
+
47
+
!!! Note
48
+
For a simple (non-production) storage class example, see [Kubernetes Local Storage Class](https://kubernetes.io/docs/concepts/storage/storage-classes/#local){target=_blank}. The storage class will set the directory `/opt/local-path-provisioner` to be used across all nodes as the path for provisioning persistent volumes.