Commit 3b51f48

Merge branch 'v2.17' into v2.17-RUN-15154-inference-workloads
2 parents 613e9cb + 6bf8771 commit 3b51f48

26 files changed: +505 −297 lines changed

docs/Researcher/Walkthroughs/quickstart-inference.md

Lines changed: 13 additions & 16 deletions

````diff
@@ -2,44 +2,43 @@
 
 ## Introduction
 
-Machine learning (ML) inference is the process of running live data points into a machine-learning algorithm to calculate an output.
+Machine learning (ML) inference is the process of running live data points into a machine-learning algorithm to calculate an output.
 
-With Inference, you are taking a trained _Model_ and deploying it into a production environment. The deployment must align with the organization's production standards such as average and 95% response time as well as up-time.
+With Inference, you are taking a trained *Model* and deploying it into a production environment. The deployment must align with the organization's production standards such as average and 95% response time as well as up-time.
 
-## Prerequisites
+## Prerequisites
 
 To complete this Quickstart you must have:
 
-* Run:ai software installed on your Kubernetes cluster. See: [Installing Run:ai on a Kubernetes Cluster](../../admin/runai-setup/installation-types.md). There are additional prerequisites for running inference. See [cluster installation prerequisites](../../admin/runai-setup/cluster-setup/cluster-prerequisites.md#inference) for more information.
+* Run:ai software installed on your Kubernetes cluster. See: [Installing Run:ai on a Kubernetes Cluster](../../admin/runai-setup/installation-types.md). There are additional prerequisites for running inference. See [cluster installation prerequisites](../../admin/runai-setup/cluster-setup/cluster-prerequisites.md#inference) for more information.
 * Run:ai CLI installed on your machine. See: [Installing the Run:ai Command-Line Interface](../../admin/researcher-setup/cli-install.md)
-* You must have _ML Engineer_ access rights. See [Adding, Updating and Deleting Users](../../admin/admin-ui-setup/admin-ui-users.md) for more information.
+* You must have *ML Engineer* access rights. See [Adding, Updating and Deleting Users](../../admin/admin-ui-setup/admin-ui-users.md) for more information.
 
 ## Step by Step Walkthrough
 
 ### Setup
 
-* Login to the Projects area of the Run:ai user interface.
-* Add a Project named "team-a".
-* Allocate 2 GPUs to the Project.
+* Login to the Projects area of the Run:ai user interface.
+* Add a Project named "team-a".
+* Allocate 2 GPUs to the Project.
 
-### Run an Inference Workload
+### Run an Inference Workload
 
-* In the Run:ai user interface go to `Deployments`. If you do not see the `Deployments` section you may not have the required access control, or the inference module is disabled.
+* In the Run:ai user interface go to `Deployments`. If you do not see the `Deployments` section you may not have the required access control, or the inference module is disabled.
 * Select `New Deployment` on the top right.
 * Select `team-a` as a project and add an arbitrary name. Use the image `gcr.io/run-ai-demo/example-triton-server`.
 * Under `Resources` add 0.5 GPUs.
-* Under `Auto Scaling` select a minimum of 1, a maximum of 2. Use the `concurrency` autoscaling threshold method. Add a threshold of 3.
+* Under `Autoscaling` select a minimum of 1, a maximum of 2. Use the `concurrency` autoscaling threshold method. Add a threshold of 3.
 * Add a `Container port` of `8000`.
 
-
 This would start an inference workload for team-a with an allocation of a single GPU. Follow up on the Job's progress using the [Deployment list](../../admin/admin-ui-setup/deployments.md) in the user interface or by running `runai list jobs`
 
 ### Query the Inference Server
 
 The specific inference server we just created is accepting queries over port 8000. You can use the Run:ai Triton demo client to send requests to the server:
 
 * Find an IP address by running `kubectl get svc -n runai-team-a`. Use the `inference1-00001-private` Cluster IP.
-* Replace `<IP>` below and run:
+* Replace `<IP>` below and run:
 
 ```
 runai submit inference-client -i gcr.io/run-ai-demo/example-triton-client \
@@ -52,11 +51,10 @@ The specific inference server we just created is accepting queries over port 800
 runai logs inference-client
 ```
 
-
 ### View status on the Run:ai User Interface
 
 * Open the Run:ai user interface.
-* Under _Deployments_ you can view the new Workload. When clicking the workload, note the utilization graphs go up.
+* Under *Deployments* you can view the new Workload. When clicking the workload, note the utilization graphs go up.
 
 ### Stop Workload
 
@@ -66,4 +64,3 @@ Use the user interface to delete the workload.
 
 * You can also create Inference deployments via API. For more information see [Submitting Workloads via YAML](../../developer/cluster-api/submit-yaml.md).
 * See [Deployment](../../admin/admin-ui-setup/deployments.md) user interface.
-
````
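The autoscaling settings in the walkthrough above (minimum 1, maximum 2, `concurrency` threshold of 3) scale replicas with load. A minimal sketch of the arithmetic, assuming a Knative-style policy of ceil(in-flight requests / threshold) clamped to the configured bounds; the policy details and variable names are assumptions, not taken from the docs:

```shell
# Sketch: desired replicas = ceil(concurrency / threshold), clamped to [min, max].
# Values mirror the walkthrough: min=1, max=2, threshold=3.
concurrency=5        # current in-flight requests (hypothetical)
threshold=3
min_replicas=1
max_replicas=2

desired=$(( (concurrency + threshold - 1) / threshold ))   # ceiling division
if [ "$desired" -lt "$min_replicas" ]; then desired=$min_replicas; fi
if [ "$desired" -gt "$max_replicas" ]; then desired=$max_replicas; fi
echo "$desired"   # prints: 2
```

With 5 concurrent requests and a threshold of 3, the deployment would scale to its maximum of 2 replicas; when traffic drops to 0 it would fall back to the minimum of 1.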

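Besides the demo client, the endpoint behind the ClusterIP can be probed directly. A sketch, assuming the Triton demo image serves the standard KServe v2 HTTP endpoints on port 8000 (the service name and namespace come from the walkthrough; the health path is an assumption about the image):

```shell
# Look up the ClusterIP of the inference service created by the walkthrough.
IP=$(kubectl get svc -n runai-team-a inference1-00001-private \
     -o jsonpath='{.spec.clusterIP}')

# Probe the server's readiness endpoint; 200 means it is accepting queries.
curl -s -o /dev/null -w "%{http_code}\n" "http://${IP}:8000/v2/health/ready"
```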
docs/Researcher/cli-reference/runai-submit-dist-TF.md

Lines changed: 4 additions & 0 deletions

```diff
@@ -273,6 +273,10 @@ runai submit-dist tf --name distributed-job --workers=2 -g 1 \
 >
 > Mount /root/data to NFS path /public/data on NFS server nfs.example.com for read-write access.
 
+#### --configmap-volume name=<name of configmap>,path=<path to mount> ...'
+
+> Mount a `ConfigMap` object for use as a data volume.
+
 ### Network
 
 #### --address `<string>`
```
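Combined with the submit command from this file's examples, use of the new flag might look like the following sketch (the ConfigMap name `hyperparams`, its key, and the mount path are hypothetical):

```shell
# Create a ConfigMap to mount (hypothetical name and key).
kubectl create configmap hyperparams --from-literal=lr=0.001

# Submit a distributed TensorFlow job that mounts it as a data volume;
# the key appears as the file /etc/hyperparams/lr inside each container.
runai submit-dist tf --name distributed-job --workers=2 -g 1 \
  --configmap-volume name=hyperparams,path=/etc/hyperparams
```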

docs/Researcher/cli-reference/runai-submit-dist-mpi.md

Lines changed: 4 additions & 0 deletions

```diff
@@ -276,6 +276,10 @@ You can start an unattended mpi training Job of name dist1, based on Project *te
 >
 > Mount /root/data to NFS path /public/data on NFS server nfs.example.com for read-write access.
 
+#### --configmap-volume name=<name of configmap>,path=<path to mount> ...'
+
+> Mount a `ConfigMap` object for use as a data volume.
+
 ### Network
 
 #### --address `<string>`
```

docs/Researcher/cli-reference/runai-submit-dist-pytorch.md

Lines changed: 4 additions & 0 deletions

```diff
@@ -280,6 +280,10 @@ runai submit-dist pytorch --name distributed-job --workers=2 -g 1 \
 >
 > Mount /root/data to NFS path /public/data on NFS server nfs.example.com for read-write access.
 
+#### --configmap-volume name=<name of configmap>,path=<path to mount> ...'
+
+> Mount a `ConfigMap` object for use as a data volume.
+
 ### Network
 
 #### --address `<string>`
```

docs/Researcher/cli-reference/runai-submit-dist-xgboost.md

Lines changed: 4 additions & 0 deletions

```diff
@@ -268,6 +268,10 @@ runai submit-dist xgboost --name distributed-job --workers=2 -g 1 \
 >
 > Mount /root/data to NFS path /public/data on NFS server nfs.example.com for read-write access.
 
+#### --configmap-volume name=<name of configmap>,path=<path to mount> ...'
+
+> Mount a `ConfigMap` object for use as a data volume.
+
 ### Network
 
 #### --address `<string>`
```

docs/Researcher/cli-reference/runai-submit.md

Lines changed: 4 additions & 0 deletions

```diff
@@ -346,6 +346,10 @@ runai submit --job-name-prefix -i gcr.io/run-ai-demo/quickstart -g 1
 >
 > Mount /root/data to NFS path /public/data on NFS server nfs.example.com for read-write access.
 
+#### --configmap-volume name=<name of configmap>,path=<path to mount> ...'
+
+> Mount a `ConfigMap` object for use as a data volume.
+
 ### Network
 
 <!--
```

docs/Researcher/user-interface/workspaces/create/create-compute.md

Lines changed: 8 additions & 1 deletion

```diff
@@ -7,6 +7,13 @@ You can select one or more resources. For example, one compute resource may cons
 !!! Note
     Selecting resources more than the cluster can supply will result in a permanently failed workspace.
 
+Use the *Cluster* filter at the top of the table to see compute resources that are assigned to specific clusters.
+
+!!! Note
+    The cluster filter will be in the top bar when there are clusters that are installed with version 2.16 or lower.
+
+Use the *Add filter* to add additional filters to the table.
+
 ## Set GPU resources
 
 GPU resources can be expressed in various ways:
@@ -33,7 +40,7 @@ A CPU resource consists of cores and memory. When GPU resources are requested th
 To create a compute resource:
 
 1. Select the `New Compute Resource` button.
-2. In the *Scope* pane, choose one item from the tree. The compute resource is assigned to that item and all its subsidiaries.
+2. In the *Scope* pane, choose a cluster, department, or project from the tree. The compute resource is assigned to that item and all its subsidiaries.
 3. Give the resource a meaningful name.
 4. In the resources pane, set the resource request.
     1. To add GPU resources, enter the number of GPUs to request. You can then enter the amount of GPU memory by selecting a percentage of the GPU, memory size in MB or GB, or multi-instance GPUs.
```

docs/Researcher/user-interface/workspaces/create/create-ds.md

Lines changed: 49 additions & 13 deletions

```diff
@@ -7,7 +7,7 @@ When you select `New Compute Resource` you will be presented with various data s
 To create an NFS data source, provide:
 
 * A data source name.
-* A Run:ai project scope which is assigned to that item and all its subsidiaries.
+* A Run:ai scope (cluster, department, or project) which is assigned to that item and all its subsidiaries.
 * An NFS server.
 * The path to the data within the server.
 * The path within the container where the data will be mounted.
@@ -19,58 +19,94 @@ The data can be set as read-write or limited to read-only permission regardless
 To create an PVC data source, provide:
 
 * A data source name
-* A Run:ai project scope which is assigned to that item and all its subsidiaries.
+* A Run:ai scope (cluster, department, or project) which is assigned to that item and all its subsidiaries.
 * Select an existing PVC or create a new one by providing:
 
-  * A claim name
-  * A storage class
-  * Access mode
-  * Required storage size
-  * Volume system mode
+    * A claim name
+    * A storage class
+    * Access mode
+    * Required storage size
+    * Volume system mode
 
 * The path within the container where the data will be mounted.
 
+You can see the status of the resources created in the [Data sources table](#data-sources-table).
+
 ## Create an S3 data source
 
 S3 storage saves data in *buckets*. S3 is typically attributed to AWS cloud service but can also be used as a separate service unrelated to Amazon.
 
 To create an S3 data source, provide
 
 * A data source name
-* A Run:ai project scope which is assigned to that item and all its subsidiaries.
+* A Run:ai scope (cluster, department, or project) which is assigned to that item and all its subsidiaries.
 * The relevant S3 service URL server
 * The bucket name of the data.
 * The path within the container where the data will be mounted.
 
-Note that an S3 data source can be public or private. For the latter option, please select the relevant credentials associated with the project to allow access to the data.
+An S3 data source can be public or private. For the latter option, please select the relevant credentials associated with the project to allow access to the data. S3 buckets that use credentials will have a status associated with it. For more information, see [Data sources table](#data-sources-table).
 
 ## Create a Git data source
 
 To create a Git data source, provide:
 
 * A data source name.
-* A Run:ai project scope which is assigned to that item and all its subsidiaries.
+* A Run:ai scope (cluster, department, or project) which is assigned to that item and all its subsidiaries.
 * The relevant repository URL.
 * The path within the container where the data will be mounted.
 
-The Git data source can be public or private. To allow access to a private Git data source, you must select the relevant credentials associated with the project.
+The Git data source can be public or private. To allow access to a private Git data source, you must select the relevant credentials associated with the project. Git data sources that use credentials will have a status associated with it. For more information, see [Data sources table](#data-sources-table).
 
 ## Create a host path data source
 
 To create a host path data source, provide:
 
 * A data source name.
-* A Run:ai project scope which is assigned to that item and all its subsidiaries.
+* A Run:ai scope (cluster, department, or project) which is assigned to that item and all its subsidiaries.
 * The relevant path on the host.
 * The path within the container where the data will be mounted.
 
 !!! Note
     The data can be limited to read-only permission regardless of any other user privileges.
 
-### Download Data Sources Table
+## Create a ConfigMap data source
+
+* A Run:ai project scope which is assigned to that item and all its subsidiaries.
+
+    !!! Note
+        You can only choose a project as a scope.
+
+* A data source name.
+* A data mount consisting of:
+
+    * A ConfigMap name&mdash;select from the drop down.
+    * A target location&mdash;the path to the container.
+
+## Data sources table
+
+The *Data sources* table contains a column for the status of the data source. The following statuses are supported:
+
+| Status | Description |
+| -- | -- |
+| **No issues found** | No issues were found when propagating the data source to the *PROJECTS*. |
+| **Issues found** | Failed to create the data source for some or all of the *PROJECTS*. |
+| **Issues found** | Failed to access the cluster. |
+| **Deleting** | The data source is being removed. |
+
+!!! Note
+
+    * The *Status* column in the table shows statuses based on your level of permissions. For example, a user that has create permissions for the scope, will see statuses that are calculated from the entire scope, while users who have only view and use permissions, will only be able to see statuses from a subset of the scope (assets that they have permissions to).
+    * The status of “-” indicates that there is no status because this asset is not cluster-syncing.
 
 You can download the Data Sources table to a CSV file. Downloading a CSV can provide a snapshot history of your Data Sources over the course of time, and help with compliance tracking. All the columns that are selected (displayed) in the table will be downloaded to the file.
 
+Use the *Cluster* filter at the top of the table to see data sources that are assigned to specific clusters.
+
+!!! Note
+    The cluster filter will be in the top bar when there are clusters that are installed with version 2.16 or lower.
+
+Use the *Add filter* to add additional filters to the table.
+
 To download the Data Sources table to a CSV:
 
 1. Open *Data Sources*.
```
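The ConfigMap selected in the drop down for the new ConfigMap data source must already exist in the cluster. A sketch of creating one with `kubectl` (the name, keys, and namespace are hypothetical):

```shell
# Hypothetical ConfigMap a data source could reference; each key becomes a
# file under the target location inside the container (for example,
# /etc/model-settings/batch_size and /etc/model-settings/precision).
kubectl create configmap model-settings \
  --namespace runai-team-a \
  --from-literal=batch_size=32 \
  --from-literal=precision=fp16
```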

docs/Researcher/user-interface/workspaces/create/create-env.md

Lines changed: 27 additions & 16 deletions

```diff
@@ -3,30 +3,32 @@
 To create an environment:
 
 1. In the left menu, press *New Environment*.
-2. In the *Scope* pane, choose one item from the tree. The compute resource is assigned to that item and all its subsidiaries.
+2. In the *Scope* pane, choose a cluster, department, or project from the tree. The environment is assigned to that item and all its subsidiaries.
 3. Enter an *Environment name*.
 4. Enter the image URL path and an image pull policy.
 5. Choose a supported workload type. Configure this section based on the type of workload you expect to run in this environment. Choose from:
 
-    * `Single node`&mdash;use for running workloads on a single node.
-    * `Multi-node`&mdash;use for running distributed workloads on multiple nodes.
+    * *Standard*&mdash;use for running workloads on a single node.
+    * *Distributed*&mdash;use for running distributed workloads on multiple nodes.
 
     Then choose the workload that can use the environment:
 
-    * `Workspace`
-    * `Training`
-6. In the *Supported workload types* pane select either `Single node` or `Multi-node (Distributed)`.
+    * *Workspace*
+    * *Training*
+    * *Inference*
 
-    1. If you selected `Single node`, select `Workspace`, or `Training` or both.
-    2. If you selected `Multi-node (Distributed)`, select a framework from the dropdown, then select `Workspace`, or `Training` or both.
+    If you selected *Inference*, in the *endpoint* pane, select a *Protocol* from the dropdown, then enter the *Container port*.
 
-7. Select a tool from the list. You can add multiple tools by pressing *+ Tool*. Selecting a tool is optional.
+6. Select a tool from the list. You can add multiple tools by pressing *+ Tool*. Selecting a tool is optional.
 
     Tools can be:
 
    * Different applications such as Code editor IDEs (for example, VS Code), Experiment tracking (for example, Weight and Biases), visualization tools (for example, Tensor Board), and more.
    * Open source tools (for example, Jupyter notebook) or commercial 3rd party tools (for example,. MATLAB)
 
+    !!! Note
+        Tool configuration is not supported with *Inference* environments.
+
    It is also possible to set up a custom tool used by the organization.
 
    For each tool, you must set the type of connection interface and port. If not set, default values are provided. The supported connection types are:
@@ -35,12 +37,14 @@ To create an environment:
 * External node port: A [NodePort](../../../../admin/runai-setup/config/allow-external-access-to-containers.md) exposes your application externally on every host of the cluster, access the tool using `http://<HOST_IP>:<NODEPORT>` (for example, http://203.0.113.20:30556).
 
 !!! Note
-    Selecting a tool requires configuration to be up and running. To configure a tool:
+    Selecting a tool requires a configuration to be up and running.
+
+    To configure a tool:
 
     * The container image needs to support the tool.
     * The administrator must configure a DNS record and certificate. For more information, see [Workspaces configuration](../../../../admin/runai-setup/config/allow-external-access-to-containers.md#workspaces-configuration).
 
-8. Configure runtime settings with:
+7. Configure runtime settings with:
 
     1. Commands and arguments&mdash;visible, but not editable in the workspace creation form.
     2. Environment variables&mdash;visible and editable in the workspace creation form.
@@ -49,22 +53,29 @@ To create an environment:
 !!! Note
     The value of an environment variable can remain empty for the researcher to fill in when creating a workspace.
 
-9. Configure the security settings from:
+8. Configure the security settings from:
 
     1. Settings in the image&mdash;security settings that come with the image file.
     2. Custom settings:
 
-        1. User ID.
-        2. Group ID.
-        3. Supplementary Groups.
-        4. Values modification settings.
+        1. User ID.
+        2. Group ID.
+        3. Supplementary Groups.
+        4. Values modification settings.
 
     3. Add linux capabilities.
 
 ## Download Environments Table
 
 You can download the Environments table to a CSV file. Downloading a CSV can provide a snapshot history of your environments over the course of time, and help with compliance tracking. All the columns that are selected (displayed) in the table will be downloaded to the file.
 
+Use the *Cluster* filter at the top of the table to see environments that are assigned to specific clusters.
+
+!!! Note
+    The cluster filter will be in the top bar when there are clusters that are installed with version 2.16 or lower.
+
+Use the *Add filter* to add additional filters to the table.
+
 To download the Environments table to a CSV:
 
 1. In the left menu, press *Environments*.
```
