Commit 278473c

RUN-16833 updated procedure
1 parent 31c1552 commit 278473c

1 file changed: +30 −13 lines changed

docs/admin/workloads/submitting-workloads.md

Lines changed: 30 additions & 13 deletions
@@ -116,36 +116,53 @@ To submit a workload using the UI:
 
 When you select *Model*:
 
-1. Select a model from the tiles. Use the search box to find a model that is not listed. If you can't find the model, see your system administrator.
+1. Select a catalog. Choose from *Run:ai* or *Hugging Face*.
+    1. If you choose *Run:ai*, select a model from the tiles. Use the search box to find a model that is not listed. If you can't find the model, see your system administrator.
+    2. If you choose *Hugging Face*, go to the next step.
 2. In the *Inference name* field, enter a name for the workload.
-3. In the *Compute resource* field, select a compute resource from the tiles.
-    1. In the *Replica autoscaling* section, set the minimum and maximum replicas for your inference. Then select *Never* or *After one minute of inactivity* to set when the replicas should be automatically scaled down to zero.
-    2. In the *Nodes* field, change the order of priority of the node pools, or add a new node pool to the list.
-4. When complete, press *Create inference*.
+3. In the *Credentials* field, enter the token used to access the model catalog.
+4. If you selected *Hugging Face*, enter the name of the model in the *Model Name* section. This section does not appear if you selected *Run:ai*.
+5. In the *Compute resource* field, select a compute resource from the tiles.
+
+    1. In the *Replica autoscaling* section, set the minimum and maximum replicas for your inference.
+    2. In the *Set conditions for creating a new replica* section, use the drop-down to select from `Throughput (Requests/sec)`, `Latency (milliseconds)`, or `Concurrency (Requests/sec)`, then set the value (default = 100). This section appears only if the maximum number of replicas is set to 2 or more.
+    3. In the *Set when replicas should be automatically scaled down to zero* section, use the drop-down to select *Never* or *After one, five, 15, or 30 minutes of inactivity*.
+
+    !!! Note
+        When automatic scaling to zero is enabled, the minimum number of replicas is 0.
+
+    4. In the *Nodes* field, change the order of priority of the node pools, or add a new node pool to the list.
+6. When complete, press *Create inference*.
 
 When you select *Custom*:
 
-1. In the *Inference name* field, enter a name for the workload.
-2. In the *Environment* field, select an environment. Use the search box to find an environment that is not listed. If you can't find an environment, press *New environment* or see your system administrator.
+7. In the *Inference name* field, enter a name for the workload.
+8. In the *Environment* field, select an environment. Use the search box to find an environment that is not listed. If you can't find an environment, press *New environment* or see your system administrator.
     1. In the *Set the connection for your tool(s)* pane, choose a tool for your environment (if available).
     2. In the *Runtime settings* field, set commands and arguments for the container running in the pod. (optional)
     3. In the *Environment variable* field, you can set one or more environment variables. (optional)
-3. In the *Compute resource* field, select a compute resource from the tiles. Use the search box to find a compute resource that is not listed. If you can't find an environment, press *New compute resource* or see your system administrator.
-    1. In the *Replica autoscaling* section, set the minimum and maximum replicas for your inference. Then select *Never* or *After one minute of inactivity* to set when the replicas should be automatically scaled down to zero.
-    2. In the *Nodes* field, change the order of priority of the node pools, or add a new node pool to the list.
-4. In the *Data sources* field, add a *New data source*. (optional)
+9. In the *Compute resource* field, select a compute resource from the tiles. Use the search box to find a compute resource that is not listed. If you can't find a compute resource, press *New compute resource* or see your system administrator.
+
+    1. In the *Replica autoscaling* section, set the minimum and maximum replicas for your inference.
+    2. In the *Set conditions for creating a new replica* section, use the drop-down to select from `Throughput (Requests/sec)`, `Latency (milliseconds)`, or `Concurrency (Requests/sec)`, then set the value (default = 100). This section appears only if the maximum number of replicas is set to 2 or more.
+    3. In the *Set when replicas should be automatically scaled down to zero* section, use the drop-down to select *Never* or *After one, five, 15, or 30 minutes of inactivity*.
+
+    !!! Note
+        When automatic scaling to zero is enabled, the minimum number of replicas is 0.
+
+10. In the *Data sources* field, add a *New data source*. (optional)
 
     !!! Note
 
         * Data sources that are not available will be greyed out.
         * Assets that are cluster syncing will be greyed out.
         * Only PVC, Git, and ConfigMap resources are supported.
 
-5. In the *General* field you can:
+11. In the *General* field you can:
     1. Add an *Auto-deletion* time. This sets the timeframe between inference completion/failure and auto-deletion. (optional)
     2. Add one or more *Annotation*. (optional)
     3. Add one or more *Labels*. (optional)
-6. When complete, press *Create inference*.
+12. When complete, press *Create inference*.
 
 ## Workload Policies
 