To submit a workload using the UI:
When you select *Model*:

1. Select a catalog. Choose from *Run:ai* or *Hugging Face*.

    1. If you choose *Run:ai*, select a model from the tiles. Use the search box to find a model that is not listed. If you can't find the model, see your system administrator.
    2. If you choose *Hugging Face*, go to the next step.
2. In the *Inference name* field, enter a name for the workload.
3. In the *Credentials* field, enter the token to access the model catalog.
4. If you selected *Hugging Face*, enter the name of the model in the *Model Name* section. This section does not appear if you selected *Run:ai*.
5. In the *Compute resource* field, select a compute resource from the tiles.

    1. In the *Replica autoscaling* section, set the minimum and maximum replicas for your inference.
    2. In the *Set conditions for creating a new replica* section, use the drop-down to select `Throughput (Requests/sec)`, `Latency (milliseconds)`, or `Concurrency (Requests/sec)`, then set the value (default = 100). This section appears only if the maximum number of replicas is set to 2 or more.
    3. In the *Set when replicas should be automatically scaled down to zero* section, use the drop-down to select *Never* or *After one, five, 15, or 30 minutes of inactivity*.

        !!! Note
            When automatic scaling to zero is enabled, the minimum number of replicas is 0.

    4. In the *Nodes* field, change the order of priority of the node pools, or add a new node pool to the list.

6. When complete, press *Create inference*.
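Inference workloads like this are commonly served through a Knative-style autoscaler. Assuming a Knative-based backend (an assumption; the manifest the platform actually generates may differ), the autoscaling choices above map roughly onto revision annotations like the following, where the service name, image, and values are placeholders:

```yaml
# Illustrative only: minimum/maximum replicas, a concurrency target of 100,
# and scale-to-zero after one minute of inactivity as Knative annotations.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-inference                # placeholder workload name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0"         # minimum replicas (0 allows scale to zero)
        autoscaling.knative.dev/max-scale: "4"         # maximum replicas
        autoscaling.knative.dev/metric: "concurrency"  # or "rps" for requests/sec
        autoscaling.knative.dev/target: "100"          # condition for creating a new replica
        autoscaling.knative.dev/scale-to-zero-pod-retention-period: "1m"
    spec:
      containers:
        - image: registry.example.com/my-model:latest  # placeholder image
```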

When you select *Custom*:

7. In the *Inference name* field, enter a name for the workload.
8. In the *Environment* field, select an environment. Use the search box to find an environment that is not listed. If you can't find an environment, press *New environment* or see your system administrator.

    1. In the *Set the connection for your tool(s)* pane, choose a tool for your environment (if available).
    2. In the *Runtime settings* field, set commands and arguments for the container running in the pod. (optional)
    3. In the *Environment variable* field, set one or more environment variables. (optional)
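The runtime settings and environment variables above correspond to standard Kubernetes container fields. A minimal sketch of the equivalent pod spec fragment, where the image, command, and variable names are hypothetical:

```yaml
# Illustrative pod spec fragment: commands/arguments and environment
# variables as they land on the container running in the pod.
containers:
  - name: inference
    image: registry.example.com/my-env:latest  # placeholder environment image
    command: ["python", "serve.py"]            # Runtime settings: command
    args: ["--port", "8080"]                   # Runtime settings: arguments
    env:                                       # Environment variables
      - name: MODEL_PATH
        value: /mnt/models
      - name: LOG_LEVEL
        value: info
```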
9. In the *Compute resource* field, select a compute resource from the tiles. Use the search box to find a compute resource that is not listed. If you can't find a compute resource, press *New compute resource* or see your system administrator.

    1. In the *Replica autoscaling* section, set the minimum and maximum replicas for your inference.
    2. In the *Set conditions for creating a new replica* section, use the drop-down to select `Throughput (Requests/sec)`, `Latency (milliseconds)`, or `Concurrency (Requests/sec)`, then set the value (default = 100). This section appears only if the maximum number of replicas is set to 2 or more.
    3. In the *Set when replicas should be automatically scaled down to zero* section, use the drop-down to select *Never* or *After one, five, 15, or 30 minutes of inactivity*.

        !!! Note
            When automatic scaling to zero is enabled, the minimum number of replicas is 0.

10. In the *Data sources* field, add a *New data source*. (optional)

    !!! Note
        * Data sources that are not available will be greyed out.
        * Assets that are cluster syncing will be greyed out.
        * Only PVC, Git, and ConfigMap resources are supported.
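The supported data source types surface inside the workload as Kubernetes volumes. A sketch of how a PVC and a ConfigMap data source might be mounted, where the claim, map, image, and path names are placeholders (Git data sources are fetched by the platform, so no single spec fragment is shown for them):

```yaml
# Illustrative: PVC and ConfigMap data sources mounted into the container.
volumes:
  - name: training-data
    persistentVolumeClaim:
      claimName: my-dataset-pvc          # placeholder PVC data source
  - name: app-config
    configMap:
      name: my-config                    # placeholder ConfigMap data source
containers:
  - name: inference
    image: registry.example.com/my-env:latest
    volumeMounts:
      - name: training-data
        mountPath: /mnt/data             # where the PVC contents appear
      - name: app-config
        mountPath: /etc/app              # where the ConfigMap keys appear
```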
11. In the *General* field you can:

    1. Add an *Auto-deletion* time. This sets the timeframe between inference completion/failure and auto-deletion. (optional)