summary: This article summarizes machine learning inference workloads.
authors:
- Jason Novich
date: 2024-Mar-29
Machine learning (ML) inference is the process of feeding live data points into a trained machine-learning model to calculate an output.
With *Inference* workloads, you take a trained *Model* and deploy it into a production environment. The deployment must align with the organization's production standards, such as average and 95th-percentile response time, as well as uptime.
## Inference and GPUs
The *Inference* process runs a subset of the original Training algorithm on a single datum (for example, one sentence or one image) or a small batch. As such, GPU memory requirements are typically smaller than for a full-blown Training process.
Given that, *Inference* lends itself nicely to the use of Run:ai Fractions. You can, for example, run four instances of an *Inference* server on a single GPU, each using a fourth of the GPU memory.
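For illustration only, here is a minimal sketch of how one such quarter-GPU replica might be described with the Kubernetes Python client. The `gpu-fraction` annotation name, the label, and the image are assumptions made for this example; the Run:ai Fractions documentation is the authoritative reference:

```python
from kubernetes import client

# Hypothetical pod template for one of four inference replicas sharing a single GPU.
# The "gpu-fraction" annotation is assumed to request a fourth of the GPU memory.
pod_template = client.V1PodTemplateSpec(
    metadata=client.V1ObjectMeta(
        labels={"app": "inference-server"},       # placeholder label
        annotations={"gpu-fraction": "0.25"},     # assumed Run:ai fraction request
    ),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="server",
                image="registry.example.com/inference-server:latest",  # placeholder image
            )
        ]
    ),
)
```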
## Inference @Run:ai
Run:ai provides *Inference* services on an equal footing with the other two Workload types: *Train* and *Build*.
* *Inference* is considered a high-priority workload as it is customer-facing. Running an *Inference* workload (within the Project's quota) will preempt any Run:ai Workload marked as *Training*.
* *Inference* workloads will receive priority over *Train* and *Build* workloads during scheduling.
* *Inference* is implemented as a Kubernetes *Deployment* object with a defined number of replicas. The replicas are load-balanced by Kubernetes, so adding more replicas will improve the overall throughput of the system (see the sketch after this list).
* Multiple replicas will appear in Run:ai as a single *Inference* workload. The workload will appear in all Run:ai dashboards and views as well as the Command-line interface.
* *Inference* workloads can be submitted via the Run:ai [user interface](../admin-ui-setup/deployments.md) as well as the [Run:ai API](../../developer/cluster-api/workload-overview-dev.md). Internally, spawning an *Inference* workload also creates a Kubernetes *Service*. The service is an endpoint to which clients can connect.
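The sketch below illustrates the mechanics described in the list above; it is not Run:ai's internal implementation. It uses the official Kubernetes Python client to create a *Deployment* with several load-balanced replicas and a *Service* that acts as the endpoint clients connect to. The names, namespace, image, and ports are placeholders:

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside a cluster

labels = {"app": "inference-server"}  # placeholder workload name

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="inference-server"),
    spec=client.V1DeploymentSpec(
        replicas=3,  # load-balanced replicas; more replicas raise overall throughput
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="server",
                        image="registry.example.com/inference-server:latest",
                        ports=[client.V1ContainerPort(container_port=8080)],
                    )
                ]
            ),
        ),
    ),
)

service = client.V1Service(
    metadata=client.V1ObjectMeta(name="inference-server"),
    spec=client.V1ServiceSpec(
        selector=labels,  # routes traffic across all replicas
        ports=[client.V1ServicePort(port=80, target_port=8080)],
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="team-a", body=deployment)
client.CoreV1Api().create_namespaced_service(namespace="team-a", body=service)
```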
## Autoscaling
To meet SLAs, *Inference* workloads are typically configured with *autoscaling*. Autoscaling is the ability to add more computing power (Kubernetes pods) when the load increases and to shrink the allocated resources when the system is idle.
There are a number of ways to trigger autoscaling. Run:ai supports the following:
The Minimum and Maximum number of replicas can be configured as part of the autoscaling configuration.
Autoscaling also supports a scale-to-zero policy with the *Throughput* and *Concurrency* metrics, meaning that, given enough time under the target threshold, the number of replicas will be scaled down to 0.
This has the benefit of conserving resources at the risk of a delay from "cold starting" the model when traffic resumes.
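As a rough illustration of the general principle (the actual behavior is governed by the Run:ai autoscaling configuration, not by this code), a metric-driven autoscaler can be thought of as dividing the observed load by the per-replica target and clamping the result between the configured minimum and maximum replicas; a minimum of zero is what enables scale-to-zero:

```python
import math

def desired_replicas(observed_load: float, target_per_replica: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Illustrative scaling rule: keep each replica near its target load
    (e.g. concurrency or throughput per replica)."""
    if observed_load <= 0:
        return min_replicas  # with min_replicas=0 this is scale-to-zero
    wanted = math.ceil(observed_load / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))

# Example: 35 concurrent requests, target of 10 per replica, 0-8 replicas allowed.
print(desired_replicas(35, 10, min_replicas=0, max_replicas=8))  # -> 4
```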
## See Also
* To set up *Inference*, see [Cluster installation prerequisites](../runai-setup/cluster-setup/cluster-prerequisites.md#inference).
* For running *Inference*, see the [Inference quick-start](../../Researcher/Walkthroughs/quickstart-inference.md).
* To run *Inference* from the user interface, see [Deployments](../admin-ui-setup/deployments.md).
* To run *Inference* using the API, see the [Workload overview](../../developer/cluster-api/workload-overview-dev.md).