Commit 98075b8

Merge pull request #80197 from ochromy/SRVKS-1115
[SRVKS-1115] Performance and scalability of OpenShift Serverless Serving
2 parents f8ace5b + 436cc54 commit 98075b8

9 files changed: +326 additions, -3 deletions

_topic_maps/_topic_map.yml

Lines changed: 2 additions & 0 deletions

@@ -108,6 +108,8 @@ Topics:
   Topics:
   - Name: Creating OpenShift Serverless applications
     File: serverless-applications
+  - Name: Scalability and Performance
+    File: scalability-and-performance-serving
   - Name: Autoscaling
     Dir: autoscaling
     Topics:

install/preparing-serverless-install.adoc

Lines changed: 8 additions & 3 deletions

@@ -27,12 +27,17 @@ The set of supported features, configurations, and integrations for {ServerlessP
 [id="about-serverless-scalability-performance"]
 == Scalability and performance on {ocp-product-title}
 
-{ServerlessProductName} has been tested with a configuration of 3 main nodes and 3 worker nodes, each of which has 64 CPUs, 457 GB of memory, and 394 GB of storage.
+With a configuration of 3 main nodes and 3 worker nodes, each of which has 64 CPUs, 457 GB of memory, and 394 GB of storage, the following time values have been determined during testing for a simple Quarkus application:
 
-The maximum number of Knative services that can be created using this configuration is 3,000. This corresponds to the link:https://docs.openshift.com/container-platform/latest/scalability_and_performance/planning-your-environment-according-to-object-maximums.html#cluster-maximums-major-releases_object-limits[{ocp-product-title} Kubernetes services limit of 10,000], since 1 Knative service creates 3 Kubernetes services.
+* The average scale-from-zero response time was approximately 3.4 seconds.
+* The maximum response time was 8 seconds.
+* The 99.9th percentile of response times was 4.5 seconds.
 
-The average scale from zero response time was approximately 3.4 seconds, with a maximum response time of 8 seconds, and a 99.9th percentile of 4.5 seconds for a simple Quarkus application. These times might vary depending on the application and the runtime of the application.
+These times might vary depending on the application and its runtime.
 
+The maximum number of Knative services that can be created is 3,000. This corresponds to the link:https://docs.openshift.com/container-platform/latest/scalability_and_performance/planning-your-environment-according-to-object-maximums.html#cluster-maximums-major-releases_object-limits[{ocp-product-title} Kubernetes services limit of 10,000], since 1 Knative service creates 3 Kubernetes services.
+
+Learn more about scaling and performance of {ServerlessProductName} Serving in xref:../knative-serving/scalability-and-performance-serving.adoc#scalability-and-performance-serving[Scalability and performance of {ServerlessProductName} Serving].
 
 // OCP specific docs
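The scale-from-zero response time described above can be observed on your own cluster. As a rough sketch, assuming a Knative service named `showcase` that has already scaled to zero and is reachable at `https://showcase-default.apps.example.com` (both names are placeholders for your own service and route), time the first request:

[source,terminal]
----
$ time curl -s -o /dev/null https://showcase-default.apps.example.com
----

The elapsed time of this first request includes the scale-from-zero latency; subsequent requests served by an already-running pod should be noticeably faster.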

knative-serving/scalability-and-performance-serving.adoc

Lines changed: 33 additions & 0 deletions

:_mod-docs-content-type: ASSEMBLY
include::_attributes/common-attributes.adoc[]
[id="scalability-and-performance-serving"]
= Scalability and performance of {ServerlessProductName} Serving
:context: scalability-and-performance-serving

toc::[]

{ServerlessProductName} consists of several different components that have different resource requirements and scaling behaviors. These components are horizontally and vertically scalable, but their resource requirements and configuration highly depend on the actual use case.

Control-plane components:: These components are responsible for observing and reacting to custom resources and continuously reconfiguring the system, for example, the controller pods.

Data-plane components:: These components are directly involved in request and response handling, for example, the Knative Serving activator component.

The following metrics and findings were recorded using the following test setup:

* A cluster running {ocp-product-title} 4.13
* The cluster running 4 compute nodes in AWS with a machine type of m6.xlarge
* {ServerlessProductName} 1.30

include::modules/serverless-overhead-serving.adoc[leveloffset=+1]

include::modules/serverless-known-limitations-serving.adoc[leveloffset=+1]

include::modules/serverless-scaling-serving.adoc[leveloffset=+1]

include::modules/serverless-minimal-requirements-serving.adoc[leveloffset=+2]

include::modules/serverless-config-minimal-workloads-serving.adoc[leveloffset=+2]

include::modules/serverless-config-high-workloads-serving.adoc[leveloffset=+2]
modules/serverless-config-high-workloads-serving.adoc

Lines changed: 72 additions & 0 deletions

// Module included in the following assemblies:
//
// * /knative-serving/scalability-and-performance-serving.adoc

:_mod-docs-content-type: PROCEDURE
[id="serverless-config-high-workloads-serving_{context}"]
= Configuring Serving for high workloads

You can configure Knative Serving for high workloads by using the `KnativeServing` custom resource (CR).
The following findings are relevant to configuring Knative Serving for a high workload:

[NOTE]
====
These findings have been tested with requests with a payload size of 0-32 KB. The Knative Service backends used in those tests had a startup latency between 0 and 10 seconds and response times between 0 and 5 seconds.
====

* CPU usage of all data-plane components mostly increases in higher request and payload scenarios, so the CPU requests and limits have to be tested and potentially increased.
* The activator component might also need more memory when it has to buffer more or bigger request payloads, so the memory requests and limits might need to be increased as well.
* One activator pod can handle approximately 2500 requests per second before it starts to increase latency and, at some point, leads to errors.
* One `3scale-kourier-gateway` or `istio-ingressgateway` pod can also handle approximately 2500 requests per second before it starts to increase latency and, at some point, leads to errors.
* Each of the data-plane components consumes up to 1 vCPU for handling 2500 requests per second. Note that this highly depends on the payload size and the response times of the Knative Service backend.

[IMPORTANT]
====
Fast startup and fast response times of your Knative Service user workloads are critical for good performance of the overall system. The Knative Serving components buffer incoming requests when the Knative Service user backend is scaling up or when request concurrency has reached its capacity. If your Knative Service user workload introduces long startup or request latency, it either overloads the `activator` component (when the CPU and memory configuration is too low) or leads to errors for the calling clients.
====

.Procedure

* To fine-tune your installation, use the previous findings combined with your own test results to configure the `KnativeServing` custom resource:
+
.A high workload configuration in the KnativeServing CR
[source,yaml]
----
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  high-availability:
    replicas: 2 <1>
  workloads:
    - name: component-name <2>
      replicas: 2 <3>
      resources:
        - container: container-name
          requests:
            cpu: <4>
            memory:
          limits:
            cpu:
            memory:
  podDisruptionBudgets: <5>
    - name: name-of-pod-disruption-budget
      minAvailable: 1
----
<1> Set this parameter to at least `2` to make sure you always have at least two instances of every component running. You can also use `workloads` to override the replicas for certain components.
<2> Use the `workloads` list to configure specific components. Use the `deployment` name of the component and set the `replicas` field.
<3> For the `activator`, `webhook`, and `3scale-kourier-gateway` components, which use horizontal pod autoscalers (HPAs), the `replicas` field sets the minimum number of replicas. The actual number of replicas depends on the CPU load and scaling done by the HPAs.
<4> Set the requested and limited CPU and memory according to at least the idle consumption while also taking the previous findings and your own test results into consideration.
<5> Adjust the `podDisruptionBudgets` to a value lower than `replicas` to avoid problems during node maintenance. The default `minAvailable` is set to `1`, so if you increase the required replicas, you must also increase `minAvailable`.

[IMPORTANT]
====
As each environment is highly specific, it is essential to test and find your own ideal configuration.
Use the monitoring and alerting functionality of {ocp-product-title} to continuously monitor your actual resource consumption and make adjustments if needed.

If you are using the {ServerlessProductName} and {SMProductShortName} integration, additional CPU processing is added by the `istio-proxy` sidecar containers.
For more information about this, see the {SMProductShortName} documentation.
====
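After applying a configuration like the one above, you can verify that the HPA-managed components picked up the new minimum replica counts. This is an illustrative check rather than part of the documented procedure; the `knative-serving-ingress` namespace applies to Kourier-based installations:

[source,terminal]
----
$ oc get hpa -n knative-serving
$ oc get hpa -n knative-serving-ingress
----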
modules/serverless-config-minimal-workloads-serving.adoc

Lines changed: 69 additions & 0 deletions

// Module included in the following assemblies:
//
// * /knative-serving/scalability-and-performance-serving.adoc

:_mod-docs-content-type: PROCEDURE
[id="serverless-config-minimal-workloads-serving_{context}"]
= Configuring Serving for minimal workloads

.Procedure

* You can configure Knative Serving for minimal workloads by using the `KnativeServing` custom resource (CR):
+
.A minimal workload configuration in the KnativeServing CR
[source,yaml]
----
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  high-availability:
    replicas: 1 <1>
  workloads:
    - name: activator
      replicas: 2 <2>
      resources:
        - container: activator
          requests:
            cpu: 250m <3>
            memory: 60Mi <4>
          limits:
            cpu: 1000m
            memory: 600Mi
    - name: controller
      replicas: 1 <5>
      resources:
        - container: controller
          requests:
            cpu: 10m
            memory: 100Mi
          limits: <6>
            cpu: 200m
            memory: 300Mi
    - name: webhook
      replicas: 2
      resources:
        - container: webhook
          requests:
            cpu: 100m <7>
            memory: 60Mi
          limits:
            cpu: 200m
            memory: 200Mi
  podDisruptionBudgets: <8>
    - name: activator-pdb
      minAvailable: 1
    - name: webhook-pdb
      minAvailable: 1
----
<1> Setting this to `1` scales all system components to one replica.
<2> The activator should always be scaled to a minimum of `2` instances to avoid downtime.
<3> The activator CPU requests should not be set lower than `250m`, because a `HorizontalPodAutoscaler` uses this value as a reference to scale up and down.
<4> Adjust memory requests to the idle values from the previous table. Also adjust memory limits according to your expected load (this might need custom testing to find the best values).
<5> One webhook and one controller are sufficient for a minimal-workload scenario.
<6> These limits are sufficient for a minimal-workload scenario, but they also might need adjustments depending on your concrete workload.
<7> The webhook CPU requests should not be set lower than `100m`, because a `HorizontalPodAutoscaler` uses this value as a reference to scale up and down.
<8> Adjust the `podDisruptionBudgets` to a value lower than `replicas` to avoid problems during node maintenance.
modules/serverless-known-limitations-serving.adoc

Lines changed: 10 additions & 0 deletions

// Module included in the following assemblies:
//
// * /knative-serving/scalability-and-performance-serving.adoc

:_mod-docs-content-type: CONCEPT
[id="serverless-known-limitations-serving_{context}"]
= Known limitations of {ServerlessProductName} Serving

The maximum number of Knative Services that can be created is 3,000. This corresponds to the {ocp-product-title} Kubernetes services limit of 10,000, since 1 Knative Service creates 3 Kubernetes services.
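The 1:3 ratio can be observed on a running cluster. As an illustrative check, assuming a Knative service named `showcase` in the `default` namespace (both names are placeholders), list the Kubernetes services labeled with the owning Knative service:

[source,terminal]
----
$ oc get services -n default -l serving.knative.dev/service=showcase
----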
modules/serverless-minimal-requirements-serving.adoc

Lines changed: 69 additions & 0 deletions

// Module included in the following assemblies:
//
// * /knative-serving/scalability-and-performance-serving.adoc

:_mod-docs-content-type: CONCEPT
[id="serverless-minimal-requirements-serving_{context}"]
= Minimal requirements of {ServerlessProductName} Serving

While the default setup is suitable for medium-sized workloads, it might be over-sized for smaller setups or under-sized for high-workload scenarios.
To configure {ServerlessProductName} Serving for a minimal workload scenario, you need to know the idle consumption of the system components.

[id="serverless-minimal-requirements-serving-idle-consumption_{context}"]
== Idle consumption

The idle consumption is dependent on the number of Knative Services. The following memory usage has been measured for the components in the `knative-serving` and `knative-serving-ingress` {ocp-product-title} projects:

[cols=5*,options="header"]
|===
|Component
|0 Services
|100 Services
|500 Services
|1000 Services

|`activator`
|55Mi
|86Mi
|300Mi
|450Mi

|`autoscaler`
|52Mi
|102Mi
|225Mi
|350Mi

|`controller`
|100Mi
|135Mi
|310Mi
|500Mi

|`webhook`
|60Mi
|60Mi
|60Mi
|60Mi

|`3scale-kourier-gateway`
|20Mi
|60Mi
|190Mi
|330Mi

|`net-kourier-controller`
|90Mi
|170Mi
|340Mi
|430Mi

|===

[NOTE]
====
Either the `3scale-kourier-gateway` and `net-kourier-controller` components or the `istio-ingressgateway` and `net-istio-controller` components are installed.

The memory consumption of `net-istio` is based on the total number of pods within the mesh.
====
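To compare the table above with the idle consumption of your own installation, you can query current pod usage, assuming cluster metrics are available; the `knative-serving-ingress` project applies to Kourier-based installations:

[source,terminal]
----
$ oc adm top pods -n knative-serving
$ oc adm top pods -n knative-serving-ingress
----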
modules/serverless-overhead-serving.adoc

Lines changed: 19 additions & 0 deletions

// Module included in the following assemblies:
//
// * /knative-serving/scalability-and-performance-serving.adoc

:_mod-docs-content-type: CONCEPT
[id="serverless-overhead-serving_{context}"]
= Overhead of {ServerlessProductName} Serving

As components of {ServerlessProductName} Serving are part of the data plane, requests from clients are routed through:

* The ingress gateway (Kourier or {SMProductShortName})
* The activator component
* The queue-proxy sidecar container in each Knative Service

These components introduce an additional hop in networking and perform additional tasks, for example, adding observability and request queuing. The following are the measured latency overheads:

* Each additional network hop adds 0.5 ms to 1 ms of latency to a request. Depending on the current load of the Knative Service and whether the Knative Service was scaled to zero before the request, the activator component is not always a part of the data plane.
* Depending on the payload size, each of the components consumes up to 1 vCPU for handling 2500 requests per second.
modules/serverless-scaling-serving.adoc

Lines changed: 44 additions & 0 deletions

// Module included in the following assemblies:
//
// * /knative-serving/scalability-and-performance-serving.adoc

:_mod-docs-content-type: CONCEPT
[id="serverless-scaling-serving_{context}"]
= Scaling and performance of {ServerlessProductName} Serving

{ServerlessProductName} Serving has to be scaled and configured based on the following parameters:

* Number of Knative Services
* Number of Revisions
* Number of concurrent requests in the system
* Size of payloads of the requests
* The startup latency and response latency of the Knative Service added by the user's web application
* Number of changes of the KnativeService custom resource (CR) over time

[id="serverless-scaling-serving-defaults_{context}"]
== KnativeServing default configuration

By default, {ServerlessProductName} Serving is configured to run all components with high availability and medium-sized CPU and memory requests and limits. This means that the `high-availability.replicas` value in the `KnativeServing` CR is automatically set to `2` and all system components are scaled to two replicas. This configuration is suitable for medium workload scenarios and has been tested with:

* 170 Knative Services
* 1-2 Revisions per Knative Service
* 89 test scenarios mainly focused on testing the control plane
* 48 re-creating scenarios, where Knative Services are deleted and re-created
* 41 stable scenarios, in which requests are slowly but continuously sent to the system

During these test cases, the system components effectively consumed:

[cols=2*,options="header"]
|===
|Component
|Measured Resources

|Operator in project `openshift-serverless`
|1 GB of memory, 0.2 CPU cores

|Serving components in project `knative-serving`
|5 GB of memory, 2.5 CPU cores

|===
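The default configuration described above is equivalent to explicitly setting the following in the `KnativeServing` CR. This sketch only restates the default and does not need to be applied:

[source,yaml]
----
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  high-availability:
    replicas: 2
----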
