Commit c1df89a

Merge pull request #88632 from skrthomas/OSDOCS-11379
OSDOCS-11379: 1.8 resource table and concept updates
2 parents a516e60 + b1b728d commit c1df89a

4 files changed: +33 -41 lines changed

modules/network-observability-resource-recommendations.adoc

Lines changed: 7 additions & 3 deletions
@@ -11,10 +11,14 @@ The following settings can help you manage resources and performance from the ou
 
 eBPF Sampling:: You can set the Sampling specification, `spec.agent.ebpf.sampling`, to manage resources. Smaller sampling values might consume a large amount of computational, memory and storage resources. You can mitigate this by specifying a sampling ratio value. A value of `100` means 1 flow every 100 is sampled. A value of `0` or `1` means all flows are captured. Smaller values result in an increase in returned flows and the accuracy of derived metrics. By default, eBPF sampling is set to a value of 50, so 1 flow every 50 is sampled. Note that more sampled flows also means more storage needed. Consider starting with the default values and refine empirically, in order to determine which setting your cluster can manage.
 
+eBPF features:: The more features that are enabled, the more CPU and memory are impacted. See "Observing the network traffic" for a complete list of these features.
+
+Without Loki:: You can reduce the amount of resources that Network Observability requires by not using Loki and instead relying on Prometheus. For example, when Network Observability is configured without Loki, the total savings of memory usage are in the 20-65% range and CPU utilization is lower by 10-30%, depending upon the sampling value. See "Network Observability without Loki" for more information.
+
 Restricting or excluding interfaces:: Reduce the overall observed traffic by setting the values for `spec.agent.ebpf.interfaces` and `spec.agent.ebpf.excludeInterfaces`. By default, the agent fetches all the interfaces in the system, except the ones listed in `excludeInterfaces` and `lo` (local interface). Note that the interface names might vary according to the Container Network Interface (CNI) used.
 
-The following settings can be used to fine-tune performance after the Network Observability has been running for a while:
+Performance fine-tuning:: The following settings can be used to fine-tune performance after the Network Observability has been running for a while:
 
-Resource requirements and limits:: Adapt the resource requirements and limits to the load and memory usage you expect on your cluster by using the `spec.agent.ebpf.resources` and `spec.processor.resources` specifications. The default limits of 800MB might be sufficient for most medium-sized clusters.
+* *Resource requirements and limits*: Adapt the resource requirements and limits to the load and memory usage you expect on your cluster by using the `spec.agent.ebpf.resources` and `spec.processor.resources` specifications. The default limits of 800MB might be sufficient for most medium-sized clusters.
 
-Cache max flows timeout:: Control how often flows are reported by the agents by using the eBPF agent's `spec.agent.ebpf.cacheMaxFlows` and `spec.agent.ebpf.cacheActiveTimeout` specifications. A larger value results in less traffic being generated by the agents, which correlates with a lower CPU load. However, a larger value leads to a slightly higher memory consumption, and might generate more latency in the flow collection.
+* *Cache max flows timeout*: Control how often flows are reported by the agents by using the eBPF agent's `spec.agent.ebpf.cacheMaxFlows` and `spec.agent.ebpf.cacheActiveTimeout` specifications. A larger value results in less traffic being generated by the agents, which correlates with a lower CPU load. However, a larger value leads to a slightly higher memory consumption, and might generate more latency in the flow collection.
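
Reader's note on the sampling text in this file: the rule it describes ("a value of `100` means 1 flow every 100 is sampled", "`0` or `1` means all flows are captured") can be sketched as simple arithmetic. This is only an illustration that treats sampling as an exact ratio; the function name `sampled_flows` and the flow count of 10,000 are hypothetical, and the real agent's sampling is not guaranteed to produce exactly these counts.

```python
def sampled_flows(total_flows: int, sampling: int) -> int:
    """Approximate number of flows kept for a given sampling value.

    Hypothetical helper: models the ratio described in the module text,
    not the eBPF agent's actual implementation.
    """
    if sampling < 0:
        raise ValueError("sampling must be non-negative")
    # Per the text, a value of 0 or 1 means all flows are captured.
    if sampling in (0, 1):
        return total_flows
    # Otherwise roughly 1 flow in every `sampling` flows is kept.
    return total_flows // sampling


if __name__ == "__main__":
    for value in (1, 50, 100):
        print(f"sampling={value}: ~{sampled_flows(10_000, value)} of 10000 flows kept")
```

Under this model, lowering the value from the default of 50 to 1 multiplies the number of reported flows by about 50, which is why the text warns that smaller values need more CPU, memory, and storage.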

modules/network-observability-resources-table.adoc

Lines changed: 11 additions & 11 deletions
@@ -14,17 +14,17 @@ The examples outlined in the table demonstrate scenarios that are tailored to sp
 .Resource recommendations
 [options="header"]
 |===
-| | Extra small (10 nodes) | Small (25 nodes) | Medium (65 nodes) ^[2]^ | Large (120 nodes) ^[2]^
-| *Worker Node vCPU and memory* | 4 vCPUs\| 16GiB mem ^[1]^ | 16 vCPUs\| 64GiB mem ^[1]^ | 16 vCPUs\| 64GiB mem ^[1]^ |16 vCPUs\| 64GiB Mem ^[1]^
-| *LokiStack size* | `1x.extra-small` | `1x.small` | `1x.small` | `1x.medium`
-| *Network Observability controller memory limit* | 400Mi (default) | 400Mi (default) | 400Mi (default) | 400Mi (default)
-| *eBPF sampling rate* | 50 (default) | 50 (default) | 50 (default) | 50 (default)
-| *eBPF memory limit* | 800Mi (default) | 800Mi (default) | 800Mi (default) | 1600Mi
-| *cacheMaxSize* | 50,000 | 100,000 (default) | 100,000 (default) | 100,000 (default)
-| *FLP memory limit* | 800Mi (default) | 800Mi (default) | 800Mi (default) | 800Mi (default)
-| *FLP Kafka partitions* | N/A | 48 | 48 | 48
-| *Kafka consumer replicas* | N/A | 6 | 12 | 18
-| *Kafka brokers* | N/A | 3 (default) | 3 (default) | 3 (default)
+| | Extra small (10 nodes) | Small (25 nodes) | Large (250 nodes) ^[2]^
+| *Worker Node vCPU and memory* | 4 vCPUs\| 16GiB mem ^[1]^ | 16 vCPUs\| 64GiB mem ^[1]^ |16 vCPUs\| 64GiB Mem ^[1]^
+| *LokiStack size* | `1x.extra-small` | `1x.small` | `1x.medium`
+| *Network Observability controller memory limit* | 400Mi (default) | 400Mi (default) | 400Mi (default)
+| *eBPF sampling rate* | 50 (default) | 50 (default) | 50 (default)
+| *eBPF memory limit* | 800Mi (default) | 800Mi (default) | 1600Mi
+| *cacheMaxSize* | 50,000 | 100,000 (default) | 100,000 (default)
+| *FLP memory limit* | 800Mi (default) | 800Mi (default) | 800Mi (default)
+| *FLP Kafka partitions* | | 48 | 48
+| *Kafka consumer replicas* | | 6 | 18
+| *Kafka brokers* | | 3 (default) | 3 (default)
 |===
 [.small]
 --

modules/network-observability-total-resource-usage.adoc

Lines changed: 12 additions & 26 deletions
@@ -5,13 +5,12 @@
 [id="network-observability-total-resource-usage-table_{context}"]
 = Total average memory and CPU usage
 
-The following table outlines averages of total resource usage for clusters with a sampling value of 1, 50, and 400 for 3 different tests: `Test 1`, `Test 2`, and `Test 3`. The tests differ in the following ways:
+The following table outlines averages of total resource usage for clusters with a sampling value of `1` and `50` for two different tests: `Test 1` and `Test 2`. The tests differ in the following ways:
 
-- `Test 1` takes into account the total number of namespace, pods and services in an {product-title} cluster, places load on the eBPF agent, and represents use cases with a high number of workloads for a given cluster size. For example, `Test 1` consists of 76 Namespaces, 5153 Pods, and 2305 Services.
-- `Test 2` takes into account a high ingress traffic volume.
-- `Test 3` takes into account the total number of namespace, pods and services in an {product-title} cluster, places load on the eBPF agent on a larger scale than `Test 1`, and represents use cases with a high number of workloads for a given cluster size. For example, `Test 3` consists of 553 Namespaces, 6998 Pods, and 2508 Services.
+- `Test 1` takes into account high ingress traffic volume in addition to the total number of namespace, pods and services in an {product-title} cluster, places load on the eBPF agent, and represents use cases with a high number of workloads for a given cluster size. For example, `Test 1` consists of 76 Namespaces, 5153 Pods, and 2305 Services with a network traffic scale of ~350 MB/s.
+- `Test 2` takes into account high ingress traffic volume in addition to the total number of namespace, pods and services in an {product-title} cluster and represents use cases with a high number of workloads for a given cluster size. For example, `Test 2` consists of 553 Namespaces, 6998 Pods, and 2508 Services with a network traffic scale of ~950 MB/s.
 
-Since different types of cluster use cases are exemplified in the different tests, the numbers in this table cannot be linearly compared side-by-side, but instead are intended to be used as a benchmark for evaluating your personal cluster usage. The examples outlined in the table demonstrate scenarios that are tailored to specific workloads. Consider each example only as a baseline from which adjustments can be made to accommodate your workload needs.
+Since different types of cluster use cases are exemplified in the different tests, the numbers in this table do not scale linearly when compared side-by-side. Instead, they are intended to be used as a benchmark for evaluating your personal cluster usage. The examples outlined in the table demonstrate scenarios that are tailored to specific workloads. Consider each example only as a baseline from which adjustments can be made to accommodate your workload needs.
 
 [NOTE]
 ====
@@ -21,26 +20,13 @@ Metrics exported to Prometheus can impact the resource usage. Cardinality values
 .Total average resource usage
 [%autowidth, options="header"]
 |===
-| Sampling value | Parameters | Test 1 (25 nodes) | Test 2 (65 nodes) | Test 3 (120 nodes)
-.6+| *Sampling = 1* | *With Loki* 3+|
-| Total NetObserv CPU Usage | 3.24 | 3.42 | 7.30
-| Total NetObserv RSS (Memory) Usage | 14.09 GB | 22.56 GB | 36.46 GB
-| *Without Loki* 3+|
-| Total NetObserv CPU Usage | 2.40 | 2.43 | 5.59
-| Total NetObserv RSS (Memory) Usage | 6.85 GB | 10.39 GB | 13.92 GB
-.6+| *Sampling = 50* | *With Loki* 3+|
-| Total NetObserv CPU Usage | 2.04 | 2.36 | 3.31
-| Total NetObserv RSS (Memory) Usage | 8.79 GB | 19.14 GB | 21.07 GB
-| *Without Loki* 3+|
-| Total NetObserv CPU Usage | 1.55 | 1.64 | 2.70
-| Total NetObserv RSS (Memory) Usage | 6.71 GB | 10.15 GB | 14.82 GB
-.6+| *Sampling = 400* | *With Loki* 3+|
-| Total NetObserv CPU Usage | 1.71 | 1.44 | 2.36
-| Total NetObserv RSS (Memory) Usage | 8.21 GB | 16.02 GB | 17.44 GB
-| *Without Loki* 3+|
-| Total NetObserv CPU Usage | 1.31 | 1.06 | 1.83
-| Total NetObserv RSS (Memory) Usage | 7.01 GB | 10.70 GB | 13.26 GB
+| Sampling value | Resources used | Test 1 (25 nodes) | Test 2 (250 nodes)
+.2+| *Sampling = 50*
+| Total NetObserv CPU Usage | 1.35 | 5.39
+| Total NetObserv RSS (Memory) Usage | 16 GB | 63 GB
+.2+| *Sampling = 1*
+| Total NetObserv CPU Usage | 1.82 | 11.99
+| Total NetObserv RSS (Memory) Usage | 22 GB | 87 GB
 |===
 
-
-Summary: This table shows average total resource usage of Network Observability (Agents+FLP+Kafka+Loki).
+Summary: This table shows average total resource usage of Network Observability, which includes Agents, FLP, Kafka, and Loki with all features enabled. For details about what features are enabled, see the features covered in "Observing the network traffic", which comprises all the features that are enabled for this testing.
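
Reader's note on the updated table in this file: the statement that the numbers "do not scale linearly" can be checked with a little arithmetic on the added rows. The sketch below only reuses the figures from the new table; the dictionary layout and the helper name `scale_factor` are hypothetical, and the table gives no unit for CPU usage, so none is assumed here.

```python
# Figures copied from the added table rows (Test 1: 25 nodes, Test 2: 250 nodes).
NODES = {"Test 1": 25, "Test 2": 250}
USAGE = {
    50: {"cpu": {"Test 1": 1.35, "Test 2": 5.39},
         "rss_gb": {"Test 1": 16, "Test 2": 63}},
    1: {"cpu": {"Test 1": 1.82, "Test 2": 11.99},
        "rss_gb": {"Test 1": 22, "Test 2": 87}},
}


def scale_factor(metric: str, sampling: int) -> float:
    """How many times larger the Test 2 value is than the Test 1 value."""
    values = USAGE[sampling][metric]
    return values["Test 2"] / values["Test 1"]


if __name__ == "__main__":
    node_factor = NODES["Test 2"] / NODES["Test 1"]
    print(f"nodes: x{node_factor:.1f}")
    for sampling in (50, 1):
        for metric in ("cpu", "rss_gb"):
            print(f"sampling={sampling} {metric}: x{scale_factor(metric, sampling):.2f}")
```

The node count grows tenfold between the two tests while every usage figure grows by less than that, which is consistent with the module's advice to treat each column as a baseline rather than extrapolating per node.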

observability/network_observability/configuring-operator.adoc

Lines changed: 3 additions & 1 deletion
@@ -31,4 +31,6 @@ include::modules/network-observability-total-resource-usage.adoc[leveloffset=+2]
 
 [role="_additional-resources"]
 .Additional resources
-* xref:../../observability/network_observability/json-flows-format-reference.adoc#network-observability-flows-format_json_reference[Network Flows format reference].
+* xref:../network_observability/observing-network-traffic.adoc#network-observability-trafficflow_nw-observe-network-traffic[Observing the network traffic from the traffic flows view]
+* xref:../network_observability/installing-operators.adoc#network-observability-without-loki_network_observability[Network Observability without Loki]
+* xref:../network_observability/json-flows-format-reference.adoc#network-observability-flows-format_json_reference[Network Flows format reference]
