Commit fc302c0

Merge pull request #86741 from maxwelldb/shiftstack-observability
[OSDOCS#12841] Observability metrics correlation for ShiftStack
2 parents ebe2f38 + ee24fb0 commit fc302c0

6 files changed

+325
-0
lines changed

_attributes/common-attributes.adoc

Lines changed: 2 additions & 0 deletions
@@ -316,6 +316,8 @@ ifdef::openshift-origin[]
316316
:rh-openstack-first: OpenStack
317317
:rh-openstack: OpenStack
318318
endif::openshift-origin[]
319+
:rhoso-first: Red Hat OpenStack Services on OpenShift (RHOSO)
320+
:rhoso: RHOSO
319321
// VMware vSphere
320322
:vmw-first: VMware vSphere
321323
:vmw-full: VMware vSphere

_topic_maps/_topic_map.yml

Lines changed: 2 additions & 0 deletions
@@ -2917,6 +2917,8 @@ Topics:
29172917
File: managing-alerts
29182918
- Name: Reviewing monitoring dashboards
29192919
File: reviewing-monitoring-dashboards
2920+
- Name: Monitoring clusters that run on RHOSO
2921+
File: shiftstack-prometheus-configuration
29202922
- Name: Accessing monitoring APIs by using the CLI
29212923
File: accessing-third-party-monitoring-apis
29222924
- Name: Troubleshooting monitoring issues

modules/monitoring-configuring-shiftstack-remotewrite.adoc

Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * observability/monitoring/shiftstack-prometheus-configuration.adoc
4+
5+
:_mod-docs-content-type: PROCEDURE
6+
[id="monitoring-configuring-shiftstack-remotewrite_{context}"]
7+
= Remote writing to an external Prometheus instance
8+
9+
Use remote write with both {rhoso-first} and {product-title} to push their metrics to an external Prometheus instance.
10+
11+
.Prerequisites
12+
13+
- You have access to an external Prometheus instance.
14+
- You have administrative access to {rhoso} and your cluster.
15+
- You have certificates for secure communication with mTLS.
16+
- Your Prometheus instance is configured for client TLS certificates and has been set up as a remote write receiver.
17+
- The Cluster Observability Operator is installed on your {rhoso} cluster.
18+
- The monitoring stack for your {rhoso} cluster is configured to collect the metrics that you are interested in.
19+
- Telemetry is enabled in the {rhoso} environment.
20+
+
21+
[NOTE]
22+
====
23+
To verify that the telemetry service is operating normally, enter the following command:
24+
[source,shell]
25+
----
$ oc -n openstack get monitoringstacks metric-storage -o yaml
----
28+
The status of the `metric-storage` `MonitoringStack` custom resource in the output indicates whether telemetry is enabled correctly.
29+
====
30+
31+
.Procedure
32+
33+
// Steps 1, 2, 3, and 4 run on the OpenShift cluster hosting the RHOSO control plane. They configure RHOSO to send its metrics to an external Prometheus instance.
34+
//
35+
// Steps 5, 6, 7, and 8 run on the tenant's OpenShift cluster. They configure the tenant OpenShift cluster to send its metrics to the same Prometheus instance.
36+
// Comment from before moving telemetry check to prereqs -- offset by 1.
37+
38+
// on mgmt cluster
39+
40+
. Configure your {rhoso} management cluster to send metrics to Prometheus:
41+
42+
.. Create a secret that is named `mtls-bundle` in the `openstack` namespace that contains HTTPS client certificates for authentication to Prometheus by entering the following command:
43+
+
44+
[source,shell]
45+
----
$ oc --namespace openstack \
    create secret generic mtls-bundle \
        --from-file=./ca.crt \
        --from-file=osp-client.crt \
        --from-file=osp-client.key
----
52+
53+
.. Open the `controlplane` configuration for editing by running the following command:
54+
+
55+
[source,shell]
56+
----
$ oc -n openstack edit openstackcontrolplane/controlplane
----
59+
60+
.. With the configuration open, replace the `.spec.telemetry.template.metricStorage` section so that {rhoso} sends metrics to Prometheus. As an example:
61+
+
62+
[source,yaml]
63+
----
metricStorage:
  customMonitoringStack:
    alertmanagerConfig:
      disabled: false
    logLevel: info
    prometheusConfig:
      scrapeInterval: 30s
      remoteWrite:
      - url: https://external-prometheus.example.com/api/v1/write # <1>
        tlsConfig:
          ca:
            secret:
              name: mtls-bundle
              key: ca.crt
          cert:
            secret:
              name: mtls-bundle
              key: osp-client.crt
          keySecret:
            name: mtls-bundle
            key: osp-client.key
      replicas: 2
    resourceSelector:
      matchLabels:
        service: metricStorage
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
      requests:
        cpu: 100m
        memory: 256Mi
    retention: 1d # <2>
  dashboardsEnabled: false
  dataplaneNetwork: ctlplane
  enabled: true
  prometheusTls: {}
----
102+
<1> Replace this URL with the URL of your Prometheus instance.
103+
<2> Set a retention period. Because the metrics are also stored externally, you can optionally reduce the retention period for local metrics.
104+
// run on tenant's openshift cluster
105+
. Configure the tenant cluster on which your workloads run to send metrics to Prometheus:
106+
107+
.. Create a cluster monitoring config map as a YAML file. The map must include a remote write configuration and cluster identifiers. As an example:
108+
+
109+
[source,yaml]
110+
----
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 1d # <1>
      remoteWrite:
      - url: "https://external-prometheus.example.com/api/v1/write"
        writeRelabelConfigs:
        - sourceLabels:
          - __tmp_openshift_cluster_id__
          targetLabel: cluster_id
          action: replace
        tlsConfig:
          ca:
            secret:
              name: mtls-bundle
              key: ca.crt
          cert:
            secret:
              name: mtls-bundle
              key: ocp-client.crt
          keySecret:
            name: mtls-bundle
            key: ocp-client.key
----
140+
<1> Set a retention period. Because the metrics are also stored externally, you can optionally reduce the retention period for local metrics.
141+
142+
.. Save the config map as a file called `cluster-monitoring-config.yaml`.
143+
144+
.. Create a secret that is named `mtls-bundle` in the `openshift-monitoring` namespace that contains HTTPS client certificates for authentication to Prometheus by entering the following command:
145+
+
146+
[source,terminal]
147+
----
$ oc --namespace openshift-monitoring \
    create secret generic mtls-bundle \
        --from-file=./ca.crt \
        --from-file=ocp-client.crt \
        --from-file=ocp-client.key
----
154+
155+
.. Apply the cluster monitoring configuration by running the following command:
156+
+
157+
[source,terminal]
158+
----
$ oc apply -f cluster-monitoring-config.yaml
----
161+
162+
After the changes propagate, you can see aggregated metrics in your external Prometheus instance.
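
Before you create the `mtls-bundle` secrets, it can help to confirm that the certificate files exist and that the external Prometheus instance accepts them. The following is a minimal sketch and not part of the documented procedure. The helper names `require_files` and `check_mtls` are hypothetical, and the check assumes that your Prometheus instance exposes its standard `/-/healthy` endpoint at the same mTLS-protected address.

```shell
# Hypothetical helper: fail if any listed file is missing or unreadable.
require_files() {
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable: $f" >&2
      return 1
    fi
  done
}

# Hypothetical helper: probe the Prometheus health endpoint with mTLS.
# Arguments: CA file, client certificate, client key, base URL.
# Assumption: the /-/healthy endpoint is reachable at the base URL.
check_mtls() {
  curl --silent --show-error --fail \
    --cacert "$1" --cert "$2" --key "$3" \
    "$4/-/healthy"
}

# Example usage (file names and URL are the placeholders from this procedure):
#   require_files ca.crt ocp-client.crt ocp-client.key &&
#     check_mtls ca.crt ocp-client.crt ocp-client.key \
#       https://external-prometheus.example.com
```

Running the file check first means that a misnamed certificate file is reported before `oc create secret` is attempted.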

modules/monitoring-configuring-shiftstack-scraping.adoc

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * observability/monitoring/shiftstack-prometheus-configuration.adoc
4+
5+
:_mod-docs-content-type: PROCEDURE
6+
[id="monitoring-configuring-shiftstack-scraping_{context}"]
7+
= Collecting cluster metrics from the federation endpoint
8+
9+
You can use the federation endpoint of your {product-title} cluster to make its metrics available to a {rhoso-first} cluster for pull-based monitoring.
10+
11+
.Prerequisites
12+
13+
- You have administrative access to {rhoso} and the tenant cluster that is running on it.
14+
- Telemetry is enabled in the {rhoso} environment.
15+
- The Cluster Observability Operator is installed on your cluster.
16+
- The monitoring stack for your cluster is configured.
17+
- Your cluster has its federation endpoint exposed.
18+
19+
.Procedure
20+
21+
. Connect to your cluster by using a username and password; do not log in by using a `kubeconfig` file that was generated by the installation program.
22+
23+
. To retrieve a token from the {product-title} cluster, run the following command on it:
24+
+
25+
[source,terminal]
26+
----
$ oc whoami -t
----
29+
30+
. Make the token available as a secret in the `openstack` namespace in the {rhoso} management cluster by running the following command:
31+
+
32+
[source,terminal]
33+
----
$ oc -n openstack create secret generic ocp-federated --from-literal=token=<the_token_fetched_previously>
----
36+
37+
. To get the Prometheus federation route URL from your {product-title} cluster, run the following command:
38+
+
39+
[source,terminal]
40+
----
$ oc -n openshift-monitoring get route prometheus-k8s-federate -o jsonpath='{.status.ingress[].host}'
----
43+
44+
. Write a manifest for a scrape configuration and save it as a file called `cluster-scrape-config.yaml`. As an example:
45+
+
46+
[source,yaml]
47+
----
apiVersion: monitoring.rhobs/v1alpha1
kind: ScrapeConfig
metadata:
  labels:
    service: metricStorage
  name: sos1-federated
  namespace: openstack
spec:
  params:
    'match[]':
    - '{__name__=~"kube_node_info|kube_persistentvolume_info|cluster:master_nodes"}' # <1>
  metricsPath: '/federate'
  authorization:
    type: Bearer
    credentials:
      name: ocp-federated # <2>
      key: token
  scheme: HTTPS # or HTTP
  scrapeInterval: 30s # <3>
  staticConfigs:
  - targets:
    - prometheus-k8s-federate-openshift-monitoring.apps.openshift.example # <4>
----
71+
<1> Add metrics here. In this example, only the metrics `kube_node_info`, `kube_persistentvolume_info`, and `cluster:master_nodes` are requested.
72+
<2> Insert the previously generated secret name here.
73+
<3> Limit scraping to fewer than 1000 samples for each request with a maximum frequency of once every 30 seconds.
74+
<4> Insert the route host that you retrieved previously. If the endpoint uses HTTPS with a custom certificate authority, add a `tlsConfig` section after it.
75+
76+
. While connected to the {rhoso} management cluster, apply the manifest by running the following command:
77+
+
78+
[source,terminal]
79+
----
$ oc apply -f cluster-scrape-config.yaml
----
82+
83+
After the configuration propagates, you can query the cluster metrics in the {product-title} UI in {rhoso}.
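
To test the token and the `match[]` selector before you apply the `ScrapeConfig` manifest, you can query the federation endpoint directly. The following is a minimal sketch that assumes `curl` is available and that the route certificate is trusted by your system. The helper names `build_match_selector` and `query_federate`, and the `HOST` and `TOKEN` variables in the usage comment, are hypothetical.

```shell
# Hypothetical helper: join metric names into one federation selector.
# The subshell body keeps the IFS change local to this function.
build_match_selector() (
  IFS='|'
  printf '{__name__=~"%s"}\n' "$*"
)

# Hypothetical helper: fetch the selected series from the /federate endpoint.
# Arguments: route host, bearer token, selector.
query_federate() {
  curl --silent --show-error --fail --get \
    --header "Authorization: Bearer $2" \
    --data-urlencode "match[]=$3" \
    "https://$1/federate"
}

# Example usage (HOST and TOKEN hold the values retrieved in earlier steps):
#   selector=$(build_match_selector kube_node_info kube_persistentvolume_info cluster:master_nodes)
#   query_federate "$HOST" "$TOKEN" "$selector"
```

Building the selector from a list of names keeps the manual check aligned with the `match[]` value in the manifest.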

modules/monitoring-shiftstack-metrics.adoc

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * observability/monitoring/shiftstack-prometheus-configuration.adoc
4+
5+
:_mod-docs-content-type: CONCEPT
6+
[id="monitoring-shiftstack-metrics.adoc_{context}"]
7+
= Available metrics for clusters that run on RHOSO
8+
9+
To query metrics and identify resources across the stack, you can use helper metrics that correlate {rhoso-first} infrastructure resources with their representations in the tenant {product-title} cluster.
10+
11+
To map nodes to {rhoso} compute instances, use the following labels of the `kube_node_info` metric:
12+
13+
* `node` is the Kubernetes node name.
14+
15+
* `provider_id` contains the identifier of the corresponding compute service instance.
16+
17+
To map persistent volumes to {rhoso} block storage volumes or shared file system shares, use the following labels of the `kube_persistentvolume_info` metric:
18+
19+
* `persistentvolume` is the volume name.
20+
21+
* `csi_volume_handle` is the block storage volume or share identifier.
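
When you correlate these labels by hand, you often need only the identifier portion of `provider_id`. The following is a minimal sketch that assumes the value has the form `openstack:///<instance_uuid>`; verify this against your own `kube_node_info` series. The helper name is hypothetical and the UUID in the usage comment is made up.

```shell
# Hypothetical helper: keep everything after the last "/" of a provider ID.
# Assumption: the instance identifier is the final path segment.
instance_id_from_provider_id() {
  printf '%s\n' "${1##*/}"
}

# Example usage with a made-up UUID:
#   instance_id_from_provider_id 'openstack:///5a6b7c8d-0000-4000-8000-123456789abc'
#   prints 5a6b7c8d-0000-4000-8000-123456789abc
```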
22+
23+
By default, the compute machines that back the cluster control plane nodes are created in a server group with a soft anti-affinity policy. As a result, the compute service creates them on separate hypervisors on a best-effort basis. However, if the state of the {rhoso} cluster is not appropriate for this distribution, the machines are created anyway.
24+
25+
In combination with the default soft anti-affinity policy, you can configure an alert that activates when a hypervisor hosts more than one control plane node of a given cluster to highlight the degraded level of high availability.
26+
27+
For example, this PromQL query returns the number of {product-title} control plane nodes for each {rhoso} host:
28+
29+
[source,promql]
30+
----
sum by (vm_instance) (
  group by (vm_instance, resource) (ceilometer_cpu)
    / on (resource) group_right(vm_instance) (
      group by (node, resource) (
        label_replace(kube_node_info, "resource", "$1", "system_uuid", "(.+)")
      )
    / on (node) group_left group by (node) (
      cluster:master_nodes
    )
  )
)
----

observability/monitoring/shiftstack-prometheus-configuration.adoc

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
1+
:_mod-docs-content-type: ASSEMBLY
2+
[id="shiftstack-prometheus-configuration"]
3+
= Monitoring clusters that run on RHOSO
4+
include::_attributes/common-attributes.adoc[]
5+
:context: shiftstack-prometheus-configuration
6+
7+
toc::[]
8+
9+
You can correlate observability metrics for clusters that run on {rhoso-first}. By collecting metrics from both environments, you can monitor and troubleshoot issues across the infrastructure and application layers.
10+
11+
There are two supported methods for metric correlation for clusters that run on {rhoso}:
12+
13+
- https://prometheus.io/docs/practices/remote_write/#remote-write-tuning[Remote writing] to an external Prometheus instance.
14+
- Collecting data from the {product-title} federation endpoint to the {rhoso} observability stack.
15+
16+
include::modules/monitoring-configuring-shiftstack-remotewrite.adoc[leveloffset=+1]
17+
18+
[role="_additional-resources"]
19+
.Additional resources
20+
* xref:../../observability/monitoring/configuring-the-monitoring-stack.adoc#configuring-remote-write-storage_configuring-the-monitoring-stack[Configuring remote write storage]
21+
* xref:../../observability/monitoring/configuring-the-monitoring-stack.adoc#adding-cluster-id-labels-to-metrics_configuring-the-monitoring-stack[Adding cluster ID labels to metrics]
22+
23+
include::modules/monitoring-configuring-shiftstack-scraping.adoc[leveloffset=+1]
24+
25+
[role="_additional-resources"]
26+
.Additional resources
27+
* xref:../../observability/monitoring/accessing-third-party-monitoring-apis.adoc#monitoring-querying-metrics-by-using-the-federation-endpoint-for-prometheus_accessing-monitoring-apis-by-using-the-cli[Querying metrics by using the federation endpoint for Prometheus]
28+
29+
include::modules/monitoring-shiftstack-metrics.adoc[leveloffset=+1]
30+
31+
[role="_additional-resources"]
32+
[id="additional-resources_{context}"]
33+
== Additional resources
34+
* xref:../../observability/cluster_observability_operator/cluster-observability-operator-overview.adoc#understanding-the-cluster-observability-operator_cluster_observability_operator_overview[Cluster Observability Operator overview]
