* After adding new import paths to your operator project, run `go mod vendor` if a `vendor/` directory is present in the root of your project directory to fulfill these dependencies.
* Your 3rd party resource needs to be added before adding the controller in `"Setup all Controllers"`.

### Monitoring and Observability

This section covers how to create custom metrics, [alerts] and [recording rules] for your operator. It focuses on the technical aspects and demonstrates the implementation by updating the sample [memcached-operator].

To learn about how metrics work in the Operator SDK, read the [metrics section][metrics_doc] of the Kubebuilder documentation.

For more information regarding monitoring best practices, take a look at our docs on [observability-best-practices].

#### Prerequisites

The following steps are required in order to inspect the operator's custom metrics, alerts and recording rules:

- Install Prometheus and Prometheus Operator. We recommend using [kube-prometheus] in production if you don’t have your own monitoring system. If you are just experimenting, you can install only Prometheus and Prometheus Operator.
- Make sure Prometheus has access to the operator's namespace by setting the corresponding RBAC rules.

Example: [prometheus_role.yaml] and [prometheus_role_binding.yaml]

#### Publishing Custom Metrics

If you wish to publish custom metrics for your operator, this can be easily achieved by using the global registry from `controller-runtime/pkg/metrics`.

One way to achieve this is to declare your collectors as global variables, register them in a `RegisterMetrics()` function, and call that function from the controller's `init()` function.

Example custom metric: [MemcachedDeploymentSizeUndesiredCountTotal]
The next step is to add logic to the controller that updates the metric's value. In this case the new metric is a `Counter`, so a valid update operation is to increment its value.

[Metric update example]:

```go
...
size := memcached.Spec.Size
if *found.Spec.Replicas != size {
	// Increment MemcachedDeploymentSizeUndesiredCountTotal metric by 1
	MemcachedDeploymentSizeUndesiredCountTotal.Inc()
	...
}
...
```

Different metric types support different operations. For more information, see the [Prometheus Golang client].

#### Publishing Alerts and Recording Rules

To add alerts and recording rules that are specific to the operator's needs, we'll create a dedicated PrometheusRule object using the [prometheus-operator API].

```go
...
			"description": "No running memcached-operator pods were detected in the last 5 min.",
		},
		For: "5m",
		Labels: map[string]string{
			"severity": "critical",
		},
	}
}
```

Then, we may want to ensure that the new PrometheusRule is being created and reconciled. One way to achieve this is by expanding the existing `Reconcile()` function logic.

```go
...
	if !reflect.DeepEqual(foundRule.Spec.DeepCopy(), desiredRuleSpec) {
		desiredRuleSpec.DeepCopyInto(&foundRule.Spec)
		// Capture the Update error so a failed update is actually detected.
		if err := r.Update(ctx, foundRule); err != nil {
			log.Error(err, "Failed to update prometheus rule")
			return ctrl.Result{}, err
		}
	}
	...
	...
}
```
- Please review the [observability-best-practices] for additional important information regarding alerts and recording rules.
#### Alerts Unit Testing

It is highly recommended to implement unit tests for Prometheus rules. For more information, see the Prometheus [unit testing documentation]. For examples of unit testing in a Golang operator, see the sample memcached-operator [alerts unit tests].
#### Inspecting the metrics, alerts and recording rules with Prometheus UI

Finally, in order to inspect the exposed metrics and alerts, we need to forward the port on which Prometheus serves its UI (usually `9090`, which is the default value). This can be done with the following command:

where we assume that the Prometheus service is available in the `monitoring` namespace.

Now you can access the Prometheus UI at `http://localhost:9090`. For more details on exposing Prometheus metrics, please refer to the [kube-prometheus docs].
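Besides the UI, the same data can be queried programmatically through the Prometheus HTTP API's instant-query endpoint, `/api/v1/query`. The sketch below uses only the Go standard library and assumes Prometheus is reachable at `http://localhost:9090` as described above; the helpers `countSeries` and `parseSeriesCount` and the example query are hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// queryResponse models only the fields of the /api/v1/query response used here.
type queryResponse struct {
	Status string `json:"status"`
	Error  string `json:"error"`
	Data   struct {
		ResultType string            `json:"resultType"`
		Result     []json.RawMessage `json:"result"`
	} `json:"data"`
}

// parseSeriesCount decodes a query response and returns the number of series
// in an instant-vector result.
func parseSeriesCount(body []byte) (int, error) {
	var qr queryResponse
	if err := json.Unmarshal(body, &qr); err != nil {
		return 0, err
	}
	if qr.Status != "success" {
		return 0, fmt.Errorf("query failed: %s", qr.Error)
	}
	if qr.Data.ResultType != "vector" {
		return 0, fmt.Errorf("unexpected result type %q", qr.Data.ResultType)
	}
	return len(qr.Data.Result), nil
}

// countSeries runs an instant PromQL query against a Prometheus server.
func countSeries(baseURL, promQL string) (int, error) {
	resp, err := http.Get(baseURL + "/api/v1/query?query=" + url.QueryEscape(promQL))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, err
	}
	return parseSeriesCount(body)
}

func main() {
	n, err := countSeries("http://localhost:9090", "memcached_deployment_size_undesired_count_total")
	if err != nil {
		fmt.Println("query error:", err)
		return
	}
	fmt.Println("matching series:", n)
}
```

Separating parsing from the HTTP call keeps the decoding logic testable without a running Prometheus.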
### Handle Cleanup on Deletion
Operators may create objects as part of their operational duty. Object accumulation can consume unnecessary resources, slow down the API and clutter the user interface. As such it is important for operators to keep good hygiene and to clean up resources when they are not needed. Here are a few common scenarios.
#### Internal Resources
A typical example of correct resource cleanup is the [Jobs][jobs] implementation. When a Job is created, one or multiple Pods are created as child resources. When a Job is deleted, the associated Pods are deleted as well. This is a very common pattern, easily achieved by setting an owner reference on the child (Pod) that points to the parent (Job) object. Here is a code snippet for doing so, where "r" is the reconciler and "ctrl" the controller-runtime library:

[init() function example]:https://github.com/operator-framework/operator-sdk/blob/master/testdata/go/v4-alpha/monitoring/memcached-operator/cmd/main.go
[alerts unit tests]:https://github.com/operator-framework/operator-sdk/tree/master/testdata/go/v4-alpha/monitoring/memcached-operator/monitoring/prom-rule-ci