-
Notifications
You must be signed in to change notification settings - Fork 3.1k
[systemdreceiver] Add support for scraping unit status #42058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Sorry about those bugs! I'm afraid this is very much baby's first Go code — I thought I'd been so careful about running the |
7184a31 to
b18d003
Compare
|
Have rebased on top of master, and tried to stub out the dbus connection, so we can write some tests for this. Coverage is now at the requested 80% :). I'm aware this PR is now well over the requested 500 lines. The majority of this is generated code (or golden test data), so hopefully shouldn't be too bad to review, but let me know there's a way to split this up further. |
|
Following along - definitely interested. I'm poking at the cgroups side of things locally -- not sure if we need to vendor in cadvisor like awscontainerinsightreceiver or the simpler podman/k8s/docker stats methods will work. |
receiver/systemdreceiver/testdata/expected_metrics/metrics_golden.yaml
Outdated
Show resolved
Hide resolved
|
For cgroups, I'm currently just using https://github.com/containerd/cgroups/ directy. Though this does make supporting cgroups v1 and v2 a bit more awkward — I'm currently just focusing on the latter. I'd actually been drawing more inspiration for the host receiver (as much as I can, sadly cgroups has a different model for a lot of things). So CPU usage mode (kernel vs user) is an attribute rather than having separate I'm not 100% sure this is the right approach, so thoughts welcome! |
b18d003 to
5616828
Compare
- Store possible unit states in a slice, to ensure metrics are emitted in a deterministic order.
|
Have rebased on top of master (again 😭, y'all are too productive!) to fix Is there a better way to handle this? I feel I've been doing a lot of manual Have also switched the unit name to be a resource attribute, which I think is a bit more consistent with what the other receivers do. |
|
Receiver is testing out nicely here - with the recent resource unit name change, I needed this config in the prometheus exporter to see all the metrics -- without, just a single series is shown. |
|
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
|
It's not stale, it's waiting for review. |
Particularly struggling with making sure the version numbers in the Edit: Ahh, I see. I need to run |
|
Hi @SquidDev and reviewers, This is a fantastic and highly anticipated feature for the I would like to help you and the team by running functional and integration tests on my end, specifically focusing on validating the metric output and the behavior of the To ensure I'm testing against the correct environment and configuration (especially regarding D-Bus stubbing/mocking, as you mentioned), could you please provide a brief guide or an example configuration for how to:
I will follow up here with any functional test findings, bug reports, or potential suggestions for improvements as soon as I can replicate the setup. Thanks for your efforts! |
|
@josepcorrea Thanks for helping with this!
So Footnotes |
|
Please resolve the conflicts when you get a chance, I will review |
| enabled: true | ||
|
|
||
| attributes: | ||
| systemd.unit.active_state: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| systemd.unit.active_state: | |
| systemd.unit.state: |
why not state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Systemd units have three states: load state, (whether the unit has been loaded), active state (whether it is running) and sub state (unit defined state).
From discussions in #37169 and #33532, we don't want to expose all three due to cardinality concerns, but it felt helpful to make explicit which of the states we were referring to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but please help understand why use active_state instead of state?
|
Is there else needed from me for this PR, or is this just waiting for a decision on |
| var conn *dbus.Conn | ||
| switch s.cfg.Scope { | ||
| case "system": | ||
| conn, err = dbus.ConnectSystemBus(dbus.WithContext(ctx)) | ||
| case "user": | ||
| conn, err = dbus.ConnectSessionBus(dbus.WithContext(ctx)) | ||
| default: | ||
| return errInvalidScope | ||
| } | ||
|
|
||
| if err != nil { | ||
| return err | ||
| } | ||
|
|
||
| s.conn = conn | ||
|
|
||
| return err | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unless the dbus.Connect* doesn't follow typical conventions for go, i.e.: conn should be nil if err is not, the code can follow the usual pattern:
| var conn *dbus.Conn | |
| switch s.cfg.Scope { | |
| case "system": | |
| conn, err = dbus.ConnectSystemBus(dbus.WithContext(ctx)) | |
| case "user": | |
| conn, err = dbus.ConnectSessionBus(dbus.WithContext(ctx)) | |
| default: | |
| return errInvalidScope | |
| } | |
| if err != nil { | |
| return err | |
| } | |
| s.conn = conn | |
| return err | |
| } | |
| switch s.cfg.Scope { | |
| case "system": | |
| s.conn, err = dbus.ConnectSystemBus(dbus.WithContext(ctx)) | |
| case "user": | |
| s.conn, err = dbus.ConnectSessionBus(dbus.WithContext(ctx)) | |
| default: | |
| err = errInvalidScope | |
| } | |
| return err | |
| } |
| resourceMetrics: | ||
| - resource: | ||
| attributes: | ||
| - key: systemd.unit.name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, my tendency is to have this a metric attribute. Why having it as a resource attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally had these as metric attributes. However, when adding cgroup support to this receiver, it felt that having a receiver per unit made a bit more sense, as the metrics are a bit more grouped together. This is what we do for the various container receivers (Docker, Podman, etc...) for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit of a gray area, right now I'm not convinced yet that it is the better trade off here. That said the component is still in development so we can experiment and adjust as appropriate.
|
Please take a look at the failing changelog check. We can merge as is and address feedback afterwards. |
|
Ahh, sorry! Thought I'd caught all the upstream changes that had gone in (had fixed the issues caused by a gofumpt bump), but clearly missed this one. All fixed! |
|
And of course there's another one! 😱 CONTRIBUTING.md currently tells you to manually run a dozen |
Description
This is start at implementing #33532. I'm trying to do this in relatively minor increments to make this reviewable, so this only adds support for reporting the active units' status.
This follows a similar approach to the httpcheck receiver, with an attribute for every possible state, with one metric set to
1, and the rest set to0.Link to tracking issue
Part of #33532, but possibly not enough to mark as fixed.
Testing
I have done some manual testing, and confirmed this generates the metrics I'd expected. Aside from the auto-generated tests, I'm afraid I've not written any unit/integration tests — I wanted some advice here first, about whether it was better to try to mock out the dbus interface, or instead try to run the tests against a real systemd instance.
Documentation
I've updated the README of the systemdreceiver component to mention the metric exposed, and the configuration options.