Ideas for "tagging" and listing services/hosts for monitoring? #14261
Replies: 3 comments 7 replies
-
I use Netbox to drive node_exporter, windows_exporter and snmp_exporter - but not blackbox services. I have tags "prom_node", "prom_windows" and "prom_snmp" to determine which devices and VMs to monitor. The relevant prometheus scrape configs are:
For blackbox_exporter, I just maintain the files statically; it might be possible from Netbox services with custom fields but I haven't thought about it too much, and in any case there are services I want to monitor which are not attached to Netbox devices or VMs. Scrape config:
The files in blackbox.d look like this:
This lets me scrape by IP address but give a different instance label, if I so choose.

As for "disable monitoring for some services, or set different service priorities, which determines alerting behaviour": disabling monitoring and disabling alerting are two different things. To disable monitoring I use the Status field, i.e. I only monitor devices/VMs with status "active". I would generally disable alerting via labels, typically by creating silences in alertmanager if this is a short-term thing. Alertmanager routing rules use tenant and/or role to route alerts. If you wanted additional control via Netbox you'd add some additional label(s) and match them in your alertmanager routing rules; or use regexp matches in alertmanager to match the Netbox "tags" directly.
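On the point about scraping by IP address while keeping a friendlier instance label, here is a minimal sketch of one way to generate such a target file. It is not necessarily the layout used in blackbox.d above: the function name `build_file_sd` and the sample hosts are made up for illustration. It relies only on the standard file_sd JSON shape (a list of groups with `targets` and `labels`) and on Prometheus keeping an `instance` label that is already set rather than defaulting it to the address. For blackbox_exporter the usual relabelling of the address into the `target` parameter would still be needed and is not shown.

```python
import json


def build_file_sd(entries):
    """Build file_sd target groups from (instance_name, ip_address) pairs.

    Hypothetical helper: each group scrapes the IP address but carries
    the human-friendly name as its "instance" label.
    """
    return [
        {"targets": [address], "labels": {"instance": name}}
        for name, address in entries
    ]


# Sample data only; names and addresses are placeholders.
entries = [
    ("www.example.com", "192.0.2.10"),
    ("mail.example.com", "192.0.2.25"),
]
print(json.dumps(build_file_sd(entries), indent=2))
```

The printed JSON can be written to a file that a `file_sd_configs` entry points at.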
-
You're right - just tested and it does. The awkward part I found is with snmp_exporter, where you might have multiple modules selected and you want to pass
(with single quotes - not even JSON!)
As you've found, blackbox_exporter is capable of much more than just establishing a TCP connection. You can do HTTP(S) exchanges, you can specify the path, you can provide authentication, you can check the headers and/or body in the response etc. Whilst you might not necessarily want to exercise the whole service (and indeed that could take multiple tests), you might at least want to check that the login page is displayed correctly, with some expected content, that it is not returning a 500 error, and that the certificate is valid. Perhaps in future blackbox_exporter will be expanded to allow a list of modules in the same call, which would be neat.
-
Having thought about this a bit more, and having looked at how some monitoring platforms like check_mk handle this, I'm starting to think that config contexts rather than services are the best way to handle this. Config contexts can be assigned to all devices with a particular role, or in a site, or with a given tag, or with a particular platform. This means you wouldn't have to set up specific monitoring rules for every single one, but they could all inherit a consistent set of monitoring. Given the deep merge, you could put everything under a single "monitoring" key and still be able to override selectively. A single service might be like this:
Using the Config Context tab on an individual device you could see exactly what's being monitored on that device; but the generation of this complete context can be inherited from multiple levels, and overridden at device level only if necessary. Using tags to generate most of it would mean most manual JSON writing could be avoided. Anyway, it's just an idea I thought was worth mentioning. The downside is that this wouldn't work using the out-of-the-box prometheus sd plugin.
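To make the "single monitoring key plus selective override" idea a bit more concrete, here is a rough sketch of a recursive dictionary merge. This is a generic deep merge written for illustration, not NetBox's own implementation, and every key under "monitoring" (`https`, `module`, `port`, `priority`) is a hypothetical name. In this sketch lists and scalars in the override simply replace the inherited value.

```python
import copy


def deep_merge(base, override):
    """Return a new dict where `override` is merged recursively into `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, or mismatched types: the override wins.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical context inherited from a role or tag.
role_context = {
    "monitoring": {
        "https": {"module": "http_2xx", "port": 443, "priority": "normal"},
    }
}
# Hypothetical device-level override touching a single value.
device_context = {"monitoring": {"https": {"priority": "high"}}}

effective = deep_merge(role_context, device_context)
print(effective)
```

Only `priority` changes on the device; `module` and `port` are still inherited from the role-level context.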
-
Hi,
I'm looking to use the NetBox API together with Prometheus blackbox exporter and HTTP discovery in order to determine what needs monitoring. I will have a translation layer in between for transforming the results into the appropriate format for Prometheus. So, what I need is something like one or more lists of monitoring targets. Utilizing the NetBox Service model seems like the obvious starting point here. I would add, for example, a TCP 443 service to each host that needs a web server monitored, and then retrieve a list of all those services, along with the appropriate IP address. GraphQL seems to be able to easily accomplish this.

But then I might want to disable monitoring for some services, or set different service priorities, which determine alerting behaviour, etc. Using boolean (enabled) and choice (priority) custom fields seems appropriate, but unfortunately it is not possible to filter on custom fields using GraphQL at this time. I could, of course, perform the additional filtering in the translation layer, but still. Using tags would be another option that I think would support GraphQL filtering, but doesn't seem as appropriate.
Just curious if anyone has any comments or ideas, or even does something similar.
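For what it's worth, a rough sketch of the translation layer I have in mind, including the filtering on custom fields that GraphQL can't do for me. The input record shape, the custom field names `monitoring_enabled` and `priority`, and the function name are all assumptions for illustration; only the output shape (a list of groups with `targets` and string-valued `labels`) is what Prometheus HTTP discovery expects. It handles IPv4 addresses only.

```python
import json


def services_to_http_sd(services):
    """Convert NetBox-style service records into HTTP SD target groups."""
    groups = []
    for svc in services:
        cf = svc.get("custom_fields") or {}
        # Filter here, since the query could not filter on custom fields.
        if not cf.get("monitoring_enabled", True):
            continue
        # NetBox IP addresses carry a prefix length, e.g. "192.0.2.10/24".
        host = svc["address"].split("/")[0]
        for port in svc["ports"]:
            groups.append({
                "targets": [f"{host}:{port}"],
                "labels": {
                    "service": svc["name"],
                    "device": svc["device"],
                    "priority": str(cf.get("priority") or "normal"),
                },
            })
    return groups


# Placeholder records, shaped roughly like a flattened query result.
sample = [
    {"name": "https", "device": "web1", "address": "192.0.2.10/24", "ports": [443],
     "custom_fields": {"monitoring_enabled": True, "priority": "high"}},
    {"name": "https", "device": "web2", "address": "192.0.2.11/24", "ports": [443],
     "custom_fields": {"monitoring_enabled": False, "priority": "low"}},
]
print(json.dumps(services_to_http_sd(sample), indent=2))
```

Here web2 is dropped because its enabled flag is false, and the priority ends up as a label that alerting rules or routing could match on.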