Skip to content

IH-615: move metrics collection out of RRDD #6016

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

last-genius
Copy link
Contributor

@last-genius last-genius commented Sep 25, 2024

Moves out two metrics collectors out of RRDD into separate plugins:

  • CPU-related metrics
  • Network-related metrics

Three more collectors are still in xcp_rrdd.ml, this is because memory-, HA-, and SR-cache-related metrics collection is tightly integrated with RRDD itself, exposing calls through RRDD bindings for XAPI to use. To convert these into plugins (and drop the collection-related code in xcp_rrdd.ml, and stop linking to Xenctrl), these calls need to be done through standard message switch mechanism instead. This is for a future PR.

Also:

  • netdev metrics generation is tightly integrated with networkd, serializing these into a file, which the plugin reads from. not terribly efficient but i don't see a way to disentangle it from networkd, so it's kept this way.

Needs to be merged together with a corresponding spec repo PR, I will open it after XenRT testing completes (currently running BVT/BST) + also need to test manually

@last-genius last-genius force-pushed the private/asultanov/rrdd-refactor branch from 2001e34 to 9ca88ac Compare September 25, 2024 14:47
@last-genius last-genius force-pushed the private/asultanov/rrdd-refactor branch 3 times, most recently from a542c64 to f8fdcfa Compare September 30, 2024 15:02
@last-genius last-genius marked this pull request as ready for review September 30, 2024 15:02
@last-genius
Copy link
Contributor Author

last-genius commented Sep 30, 2024

Encountered crashes with the previous version of the code, determined that the cause was removing deprecated xcp-rrdd calls, so reverted these and had to also leave HA-related metrics collection as is for now, it needs to be converted to use the message switch separately. This is currently finishing up XenRT testing (#205763) after some manual testing and looking good.

UPD: BST+BVT passed

@robhoes
Copy link
Member

robhoes commented Sep 30, 2024

netdev metrics generation is tightly integrated with networkd, serializing these into a file, which the plugin reads from. not terribly efficient but i don't see a way to disentangle it from networkd, so it's kept this way.

Essentially, networkd is an rrdd plugin using protocol "v0". I think that much of the code in network_monitor_thread.ml can be merged into the new rrdd plugin. Part of the data, however, goes to xapi (see monitor_dbcalls.ml) and goes into its DB. And that particular data isn't numbers, so does not fit will in the RRD protocol.

@psafont
Copy link
Member

psafont commented Oct 1, 2024

after some manual testing

What did this involve? Opening xencenter and checking that all the metrics were still reported? (and check that they changed when adding traffic / cpu load)

@last-genius
Copy link
Contributor Author

What did this involve? Opening xencenter and checking that all the metrics were still reported? (and check that they changed when adding traffic / cpu load)

I initially tested with xe host-data-source-query, but yes, XenCenter also shows performance stats for hosts and VMs (in particular, the modified CPU and network-related stats):

{DF938305-8227-4948-96F6-59B219142C51}

Andrii Sultanov added 6 commits October 1, 2024 11:18
Several metrics collectors still rely on a similar function in xcp_rrdd, but
plugins will use this factored-out version.

Signed-off-by: Andrii Sultanov <andrii.sultanov@cloud.com>
It still currently reads from a file written to by networkd and deserializes
the stats.

Signed-off-by: Andrii Sultanov <andrii.sultanov@cloud.com>
…plugin

Signed-off-by: Andrii Sultanov <andrii.sultanov@cloud.com>
Signed-off-by: Andrii Sultanov <andrii.sultanov@cloud.com>
Signed-off-by: Andrii Sultanov <andrii.sultanov@cloud.com>
Signed-off-by: Andrii Sultanov <andrii.sultanov@cloud.com>
@last-genius last-genius force-pushed the private/asultanov/rrdd-refactor branch from f8fdcfa to aa631b9 Compare October 1, 2024 11:50
@last-genius
Copy link
Contributor Author

Rebased on top of master, there were some dune/makefile changes that modified things around rrdd plugins.

Copy link
Member

@robhoes robhoes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have discussed moving the part of network_monitor_thread.ml (xcp-networkd) that ends up doing into the RRDs (the rx and tx counters) to the new RRD plugin daemon. We will leave the other stats (carrier, duplex, etc), which are read by xapi and put in its DB, in xcp-networkd.

I am approving this PR now so that we can merge it, but we should do the above relatively soon as well.

@robhoes robhoes added this pull request to the merge queue Oct 1, 2024
Merged via the queue into xapi-project:master with commit 6e16163 Oct 1, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants