Description
Introduces a series of early-warning Prometheus alerts intended to catch performance issues at an early stage in development.
As the e2e tests run, the installed Prometheus instance scrapes metrics from catalogd and operator-controller and fires alerts based on the rules introduced in this PR. Because these tests run on GitHub runners, which do not have consistent performance, the alerts must be based on platform-independent metrics and are therefore limited. Any other ideas for metrics to check in this PR are appreciated!
Once the e2e tests finish, Prometheus is queried for active alerts. Any alert found in the `pending` state results in a warning being set on the e2e workflow. Any alert in the `firing` state results in an error. These errors do not (at the moment) fail the run, but they are visible when the workflow details are viewed. For instance:
I am still tuning the alerts, so for now I am not making this a required check.
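The post-test alert check described above could be sketched roughly as follows. This is an illustration only, not the script used in this PR: it assumes Prometheus is reachable at a hypothetical `http://localhost:9090`, reads the standard `/api/v1/alerts` endpoint, and turns each alert into a GitHub Actions workflow command (`::warning` for `pending`, `::error` for `firing`). The function names `fetch_alerts` and `annotate` are made up for this example.

```python
import json
import urllib.request

# Assumed address of the Prometheus instance installed for the e2e run.
PROMETHEUS_URL = "http://localhost:9090"

# Maps a Prometheus alert state to a GitHub Actions annotation level.
STATE_TO_LEVEL = {"pending": "warning", "firing": "error"}


def fetch_alerts(base_url=PROMETHEUS_URL):
    """Return the list of active alerts from the Prometheus HTTP API."""
    with urllib.request.urlopen(f"{base_url}/api/v1/alerts") as resp:
        payload = json.load(resp)
    return payload["data"]["alerts"]


def annotate(alerts):
    """Build one workflow-command line per pending or firing alert."""
    lines = []
    for alert in alerts:
        state = alert.get("state")
        level = STATE_TO_LEVEL.get(state)
        if level is None:
            continue  # ignore any state other than pending/firing
        name = alert.get("labels", {}).get("alertname", "unknown")
        description = alert.get("annotations", {}).get("description", "")
        message = f"{name}: {description}" if description else name
        lines.append(f"::{level} title=Prometheus alert {state}::{message}")
    return lines
```

Printing each line returned by `annotate(fetch_alerts())` from a workflow step would surface the annotations in the workflow details. Because the sketch only prints and never exits non-zero, it mirrors the current behavior of not failing the run.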
Potential Enhancements:
Closes #1904
Closes #1905
Reviewer Checklist