Feature Description
CVMS monitors external-chain voting for Axelar's ampd client. It exports a Prometheus metric for every verifier poll, with `source_chain` and `poll_id` as separate labels. A value of `1` indicates a successful vote from the client; a value of `0` indicates an unsuccessful one.
This has one drawback: because `poll_id` is a label, every poll creates a new time series in the Prometheus database. This ever-growing series count (high cardinality) will eventually degrade Prometheus query performance, and the degradation will accelerate as votes become more frequent with network growth. Exporting one counter per vote outcome, rather than one series per individual vote, would avoid this.
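To make the growth concrete, the Go sketch below counts distinct label sets (each one a separate Prometheus time series) under both schemes for a synthetic batch of polls. The `poll` type, the placeholder metric name `vote` and the chain names are hypothetical stand-ins, not the actual CVMS identifiers. Under the per-poll scheme the count rises with every poll; under the per-outcome scheme it is bounded by (number of outcomes) × (number of source chains).

```go
package main

import "fmt"

// seriesKey identifies one Prometheus time series by metric name and label values.
type seriesKey struct {
	metric      string
	sourceChain string
	pollID      string // left empty when the metric has no poll_id label
}

// poll is a hypothetical record of one finished verifier poll.
type poll struct {
	sourceChain string
	pollID      string
	outcome     string // "correct", "incorrect" or "unsubmitted"
}

// perPollSeries counts series under the current scheme:
// one series for every (source_chain, poll_id) pair ever seen.
func perPollSeries(polls []poll) int {
	seen := map[seriesKey]bool{}
	for _, p := range polls {
		seen[seriesKey{"vote", p.sourceChain, p.pollID}] = true
	}
	return len(seen)
}

// perOutcomeSeries counts series under the proposed scheme:
// one counter for every (outcome, source_chain) pair, however many polls occur.
func perOutcomeSeries(polls []poll) int {
	seen := map[seriesKey]bool{}
	for _, p := range polls {
		seen[seriesKey{"vote_" + p.outcome, p.sourceChain, ""}] = true
	}
	return len(seen)
}

func main() {
	chains := []string{"ethereum", "avalanche"} // hypothetical source chains
	outcomes := []string{"correct", "incorrect", "unsubmitted"}
	var polls []poll
	for i := 0; i < 1000; i++ {
		polls = append(polls, poll{chains[i%2], fmt.Sprint(i), outcomes[i%3]})
	}
	fmt.Println(perPollSeries(polls), perOutcomeSeries(polls)) // 1000 6
}
```

Doubling the number of polls doubles the first figure, while the second stays at 6.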
Reason for Need
Axelar validators run clients that vote on the status of transactions on external chains, a process that must function correctly. The current way of capturing these votes risks degrading Prometheus monitoring as the number of votes grows. Changing the metrics as described below improves alerting and removes this risk.
Proposed Solution
Create the following counter metrics:
- `cvms_axelar_amplifier_verifier_correct_vote`
- `cvms_axelar_amplifier_verifier_incorrect_vote`
- `cvms_axelar_amplifier_verifier_unsubmitted_vote`
(This assumes an unsubmitted vote is easy to distinguish from an incorrect one; if not, this metric could be dropped.)
Publish these for each `source_chain`, using a `source_chain` label to indicate which chain each series belongs to.
Increment the relevant counter every time a poll completes.
This approach has three major advantages:
- Limits the number of time series created in the database, so Prometheus does not slow down as the number of votes increases.
- Makes it easy to write alerting rules for an increase in incorrect or unsubmitted votes over a given time window.
- Enables analysis of the rate of voting and its impact on system resources.
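In Prometheus, such an alert would normally be an alerting rule on an expression like `increase(cvms_axelar_amplifier_verifier_incorrect_vote[5m]) > 3`. The Go sketch below shows, in simplified form, what that check computes: the growth of a counter over a window, treating any drop in value as a counter reset (for example an exporter restart). Unlike PromQL's `increase()`, it does not extrapolate to the window edges. The sample values and the threshold of 3 are hypothetical.

```go
package main

import "fmt"

// sample is one scrape of a counter: a Unix timestamp in seconds and its value.
type sample struct {
	ts    int64
	value float64
}

// increaseOver returns how much a counter grew across the samples inside
// [end-window, end]. Samples must be sorted by timestamp. A drop in value is
// treated as a counter reset, so the post-reset value counts as new growth.
func increaseOver(samples []sample, end, window int64) float64 {
	var total, prev float64
	started := false
	for _, s := range samples {
		if s.ts < end-window || s.ts > end {
			continue
		}
		if started {
			if s.value >= prev {
				total += s.value - prev
			} else {
				total += s.value // counter reset: counting restarted from zero
			}
		}
		prev, started = s.value, true
	}
	return total
}

func main() {
	// Hypothetical incorrect-vote counter for one source_chain, scraped every
	// 60 s, with an exporter restart between t=180 and t=240.
	incorrect := []sample{{0, 5}, {60, 5}, {120, 6}, {180, 8}, {240, 1}, {300, 2}}
	inc := increaseOver(incorrect, 300, 300)
	// Hypothetical rule: alert on more than 3 incorrect votes in 5 minutes.
	fmt.Printf("increase=%.0f alert=%t\n", inc, inc > 3) // increase=5 alert=true
}
```

The same counters divided by elapsed time (PromQL `rate()`) give the voting rate mentioned in the last bullet.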
Alternatives
For operators who prefer the original metrics, a flag could toggle how the vote metrics are exported.
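A minimal sketch of such a toggle using Go's standard `flag` package follows. The flag name `-vote-metrics` and its values `counter` and `per-poll` are hypothetical; CVMS does not necessarily define them. Defaulting to `counter` reflects the issue's framing that the original per-poll metrics would be the opt-in choice.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// voteMetricsMode selects how vote metrics are exported (hypothetical setting).
type voteMetricsMode string

const (
	modeCounter voteMetricsMode = "counter"  // proposed per-outcome counters
	modePerPoll voteMetricsMode = "per-poll" // original one-series-per-poll metrics
)

// parseVoteMetricsMode reads the hypothetical -vote-metrics flag from args
// and rejects unknown values.
func parseVoteMetricsMode(args []string) (voteMetricsMode, error) {
	fs := flag.NewFlagSet("cvms", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // errors are returned to the caller instead of printed
	mode := fs.String("vote-metrics", string(modeCounter), "vote metric export mode: counter or per-poll")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	switch m := voteMetricsMode(*mode); m {
	case modeCounter, modePerPoll:
		return m, nil
	default:
		return "", fmt.Errorf("invalid -vote-metrics value %q", *mode)
	}
}

func main() {
	mode, err := parseVoteMetricsMode(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("exporting vote metrics as:", mode)
}
```

For example, running the program with `-vote-metrics=per-poll` would select the original behaviour, and omitting the flag would select the counters.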