4 - Collect feedback on dashboard prototype #3

@cfl0ws

Oasis Mission Control Call for Feedback

Chainflow and our development partner Vitwit have been awarded an Oasis grant to build the Oasis Mission Control Validator Monitoring and Alerting Dashboard. You can find more details about that here.

We're excited to share this prototype with the community. Validators, we're building this for you.

Please review the work done so far and provide feedback. We'll use this feedback to update the prototype and deliver a final, open-source version for you to use.

For example -

1 - Is the dashboard missing any key metrics?

2 - Are there any additional alerts you'd like to see?

3 - Is there anything we can do to organize the information in a more user-friendly way, e.g. reorganize existing dashboards and/or create new ones?

Please provide your feedback in the comments of this issue.

Here's a brief overview of the dashboards and current alerts.

Summary Dashboard

This view provides a quick look at overall validator and system health.

Screen Shot 2020-10-27 at 9 19 24 AM

Screen Shot 2020-10-27 at 9 19 47 AM

Validator Monitoring Dashboard

This view provides a comprehensive look at validator details and performance, expanding on the summary dashboard. It will also include proposal information once Oasis implements a Governance module.

Note: The system displays the total number of peers. For those who choose to implement a sentry node configuration, we will implement a metric that shows the peer names as well.

This is useful for confirming that a validator is connected to the peers its operator expects. In this scenario, an alert will also notify the user if the number of peers drops below a specified number.

For example, if your validator is connected to two sentries, the system will alert you if the number of peers drops below two.
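As a rough illustration of the peer check described above, here is a minimal Python sketch. It is not the dashboard's actual implementation: the function `check_peers`, the `PeerCheck` type, and the sentry names are hypothetical, and only `num_peers_threshold` is borrowed from the alert list in this issue.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PeerCheck:
    """Result of comparing connected peers against what the operator expects."""
    alert: bool          # True when the peer count is below the threshold
    missing: List[str]   # expected peers (e.g. sentries) that are not connected


def check_peers(connected: List[str], expected: List[str],
                num_peers_threshold: int) -> PeerCheck:
    # Hypothetical helper: alert when the peer count drops below the threshold.
    alert = len(connected) < num_peers_threshold
    # List the expected peer names that are absent from the connected set.
    missing = [name for name in expected if name not in connected]
    return PeerCheck(alert=alert, missing=missing)


# A validator that expects two sentries but is connected to only one.
result = check_peers(
    connected=["sentry-1"],
    expected=["sentry-1", "sentry-2"],
    num_peers_threshold=2,
)
print(result.alert, result.missing)
```

Reporting the missing names alongside the count is what the planned peer-name metric would make possible; with the count alone, an operator knows a peer dropped but not which one.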

Screen Shot 2020-10-27 at 9 22 07 AM

Screen Shot 2020-10-27 at 9 22 22 AM

System Monitoring Dashboard

This view provides a comprehensive look at system performance metrics, expanding on the summary dashboard. Here you'll find all the system metrics you'd expect to see in a comprehensive system monitoring tool.

Screen Shot 2020-10-27 at 9 23 59 AM

Screen Shot 2020-10-27 at 9 24 13 AM

Screen Shot 2020-10-27 at 9 24 29 AM

Screen Shot 2020-10-27 at 9 24 49 AM

Screen Shot 2020-10-27 at 9 25 01 AM

Screen Shot 2020-10-27 at 9 25 14 AM

Screen Shot 2020-10-27 at 9 25 30 AM

Screen Shot 2020-10-27 at 9 25 40 AM

Screen Shot 2020-10-27 at 9 25 51 AM

Alerting

So far, these alerts are configured -

  • Alert when the missed blocks count reaches or exceeds missed_blocks_threshold.
    • This is an emergency alert that gets sent to Telegram and email. It can be integrated with PagerDuty via email.
  • Alert when the number of peers falls below num_peers_threshold.
  • Alert when the Oasis node is not running on the validator.
  • Alert about validator health, i.e. whether it's voting or jailed.
    • This is a sanity-check alert that lets you know whether your validator is voting. It can be configured to send at user-specified times during the day.
  • Alert when the voting power of your validator drops below voting_power_threshold.
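
The threshold rules listed above could be evaluated along the following lines. This is a sketch under stated assumptions rather than the tool's real code: the threshold names come from this issue, while `ValidatorStatus`, its fields, and `evaluate_alerts` are hypothetical. The scheduled health alert and the delivery to Telegram or email are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Thresholds:
    # Names taken from the alert list in this issue.
    missed_blocks_threshold: int
    num_peers_threshold: int
    voting_power_threshold: int


@dataclass
class ValidatorStatus:
    # Hypothetical snapshot of the values the dashboard would collect.
    missed_blocks: int
    num_peers: int
    node_running: bool
    voting_power: int


def evaluate_alerts(status: ValidatorStatus, t: Thresholds) -> List[str]:
    """Return one message per rule that is currently firing."""
    alerts = []
    # Emergency alert: missed blocks reach or exceed the threshold.
    if status.missed_blocks >= t.missed_blocks_threshold:
        alerts.append("missed blocks count reached missed_blocks_threshold")
    # Peer count falls below the threshold.
    if status.num_peers < t.num_peers_threshold:
        alerts.append("number of peers is below num_peers_threshold")
    # Node process is not running on the validator.
    if not status.node_running:
        alerts.append("Oasis node is not running")
    # Voting power drops below the threshold.
    if status.voting_power < t.voting_power_threshold:
        alerts.append("voting power is below voting_power_threshold")
    return alerts


thresholds = Thresholds(missed_blocks_threshold=10,
                        num_peers_threshold=2,
                        voting_power_threshold=1000)
status = ValidatorStatus(missed_blocks=12, num_peers=2,
                         node_running=True, voting_power=1500)
for message in evaluate_alerts(status, thresholds):
    print(message)
```

Note the asymmetry in the comparisons: the missed blocks rule fires when the count reaches the threshold, while the peer and voting power rules fire only when the value falls strictly below it, matching the wording of the list.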

This image shows some of those alerts in action.

Screen_Shot_2020-09-07_at_2 34 22_PM
