This repository provides a powerful monitoring stack for Ika Validator and Sui Fullnode, combining Prometheus for metric collection, Alertmanager for multi-channel notifications (Telegram, Discord, Slack, PagerDuty), and Grafana for visualization. It includes ready-made scrape and alerting configurations, systemd service units, and alert rules targeting consensus progress, sync checks, and node availability. Finally, it automates Grafana provisioning of the dashboard, with a configurable authority variable to filter panels to your own validator.
Contributions are highly welcomed! Feel free to open issues and submit pull requests to help us improve this monitoring stuck.
Our public dashboard is available here: https://grafana.trusted-point.com/public-dashboards/bfabcced297e4ffab624b2576dd9f61c
- Add Docker support for easy, containerized deployments.
- Enhance server monitoring, including system-level metrics and log collection.
- Integrate additional notification channels.
VERSION="3.3.1"
ARCH="linux-amd64"
PROMETHEUS_HOST="127.0.0.1"
PROMETHEUS_PORT="9090"
SUI_METRICS_TARGET="127.0.0.1:9174"
IKA_METRICS_TARGET="127.0.0.1:9184"
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir /etc/prometheus /var/lib/prometheus
cd /etc/prometheus
sudo wget https://github.com/prometheus/prometheus/releases/download/v${VERSION}/prometheus-${VERSION}.${ARCH}.tar.gz
sudo tar xvf prometheus-${VERSION}.${ARCH}.tar.gz --strip-components=1
./prometheus --version && ./promtool --version
sudo cp ./prometheus ./promtool /usr/local/bin/
sudo tee /etc/prometheus/prometheus.yml > /dev/null <<EOF
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_timeout: 10s
scrape_configs:
- job_name: "Ika-Validator"
static_configs:
- targets: ["${IKA_METRICS_TARGET}"]
labels:
group: 'IkaValidator'
network: 'testnet'
- job_name: "Sui-Fullnode"
static_configs:
- targets: ["${SUI_METRICS_TARGET}"]
labels:
group: 'Fullnode'
network: 'testnet'
EOF
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
sudo tee /etc/systemd/system/prometheus.service <<EOF
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus \
--web.listen-address="${PROMETHEUS_HOST}:${PROMETHEUS_PORT}"
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload && \
sudo systemctl enable prometheus && \
sudo systemctl restart prometheus && \
sudo journalctl -u prometheus -f -o cat
sudo apt-get update
sudo apt-get install -y software-properties-common wget apt-transport-https
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
sudo apt-get update
sudo apt-get install -y grafana
sudo systemctl daemon-reload && \
sudo systemctl enable grafana-server && \
sudo systemctl restart grafana-server && \
sudo journalctl -u grafana-server -f -o cat
- Enter
admin
for both username and password and update the default password
Option A: Create a provisioning YAML so Grafana will load the Prometheus data source on startup:
sudo tee /etc/grafana/provisioning/datasources/prometheus.yaml > /dev/null <<EOF
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
access: proxy
url: http://${PROMETHEUS_HOST}:${PROMETHEUS_PORT}
isDefault: true
editable: true
EOF
sudo systemctl restart grafana-server && sudo journalctl -u grafana-server -f -o cat
Option B: Grafana supports adding data sources via the UI or entirely from the terminal (CLI).
- Navigate to Configuration → Data Sources → Add data source.
- Select Prometheus, set the URL to
http://127.0.0.1:<PROMETHEUS_PORT>
, and click Save & Test.
Option A: Grafana can automatically load dashboards from JSON files. To fetch your dashboard and make it appear in the UI, do the following:
sudo mkdir -p /var/lib/grafana/dashboards
sudo tee /etc/grafana/provisioning/dashboards/dashboard.yaml > /dev/null <<EOF
apiVersion: 1
providers:
- name: 'ika-sui'
orgId: 1
folder: ''
type: 'file'
disableDeletion: false
editable: true
options:
path: /var/lib/grafana/dashboards
EOF
sudo wget -O /var/lib/grafana/dashboards/Trusted-Point-Ika-Sui.json https://raw.githubusercontent.com/trusted-point/Ika-Tools/refs/heads/main/grafana/dashboards/Grafana-TrustedPoint-Ika-Sui.json
sudo chown -R grafana:grafana /var/lib/grafana/dashboards
sudo systemctl restart grafana-server && sudo journalctl -u grafana-server -f -o cat
Option B: Manual UI Import
- Log in to Grafana at
http://<IP>:3000
with your credentials. - In the left-hand menu, click the + icon and select Import.
- Under Import via file, click Upload JSON file and choose the Trusted-Point-Ika-Sui.json file (you may download it locally first).
- Click Import.
After importing or provisioning the dashboard, you need to set your own validator’s authority so the panels query the correct series.
- Open the dashboard you just imported (e.g. IKA x SUI → Your Ika Dashboard).
- Click the gear icon (⚙️) in the top-right corner of the dashboard and select Variables.
- Find the authority variable in the list and click Edit.
- In Custom options, replace the example
TrustedPoint
with your own validator name(s) exactly as they appear in the authority label of your Ika metrics (e.g. MyValidatorNode). - Click Update to save the variable definition, then click Dashboard > Refresh (⟳) to re-run all queries with your new authority value.
- The dashboard panels will now display metrics for the specified authority label.
NOTE: If you don’t want to enable some of the provided notification channels, simply comment out its corresponding entry both under the route:
section and in the receivers:
list. Also, ensure that the final route (PagerDuty) does not include continue: true, so that alerts aren’t passed on after routing to PagerDuty.
VERSION="0.28.1"
ARCH="linux-amd64"
ALERTMANAGER_HOST="127.0.0.1"
ALERTMANAGER_PORT="9093"
ALERTMANAGER_WEBHOOK_PORT="9092"
TELEGRAM_BOT_TOKEN="123:AAA-AAA"
TELEGRAM_CHAT_ID="-123"
DISCORD_WEBHOOK_URL="https://discord.com/api/webhooks/123/AAA-BBB-CCC"
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/AAA/BBB/CCC"
PAGERDUTY_INTEGRATION_KEY="123abc"
sudo useradd --no-create-home --shell /bin/false alertmanager
sudo mkdir /etc/alertmanager /var/lib/alertmanager
cd /etc/alertmanager
sudo wget https://github.com/prometheus/alertmanager/releases/download/v${VERSION}/alertmanager-${VERSION}.${ARCH}.tar.gz
sudo tar xvf alertmanager-${VERSION}.${ARCH}.tar.gz --strip-components=1
./alertmanager --version && ./amtool --version
sudo cp ./alertmanager ./amtool /usr/local/bin/
sudo tee /etc/alertmanager/alertmanager.yml > /dev/null <<EOF
global:
resolve_timeout: 5m
route:
receiver: "default"
group_by: ["alertname"]
group_wait: 20s
group_interval: 5m
repeat_interval: 1h
routes:
- receiver: "telegram"
continue: true
- receiver: "discord"
continue: true
- receiver: "slack"
continue: true
- receiver: "pagerduty"
receivers:
- name: "default"
webhook_configs:
- url: "http://localhost:${ALERTMANAGER_WEBHOOK_PORT}"
- name: "telegram"
telegram_configs:
- send_resolved: true
bot_token: "${TELEGRAM_BOT_TOKEN}"
chat_id: ${TELEGRAM_CHAT_ID}
- name: "discord"
discord_configs:
- send_resolved: true
webhook_url: "${DISCORD_WEBHOOK_URL}"
- name: "slack"
slack_configs:
- send_resolved: true
api_url: "${SLACK_WEBHOOK_URL}"
- name: "pagerduty"
pagerduty_configs:
- routing_key: "${PAGERDUTY_INTEGRATION_KEY}"
send_resolved: true
EOF
sudo chown -R alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager
sudo tee /etc/systemd/system/alertmanager.service <<EOF
[Unit]
Description=Prometheus Alertmanager
Wants=network-online.target
After=network-online.target
[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
--config.file=/etc/alertmanager/alertmanager.yml \
--storage.path=/var/lib/alertmanager \
--web.listen-address="${ALERTMANAGER_HOST}:${ALERTMANAGER_PORT}"
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload && \
sudo systemctl enable alertmanager && \
sudo systemctl restart alertmanager && \
sudo journalctl -u alertmanager -f -o cat
⚠️ IkaNoLeaderRoundProgress — triggered whenconsensus_last_committed_leader_round
is stuck.- 🧊 IkaNoSyncFetchedIndexProgress — triggered when
consensus_commit_sync_fetched_index
is stuck. - 🔴 IkaNodeDown — triggered when node not reachable.
⚠️ SuiHighestSyncedCheckpointStuck — triggered whenhighest_synced_checkpoint
is stuck.- 🧊 SuiLastExecutedCheckpointStuck — triggered when
last_executed_checkpoint
is stuck. - 🔴 SuiNodeDown — triggered when node not reachable.
sudo wget -O /etc/prometheus/ika_validator_alert_rules.yml \
https://raw.githubusercontent.com/trusted-point/Ika-Tools/main/prometheus/rules/ika_validator_alert_rules.yml
sudo wget -O /etc/prometheus/sui_fullnode_alert_rules.yml \
https://raw.githubusercontent.com/trusted-point/Ika-Tools/main/prometheus/rules/sui_fullnode_alert_rules.yml
sudo sed -i '/^scrape_configs:/i \
rule_files:\
- "sui_fullnode_alert_rules.yml"\
- "ika_validator_alert_rules.yml"\
\
alerting:\
alertmanagers:\
- static_configs:\
- targets:\
- "alertmanager:'"${ALERTMANAGER_PORT}"'"\
' /etc/prometheus/prometheus.yml
sudo systemctl restart prometheus && \
sudo journalctl -u prometheus -f -o cat
curl -X POST http://127.0.0.1:9093/api/v2/alerts \
-H "Content-Type: application/json" \
-d '[
{
"labels": {
"alertname": "TestAlert",
"severity": "critical"
},
"annotations": {
"summary": "🔥 Test alert",
"description": "This is just a test of all notification channels."
}
}
]'
3.12 Resolve the test alert in Alertmanager (or wait for automatic resolution) and send the appropriate notification
curl -X POST http://127.0.0.1:9093/api/v2/alerts \
-H "Content-Type: application/json" \
-d '[
{
"labels": {
"alertname": "TestAlert",
"severity": "critical"
},
"annotations": {
"summary": "🔔 Test alert (resolving)",
"description": "This marks the TestAlert as RESOLVED."
}
}
]'