
Commit 6cb76c4

feat: quality of service (#129)
* feat: quality of service docs
* chore: prettier
* fix: broken link
1 parent 0b1e9be commit 6cb76c4

File tree: 5 files changed (+479, -1 lines changed)


pages/developers/_meta.ts

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ const meta: Meta = {
   "blueprint-sdk": "Introduction",
   "blueprint-contexts": "Contexts",
   "blueprint-runner": "Blueprint Runner",
+  "blueprint-qos": "Quality of Service Integration",
   "p2p-networking": "P2P Networking",
   "tangle-avs": "Build a Tangle Blueprint",
   "eigenlayer-avs": "Build an Eigenlayer AVS",

pages/developers/blueprint-qos.mdx

Lines changed: 342 additions & 0 deletions
---
title: Quality of Service (QoS) Integration
---

# Quality of Service (QoS) Integration Guide

This guide explains how to integrate and use the Blueprint SDK's Quality of Service (QoS) system to add comprehensive observability, monitoring, and dashboard capabilities to any Blueprint. QoS provides unified metrics collection, log aggregation, heartbeat monitoring, and visualization through a cohesive interface.
## Prerequisites

- Understanding of Blueprint concepts and execution model
- Familiarity with Tangle Network architecture
- Basic knowledge of observability concepts (metrics, logging, monitoring)
## QoS Overview

The Blueprint QoS system provides a complete observability stack:

- **Heartbeat Service**: Sends periodic heartbeats to Tangle to prevent slashing
- **Metrics Collection**: Captures system and application metrics
- **Logging**: Aggregates logs via Loki for centralized querying
- **Dashboards**: Creates Grafana visualizations automatically
- **Server Management**: Optionally runs containerized instances of Prometheus, Loki, and Grafana

The QoS system is designed to be added to any Blueprint type (Tangle, Eigenlayer, P2P, or Cron) as a background service.
## Integrating QoS into a Blueprint

The integration process involves setting up the QoS configuration and implementing the `HeartbeatConsumer` trait. Here's a step-by-step guide.

### Main Blueprint Setup
```rust
#[tokio::main]
async fn main() -> Result<(), blueprint_sdk::Error> {
    let env = BlueprintEnvironment::load()?;

    // Create your Blueprint's primary context
    let context = MyContext::new(env.clone()).await?;

    // Configure QoS system
    let qos_config = blueprint_qos::default_qos_config();
    let heartbeat_consumer = Arc::new(MyHeartbeatConsumer::new());

    // Standard Blueprint runner setup with QoS
    BlueprintRunner::builder(TangleConfig::default(), env)
        .router(
            Router::new()
                .route(JOB_ID, handler.layer(TangleLayer))
                .with_context(context),
        )
        .producer(producer)
        .consumer(consumer)
        .qos_service(qos_config, Some(heartbeat_consumer))
        .run()
        .await
}
```
### Implementing HeartbeatConsumer

To enable the heartbeat service, you must implement the `HeartbeatConsumer` trait, which is responsible for sending heartbeat signals to the Tangle Network:
```rust
#[derive(Clone)]
struct MyHeartbeatConsumer {
    // Add any required fields for heartbeat submission
}

impl HeartbeatConsumer for MyHeartbeatConsumer {
    fn consume_heartbeat(
        &self,
        service_id: u64,
        blueprint_id: u64,
        metrics_data: String,
    ) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
        // Implement custom heartbeat logic here, specific to your Blueprint
        Ok(())
    }
}
```
## QoS Configuration Options

### Using Default Configuration

The simplest way to get started is with the default configuration:
```rust
let qos_config = blueprint_qos::default_qos_config();
```
This initializes a configuration with:

- Heartbeat service (disabled until configured)
- Metrics collection
- Loki logging
- Grafana integration
- Automatic server management set to `false`
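
For example, here is a minimal sketch that keeps the defaults but enables the heartbeat, assuming the `QoSConfig` returned by `default_qos_config()` exposes the same public fields used in the custom configuration below:

```rust
// Sketch only: field names mirror the QoSConfig literal shown in the next section.
let mut qos_config = blueprint_qos::default_qos_config();
qos_config.heartbeat = Some(HeartbeatConfig {
    service_id: Some(42),   // your on-chain service ID
    blueprint_id: Some(7),  // your Blueprint ID
    interval_seconds: 60,
    jitter_seconds: 5,
});
```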
### Custom Configuration

Customize the configuration for your specific needs:
```rust
let qos_config = QoSConfig {
    heartbeat: Some(HeartbeatConfig {
        service_id: Some(42),
        blueprint_id: Some(7),
        interval_seconds: 60,
        jitter_seconds: 5,
    }),
    metrics: Some(MetricsConfig::default()),
    loki: Some(LokiConfig::default()),
    grafana: Some(GrafanaConfig {
        endpoint: "http://localhost:3000".into(),
        admin_user: Some("admin".into()),
        admin_password: Some("admin".into()),
        folder: None,
    }),
    grafana_server: Some(GrafanaServerConfig::default()),
    loki_server: Some(LokiServerConfig::default()),
    prometheus_server: Some(PrometheusServerConfig::default()),
    docker_network: Some("blueprint-network".into()),
    manage_servers: true,
    service_id: Some(42),
    blueprint_id: Some(7),
    docker_bind_ip: Some("0.0.0.0".into()),
};
```
### Using the Builder Pattern

The builder pattern provides a fluent API for configuration:
```rust
let qos_service = QoSServiceBuilder::new()
    .with_heartbeat_config(HeartbeatConfig {
        service_id: Some(service_id),
        blueprint_id: Some(blueprint_id),
        interval_seconds: 60,
        jitter_seconds: 5,
    })
    .with_heartbeat_consumer(Arc::new(consumer))
    .with_metrics_config(MetricsConfig::default())
    .with_loki_config(LokiConfig::default())
    .with_grafana_config(GrafanaConfig::default())
    .with_prometheus_server_config(PrometheusServerConfig {
        host: "0.0.0.0".into(),
        port: 9090,
        ..Default::default()
    })
    .manage_servers(true)
    .with_ws_rpc_endpoint(ws_endpoint)
    .with_keystore_uri(keystore_uri)
    .build()?;
```
## Recording Blueprint Metrics and Events

### Job Performance Tracking

Tracking job execution and performance in your job handlers is essential for monitoring and optimization:
```rust
pub async fn process_job(
    Context(ctx): Context<MyContext>,
    TangleArg(data): TangleArg<String>,
) -> Result<TangleResult<u64>> {
    let start_time = std::time::Instant::now();

    // Process the job
    let result = perform_processing(&data)?;

    // Record job execution metrics
    if let Some(qos) = &ctx.qos_service {
        qos.record_job_execution(
            JOB_ID,
            start_time.elapsed().as_secs_f64(),
            ctx.service_id,
            ctx.blueprint_id,
        );
    }

    Ok(TangleResult::Success(result))
}
```
### Error Tracking

Tracking job errors is crucial for monitoring and alerts:
```rust
match perform_complex_operation() {
    Ok(value) => Ok(TangleResult::Success(value)),
    Err(e) => {
        if let Some(qos) = &ctx.qos_service {
            qos.record_job_error(JOB_ID, "complex_operation_failure");
        }
        Err(e.into())
    }
}
```
## Automatic Dashboard Creation

QoS can automatically create Grafana dashboards that display your Blueprint's metrics:
```rust
// Create a custom dashboard for your Blueprint
if let Some(mut qos) = qos_service {
    if let Err(e) = qos.create_dashboard("My Blueprint") {
        error!("Failed to create dashboard: {}", e);
    } else {
        info!("Created Grafana dashboard for My Blueprint");
    }
}
```
The dashboard includes:

- System resource usage (CPU, memory, disk, network)
- Job execution metrics (frequency, duration, error rates)
- Log visualization panels (when Loki is configured)
- Service status and uptime information
## Accessing QoS in Context

Typically, you'll want to store the QoS service in your Blueprint context:
```rust
#[derive(Clone)]
pub struct MyContext {
    #[config]
    pub env: BlueprintEnvironment,
    pub data_dir: PathBuf,
    pub qos_service: Option<Arc<QoSService<MyHeartbeatConsumer>>>,
    pub service_id: u64,
    pub blueprint_id: u64,
}

impl MyContext {
    pub async fn new(env: BlueprintEnvironment) -> Result<Self, Error> {
        // Initialize QoS service
        let qos_service = initialize_qos(&env)?;

        Ok(Self {
            data_dir: env.data_dir.clone().unwrap_or_else(default_data_dir),
            qos_service: Some(Arc::new(qos_service)),
            service_id: 42,
            blueprint_id: 7,
            env,
        })
    }
}
```
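
The `initialize_qos` helper above is not an SDK function; it is something you write yourself. Below is a minimal sketch of what it could look like, reusing the `QoSServiceBuilder` shown earlier. Omitting the RPC/keystore wiring and assuming the builder's error converts into the Blueprint's `Error` are simplifications:

```rust
// Hypothetical helper: builds the QoS service used by MyContext::new above.
// Sketch only -- a real helper would read the RPC endpoint and keystore URI
// from the environment, and `.build()?` is assumed to convert into `Error`.
fn initialize_qos(_env: &BlueprintEnvironment) -> Result<QoSService<MyHeartbeatConsumer>, Error> {
    let qos = QoSServiceBuilder::new()
        .with_heartbeat_config(HeartbeatConfig {
            service_id: Some(42),
            blueprint_id: Some(7),
            interval_seconds: 60,
            jitter_seconds: 5,
        })
        .with_heartbeat_consumer(Arc::new(MyHeartbeatConsumer::new()))
        .with_metrics_config(MetricsConfig::default())
        .with_loki_config(LokiConfig::default())
        .with_grafana_config(GrafanaConfig::default())
        .build()?;
    Ok(qos)
}
```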
255+
256+
You can then access the QoS service in your job handlers:
257+
258+
```rust
pub async fn my_job(
    Context(ctx): Context<MyContext>,
    TangleArg(data): TangleArg<String>,
) -> Result<TangleResult<()>> {
    // Access QoS metrics provider
    if let Some(qos) = &ctx.qos_service {
        if let Some(provider) = qos.provider() {
            let cpu_usage = provider.get_cpu_usage()?;
            info!("Current CPU usage: {}%", cpu_usage);
        }
    }

    // Job implementation
    Ok(TangleResult::Success(()))
}
```
## Server Management

QoS can automatically manage Grafana, Prometheus, and Loki servers:
```rust
// Configure server management
let qos_config = QoSConfig {
    grafana_server: Some(GrafanaServerConfig {
        port: 3000,
        container_name: "blueprint-grafana".into(),
        image: "grafana/grafana:latest".into(),
        ..Default::default()
    }),
    loki_server: Some(LokiServerConfig {
        port: 3100,
        container_name: "blueprint-loki".into(),
        image: "grafana/loki:latest".into(),
        ..Default::default()
    }),
    prometheus_server: Some(PrometheusServerConfig {
        port: 9090,
        container_name: "blueprint-prometheus".into(),
        image: "prom/prometheus:latest".into(),
        host: "0.0.0.0".into(),
        ..Default::default()
    }),
    docker_network: Some("blueprint-network".into()),
    manage_servers: true,
    ..Default::default()
};
```
For proper operation with Docker containers, ensure:

1. Your application binds metrics endpoints to `0.0.0.0` (not `127.0.0.1`)
2. Prometheus configuration uses `host.docker.internal` to access host metrics
3. Docker is installed and the user has the necessary permissions
4. A common Docker network is used for all containers
## Best Practices

✅ DO:

- Initialize QoS early in your Blueprint's startup sequence
- Add QoS as a background service using `BlueprintRunner::background_service()` (see the sketch after these lists)
- Record job execution metrics for all important jobs
- Use `#[derive(Clone)]` for your `HeartbeatConsumer` implementation
- Access QoS APIs through your Blueprint's context
❌ DON'T:

- Create separate QoS instances for different components
- Use hardcoded admin credentials in production code
- Pass the QoS service directly between jobs; use the context pattern instead
- Forget to bind the Prometheus metrics server to `0.0.0.0` for Docker accessibility
- Ignore QoS shutdown or creation errors; they may indicate more serious issues
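
As an alternative to the `.qos_service()` call shown in the main setup, the QoS service can be attached as a background service. Below is a minimal sketch under the assumption that `BlueprintRunner::background_service()` accepts the `QoSService` built by `QoSServiceBuilder`; check the SDK for the exact signature:

```rust
// Sketch only: the exact background_service() signature is an assumption.
let qos_service = QoSServiceBuilder::new()
    .with_heartbeat_config(HeartbeatConfig {
        service_id: Some(42),
        blueprint_id: Some(7),
        interval_seconds: 60,
        jitter_seconds: 5,
    })
    .with_heartbeat_consumer(Arc::new(MyHeartbeatConsumer::new()))
    .build()?;

BlueprintRunner::builder(TangleConfig::default(), env)
    .router(Router::new().route(JOB_ID, handler.layer(TangleLayer)))
    .background_service(qos_service)
    .run()
    .await
```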
## QoS Components Reference

| Component         | Primary Struct     | Config                 | Purpose                                            |
| ----------------- | ------------------ | ---------------------- | -------------------------------------------------- |
| Unified Service   | `QoSService`       | `QoSConfig`            | Main entry point for QoS integration               |
| Heartbeat         | `HeartbeatService` | `HeartbeatConfig`      | Sends periodic liveness signals to chain           |
| Metrics           | `MetricsService`   | `MetricsConfig`        | Collects system and application metrics            |
| Logging           | N/A                | `LokiConfig`           | Configures log aggregation to Loki                 |
| Dashboards        | `GrafanaClient`    | `GrafanaConfig`        | Creates and manages Grafana dashboards             |
| Server Management | `ServerManager`    | Various server configs | Manages Docker containers for observability stack  |

pages/network/governance/overview.mdx

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ The governance system of Tangle Network is divided into two parts, the public re

 Proposals can be made by any token holder. Others can agree with the proposal by seconding it and providing tokens equivalent to the original bond. The most seconded proposal during every launch period is moved to the public referenda table for active voting. Voters can lock their tokens for a longer duration to amplify their vote.

-Detailed information on the governance system can be found [here](https://wiki.polkadot.network/learn/archive/learn-governance).
+Detailed information on the governance system can be found [here](https://wiki.polkadot.network/general/governance-apps/).

 ## Important Parameters for Democracy Module

pages/operators/_meta.ts

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ const meta: Meta = {
   operator: "Running an operator",
   pricing: "Pricing",
   benchmarking: "Blueprint Benchmarking",
+  "quality-of-service": "Quality of Service",
   "-- Eigenlayer AVS Operators": {
     type: "separator",
     title: "Eigenlayer AVS Operators",
