Commit 3072773

Merge pull request #2 from smallcase/feat/move-monitoring-stack-nested
feat: move monitoring stack nested
2 parents 3aca25c + b27d19c commit 3072773

11 files changed: 2864 additions and 272 deletions

API.md

Lines changed: 1288 additions & 8 deletions

MONITORING_ARCHITECTURE.md

Lines changed: 219 additions & 0 deletions
@@ -0,0 +1,219 @@
# RDS Monitoring Architecture

This document describes the separated monitoring architecture for the RDS module. Monitoring and alerting code now lives in its own module, which makes it easier to organize and maintain.

## Architecture Overview

The monitoring functionality is split across two main components, with monitoring implemented as a nested stack:

1. **`src/monitoring.ts`** - Contains alert threshold definitions, interfaces, and the main monitoring logic as a nested stack
2. **`src/rds.ts`** - Contains the RDS cluster creation logic (now cleaner without monitoring code)

## Components

### 1. Monitoring Module (`src/monitoring.ts`)

The monitoring module contains both the interfaces and the main monitoring logic:

#### Alert Thresholds Interface

```typescript
export interface AlertThresholds {
  cpu?: number;
  memory?: number; // in bytes
  readIops?: number;
  writeIops?: number;
  dbConnections?: number; // percentage
  diskQueueDepth?: number;
  freeStorage?: number; // in bytes
  networkThroughput?: number; // in bytes per second
  replicationLag?: number; // in milliseconds
}
```
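
Several of these fields take raw byte or millisecond values, which are easy to mistype as literals. As a minimal sketch, small unit helpers can make the values self-describing. `gib`, `mib`, and `seconds` are hypothetical helpers, not part of the module, and the interface is repeated (trimmed) only so the snippet stands alone. The helpers use binary units because the examples in this document do (3GB is written as 3 × 1024³ bytes).

```typescript
// Trimmed copy of the interface above, so the snippet compiles on its own.
interface AlertThresholds {
  memory?: number;            // in bytes
  freeStorage?: number;       // in bytes
  networkThroughput?: number; // in bytes per second
  replicationLag?: number;    // in milliseconds
}

// Hypothetical unit helpers (assumption: not provided by the module).
const mib = (n: number): number => n * 1024 ** 2;
const gib = (n: number): number => n * 1024 ** 3;
const seconds = (n: number): number => n * 1000;

const unitExampleThresholds: AlertThresholds = {
  memory: gib(3),              // same value as the literal 3221225472
  freeStorage: gib(15),        // same value as the literal 16106127360
  networkThroughput: mib(2),   // same value as the literal 2097152
  replicationLag: seconds(45), // same value as the literal 45000
};
```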

#### Monitoring Logic

The monitoring module contains:

- **`RDSMonitoringStack` class** - Main monitoring nested stack
- **SNS Topic creation** - For alert notifications
- **Alert creation methods** - For all RDS metrics
- **Dynamic threshold calculation** - Based on instance types
- **Support for both primary and replica instances**

#### Key Features

- **Dynamic Thresholds**: Automatically calculates appropriate alert thresholds based on instance type
- **Custom Overrides**: Allows custom threshold values for specific instances
- **Multi-Instance Support**: Handles both primary and read replica instances
- **Zenduty Integration**: Supports Zenduty webhook integration
- **Flexible Configuration**: Supports various monitoring configurations

### 2. RDS Module (`src/rds.ts`)

The RDS module is now cleaner and focuses on:

- RDS cluster creation
- Security group configuration
- Parameter group setup
- Instance configuration
- Integration with the monitoring module

## Usage Examples

### Basic Usage

```typescript
import * as ec2 from 'aws-cdk-lib/aws-ec2'; // assumes AWS CDK v2
import { PostgresRDSCluster } from './src/rds';
import { RDSMonitoringStack, AlertThresholds } from './src/monitoring';

// Define custom thresholds
const customThresholds: AlertThresholds = {
  cpu: 75,
  memory: 3221225472, // 3GB
  readIops: 1500,
  writeIops: 3000,
  dbConnections: 85,
  diskQueueDepth: 8,
  freeStorage: 16106127360, // 15GB
  networkThroughput: 2097152, // 2MB/s
  replicationLag: 45000, // 45 seconds
};

// Create RDS cluster with monitoring
const rdsCluster = new PostgresRDSCluster(this, 'MyRDSCluster', {
  // ... RDS configuration
  primaryAlertThresholds: customThresholds,
  alertSubcriptionWebhooks: [
    'https://hooks.slack.com/services/YOUR/WEBHOOK',
    'https://www.zenduty.com/api/v1/integrations/aws-cloudwatch/YOUR_KEY/',
    'https://discord.com/api/webhooks/YOUR/DISCORD/WEBHOOK',
  ],
});

// Create separate monitoring nested stack
const monitoring = new RDSMonitoringStack(this, 'RDSMonitoringStack', {
  clusterName: 'my-cluster',
  instanceType: ec2.InstanceType.of(ec2.InstanceClass.M5, ec2.InstanceSize.LARGE),
  primaryAlertThresholds: customThresholds,
  alertSubcriptionWebhooks: [
    'https://hooks.slack.com/services/YOUR/WEBHOOK',
    'https://www.zenduty.com/api/v1/integrations/aws-cloudwatch/YOUR_KEY/',
    'https://discord.com/api/webhooks/YOUR/DISCORD/WEBHOOK',
  ],
});
```

### Advanced Usage with Read Replicas

```typescript
const primaryThresholds: AlertThresholds = {
  cpu: 75,
  memory: 3221225472, // 3GB
  // ... other thresholds
};

const replicaThresholds: AlertThresholds = {
  cpu: 80,
  memory: 2147483648, // 2GB
  // ... other thresholds
};

const rdsCluster = new PostgresRDSCluster(this, 'MyRDSCluster', {
  // ... RDS configuration
  readReplicas: {
    replicas: 2,
    instanceType: ec2.InstanceType.of(ec2.InstanceClass.M5, ec2.InstanceSize.MEDIUM),
    alertThresholds: replicaThresholds,
  },
  primaryAlertThresholds: primaryThresholds,
  replicaAlertThresholds: replicaThresholds,
});
```
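
To make the primary/replica split concrete, the sketch below expands a cluster with two replicas into one monitoring target per instance, each carrying the thresholds for its role. It is illustrative only: `monitoringTargets`, `MonitoringTarget`, and the instance naming scheme are hypothetical and are not taken from the module.

```typescript
// Trimmed copy of the thresholds interface so the snippet stands alone.
interface AlertThresholds {
  cpu?: number;
  memory?: number; // in bytes
}

interface MonitoringTarget {
  instanceId: string; // hypothetical naming scheme, for illustration only
  role: "primary" | "replica";
  thresholds: AlertThresholds;
}

// Build one target for the primary and one per replica.
function monitoringTargets(
  clusterName: string,
  replicaCount: number,
  primary: AlertThresholds,
  replica: AlertThresholds,
): MonitoringTarget[] {
  const result: MonitoringTarget[] = [
    { instanceId: `${clusterName}-primary`, role: "primary", thresholds: primary },
  ];
  for (let i = 1; i <= replicaCount; i++) {
    result.push({
      instanceId: `${clusterName}-replica-${i}`,
      role: "replica",
      thresholds: replica,
    });
  }
  return result;
}

const clusterTargets = monitoringTargets(
  "my-cluster",
  2,
  { cpu: 75, memory: 3221225472 },
  { cpu: 80, memory: 2147483648 },
);
```

With two replicas this yields three targets, so each replica gets its own set of alerts while sharing the replica thresholds.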

## Available Alerts

The monitoring module creates the following alerts:

### Primary Instance Alerts
- **CPU Utilization** - Monitors CPU usage with dynamic thresholds
- **Free Memory** - Monitors available memory
- **Free Storage** - Monitors available storage space
- **Read IOPS** - Monitors read operations per second
- **Write IOPS** - Monitors write operations per second
- **Disk Queue Depth** - Monitors disk I/O queue depth
- **Database Connections** - Monitors active database connections
- **Network Throughput** - Monitors network I/O
- **Replication Lag** - Monitors replication delay (Multi-AZ only)
- **Backup Storage** - Monitors backup storage usage (if backups enabled)

### Read Replica Alerts
- All the same alerts as primary instances
- Separate thresholds for replica-specific requirements
- Individual monitoring for each replica instance
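
Not every alert fires in the same direction. The free-memory and free-storage alerts are only meaningful when the metric falls below its threshold, while utilization-style alerts trigger when it rises above. The sketch below makes that explicit. The direction table is an assumption inferred from the metric names, not read from the module's source, and `isBreaching` is a hypothetical helper.

```typescript
type ThresholdKey =
  | "cpu"
  | "memory"
  | "readIops"
  | "writeIops"
  | "dbConnections"
  | "diskQueueDepth"
  | "freeStorage"
  | "networkThroughput"
  | "replicationLag";

type Direction = "above" | "below";

// Assumed alert direction per threshold (not confirmed by the module source).
const ALERT_DIRECTION: Record<ThresholdKey, Direction> = {
  cpu: "above",
  memory: "below", // free memory: alert when too little is left
  readIops: "above",
  writeIops: "above",
  dbConnections: "above",
  diskQueueDepth: "above",
  freeStorage: "below", // free storage: alert when too little is left
  networkThroughput: "above",
  replicationLag: "above",
};

// Returns true when the observed value is on the alerting side of the threshold.
function isBreaching(key: ThresholdKey, value: number, threshold: number): boolean {
  return ALERT_DIRECTION[key] === "below" ? value < threshold : value > threshold;
}

// 90% CPU against a 75% threshold breaches; 20 GiB free against a 15 GiB floor does not.
const cpuBreach = isBreaching("cpu", 90, 75);
const storageBreach = isBreaching("freeStorage", 20 * 1024 ** 3, 16106127360);
```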

## Dynamic Threshold Calculation

The monitoring module automatically calculates appropriate thresholds based on instance type:

### Instance Class Thresholds

| Instance Class | CPU (%) | Memory (GB) | Read IOPS | Write IOPS | DB Connections (%) |
|----------------|---------|-------------|-----------|------------|-------------------|
| t3/t2 (Burstable) | 85 | 1 | 500 | 1000 | 60 |
| m5/m6 (General) | 80 | 2 | 1000 | 2000 | 80 |
| r5/r6 (Memory) | 75 | 4 | 1500 | 3000 | 85 |
| c5/c6 (Compute) | 70 | 2 | 2000 | 4000 | 90 |
| x1/x2 (Large) | 70 | 8 | 3000 | 6000 | 95 |
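
The table above can be read as a lookup keyed by instance class. A minimal sketch of such a lookup follows. `defaultsForInstanceType`, the string form of the instance type, and the `undefined` result for classes the table does not list are assumptions for illustration; the module's actual calculation may differ.

```typescript
interface ClassDefaults {
  cpu: number;           // percent
  memoryGb: number;      // GB, as listed in the table
  readIops: number;
  writeIops: number;
  dbConnections: number; // percent
}

// One entry per table row; values copied from the table.
const BURSTABLE: ClassDefaults = { cpu: 85, memoryGb: 1, readIops: 500, writeIops: 1000, dbConnections: 60 };
const GENERAL: ClassDefaults = { cpu: 80, memoryGb: 2, readIops: 1000, writeIops: 2000, dbConnections: 80 };
const MEMORY: ClassDefaults = { cpu: 75, memoryGb: 4, readIops: 1500, writeIops: 3000, dbConnections: 85 };
const COMPUTE: ClassDefaults = { cpu: 70, memoryGb: 2, readIops: 2000, writeIops: 4000, dbConnections: 90 };
const LARGE: ClassDefaults = { cpu: 70, memoryGb: 8, readIops: 3000, writeIops: 6000, dbConnections: 95 };

const DEFAULTS_BY_CLASS = new Map<string, ClassDefaults>([
  ["t2", BURSTABLE], ["t3", BURSTABLE],
  ["m5", GENERAL], ["m6", GENERAL],
  ["r5", MEMORY], ["r6", MEMORY],
  ["c5", COMPUTE], ["c6", COMPUTE],
  ["x1", LARGE], ["x2", LARGE],
]);

// Accepts names such as "m5.large" or "db.r6g.xlarge" and looks up the
// leading letter-plus-digit class. Unlisted classes return undefined.
function defaultsForInstanceType(instanceType: string): ClassDefaults | undefined {
  const match = /^(?:db\.)?([a-z]\d)/i.exec(instanceType);
  return match ? DEFAULTS_BY_CLASS.get(match[1].toLowerCase()) : undefined;
}

const m5Defaults = defaultsForInstanceType("m5.large");
```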

### Custom Overrides

You can override any threshold with custom values:

```typescript
const customThresholds: AlertThresholds = {
  cpu: 70, // Override CPU threshold to 70%
  memory: 4294967296, // Override memory threshold to 4GB
  // Other thresholds will use instance-type defaults
};
```
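
One way to picture the override behaviour is a merge in which every threshold set in the custom object wins and every omitted one falls back to the instance-type default. This is a sketch only: `resolveThresholds` is a hypothetical helper, and the defaults used here are the m5/m6 row of the table with memory converted to bytes.

```typescript
// Trimmed copy of the thresholds interface so the snippet stands alone.
interface AlertThresholds {
  cpu?: number;
  memory?: number; // in bytes
  readIops?: number;
  writeIops?: number;
  dbConnections?: number; // percentage
}

// Merge overrides onto defaults, skipping keys that are explicitly undefined
// (a plain object spread would let an undefined override erase a default).
function resolveThresholds(
  defaults: AlertThresholds,
  overrides: AlertThresholds = {},
): AlertThresholds {
  const resolved: AlertThresholds = { ...defaults };
  for (const key of Object.keys(overrides) as (keyof AlertThresholds)[]) {
    const value = overrides[key];
    if (value !== undefined) {
      resolved[key] = value;
    }
  }
  return resolved;
}

// Assumed defaults for an m5/m6 instance, taken from the table.
const generalPurposeDefaults: AlertThresholds = {
  cpu: 80,
  memory: 2 * 1024 ** 3,
  readIops: 1000,
  writeIops: 2000,
  dbConnections: 80,
};

const resolvedThresholds = resolveThresholds(generalPurposeDefaults, {
  cpu: 70,
  memory: 4294967296,
});
```

Here `cpu` and `memory` come from the overrides, while `readIops`, `writeIops`, and `dbConnections` keep their defaults.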

## Integration with Zenduty

The monitoring module supports Zenduty integration for incident management:

```typescript
const monitoring = new RDSMonitoringStack(this, 'RDSMonitoring', {
  // ... other configuration
  zendutyWebhookUrl: 'https://www.zenduty.com/api/v1/integrations/aws-cloudwatch/YOUR_INTEGRATION_KEY/',
});
```
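
If Zenduty endpoints need to be told apart from other webhooks, for example to pass one as `zendutyWebhookUrl` and the rest as `alertSubcriptionWebhooks`, a host check is enough. The sketch below is illustrative only: `webhookHost` and `isZendutyWebhook` are hypothetical helpers, and the accepted hosts are inferred from the example URLs in this document.

```typescript
// Extract the host from an HTTPS URL; anything else yields undefined.
function webhookHost(url: string): string | undefined {
  const match = /^https:\/\/([^\/]+)\//.exec(url);
  return match ? match[1].toLowerCase() : undefined;
}

// Assumption: Zenduty webhooks are served from zenduty.com or www.zenduty.com.
function isZendutyWebhook(url: string): boolean {
  const host = webhookHost(url);
  return host === "zenduty.com" || host === "www.zenduty.com";
}

const exampleWebhooks: string[] = [
  "https://hooks.slack.com/services/YOUR/WEBHOOK",
  "https://www.zenduty.com/api/v1/integrations/aws-cloudwatch/YOUR_KEY/",
  "https://discord.com/api/webhooks/YOUR/DISCORD/WEBHOOK",
];

const zendutyWebhooks = exampleWebhooks.filter(isZendutyWebhook);
const otherWebhooks = exampleWebhooks.filter((url) => !isZendutyWebhook(url));
```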

## Benefits of Separated Architecture

1. **Better Organization**: Monitoring logic is separated from RDS creation logic
2. **Reusability**: Monitoring module can be used independently
3. **Maintainability**: Easier to update and maintain monitoring code
4. **Flexibility**: Can create monitoring for existing RDS instances
5. **Testing**: Easier to test monitoring logic in isolation
6. **Customization**: More flexible configuration options

## Migration from Old Architecture

If you're migrating from the old architecture:

1. **Remove old alert code** from your RDS module
2. **Import the new modules**:
   ```typescript
   import { AlertThresholds, RDSMonitoringStack } from './src/monitoring';
   ```
3. **Update your configuration** to use the new interfaces
4. **Create separate monitoring instances** as needed

## Best Practices

1. **Use Dynamic Thresholds**: Let the module calculate appropriate thresholds based on instance type
2. **Customize When Needed**: Override thresholds only when you have specific requirements
3. **Monitor Both Primary and Replicas**: Set up monitoring for all instances
4. **Use Zenduty Integration**: For better incident management
5. **Test Your Alerts**: Verify that alerts are triggered appropriately
6. **Document Your Thresholds**: Keep track of why you chose specific threshold values
