An intelligent Kubernetes troubleshooting system using AI agents for automated incident response and root cause analysis.
This project demonstrates how to build an agentic AIOps system that can automatically investigate Kubernetes issues, analyze observability data, and provide actionable insights for SRE teams.
- 🔍 Intelligent Diagnostics: AI-powered Kubernetes cluster analysis
- 📊 Observability Integration: CloudWatch metrics, logs, and alarms analysis
- 💾 Database Insights: DynamoDB performance and throttling detection
- 🤖 Multi-Agent Coordination: Specialized agents working together
- 🔗 Amazon Q Integration: Natural language interface for investigations
- k8sgpt 0.4.22+ (make sure the `amazonbedrock` backend has been configured)
- docker 27.3.1+
- python 3.13+
- kubectl 1.33+
- aws cli 2.27.2+
- Export AWS credentials in your terminal session
- Install retail-store-sample-app
- Manually install CloudWatch Container Insights (see the documentation)
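
The tool prerequisites above can be checked before setup with a short script. This is a minimal sketch and not part of the project: the `missing_tools` helper and the executable names in `REQUIRED_TOOLS` are assumptions, and the check only confirms that each executable is on `PATH`, not that it meets the minimum version listed.

```python
from __future__ import annotations

import shutil
from collections.abc import Callable, Iterable

# Assumed executable names for the prerequisites listed above.
REQUIRED_TOOLS = ("k8sgpt", "docker", "python3", "kubectl", "aws")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], str | None] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required executables were found on PATH.")
```

The `which` parameter exists only so the lookup can be swapped out in tests; by default it uses `shutil.which` from the standard library.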
# Install dependencies
uv sync
# (Optional) Install the package in editable mode
uv pip install -e .
# Testing
python scripts/test_orchestrator.py
# ~/.aws/amazonq/mcp.json
{
  "mcpServers": {
    "sherlock": {
      "command": "sherlock-mcp-server",
      "args": [],
      "env": {
        "AWS_REGION": "us-east-1",
        "KUBECONFIG": "~/.kube/config",
        "BYPASS_TOOL_CONSENT": "true"
      },
      "disabled": false,
      "autoApprove": []
    }
  }
}
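
Before starting Amazon Q, it can help to confirm that the configuration file parses as JSON and that the `sherlock` entry is enabled and points at an executable on `PATH`. The following is a minimal sketch under those assumptions; the `check_mcp_config` helper is hypothetical and not part of the project, and it does not validate the `env` values.

```python
from __future__ import annotations

import json
import shutil
from pathlib import Path


def check_mcp_config(text: str, server: str = "sherlock") -> list[str]:
    """Return a list of problems found in the MCP config text (empty if none)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["top-level JSON value is not an object"]

    entry = config.get("mcpServers", {}).get(server)
    if entry is None:
        return [f"no '{server}' entry under 'mcpServers'"]

    problems = []
    if entry.get("disabled", False):
        problems.append(f"'{server}' is disabled")
    command = entry.get("command")
    if not command:
        problems.append(f"'{server}' has no 'command'")
    elif shutil.which(command) is None:
        problems.append(f"command '{command}' not found on PATH")
    return problems


if __name__ == "__main__":
    # Path taken from the comment at the top of the config above.
    path = Path.home() / ".aws" / "amazonq" / "mcp.json"
    if not path.exists():
        print(f"{path} does not exist")
    else:
        for line in check_mcp_config(path.read_text()) or ["no problems found"]:
            print(line)
```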
Troubleshooting:
tail -f ~/.aws/amazonq/sherlock-mcp.log
Coming soon...
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.