An automated Site Reliability Engineering (SRE) agent for enforcing and monitoring governance compliance across Azure core landing zone resources. This project helps detect and auto-remediate policy violations, configuration drift, and security misconfigurations using Azure-native services and automation scripts.
The Azure Core Governance SRE Agent provides automated monitoring and remediation for Azure resources, ensuring they comply with organizational governance policies and best practices.
| Component | Purpose |
|---|---|
| Python Scripts | Perform checks and trigger remediations |
| Azure Functions | Scheduled or triggered remediation workflows |
| Bicep/Terraform | Infra provisioning templates |
| Playbooks | Incident response and escalation procedures |
- Azure subscription
- Azure CLI installed
- Python 3.8+
- Appropriate Azure RBAC permissions
- Clone this repository
- Set up the required Azure resources:
    cd infra/bicep  # or infra/terraform
    ./deploy.sh
- Configure your environment variables (see Configuration)
azure-core-governance-sre-agent/
├── scripts/                      # Compliance check scripts
│   └── remediation/              # Remediation scripts
├── azure-functions/              # Azure Functions code
│   ├── http-trigger-remediator/  # HTTP-triggered remediation function
│   └── timer-trigger-checker/    # Timer-triggered compliance check function
├── infra/                        # Infrastructure as Code templates
│   ├── bicep/                    # Bicep templates
│   └── terraform/                # Terraform templates
└── playbooks/                    # Incident response procedures
python scripts/check_compliance.py --subscription <subscription-id>
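As a rough illustration of the command above, a check script could accept `--subscription` along these lines. This is a minimal sketch, not the contents of `scripts/check_compliance.py`; the function names `parse_args` and `subscription_id` are hypothetical, and the only assumption about Azure is that subscription IDs are GUIDs.

```python
import argparse
import uuid


def subscription_id(value):
    """argparse type: accept only well-formed GUIDs, normalized to lowercase."""
    try:
        return str(uuid.UUID(value))
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a valid subscription ID: {value!r}")


def parse_args(argv=None):
    """Parse the command line for a compliance check run (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Run governance compliance checks.")
    parser.add_argument(
        "--subscription",
        required=True,
        type=subscription_id,
        help="Azure subscription ID (a GUID) to scan",
    )
    return parser.parse_args(argv)
```

Rejecting a malformed ID at parse time keeps a typo from surfacing later as an opaque authorization error.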
cd azure-functions/timer-trigger-checker
func azure functionapp publish <function-app-name>
cd ../http-trigger-remediator
func azure functionapp publish <function-app-name>
This project can be integrated with CI/CD pipelines:
- GitHub Actions workflows
- Azure DevOps pipelines
Create a .env file with the following variables:
AZURE_SUBSCRIPTION_ID=your-subscription-id
AZURE_TENANT_ID=your-tenant-id
LOG_LEVEL=INFO
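One way a script might consume these variables is sketched below. It assumes the values are already present in the process environment (exported by the shell, or loaded from `.env` by a tool such as python-dotenv); the `Settings` class and `load_settings` function are hypothetical names, not part of this project.

```python
import logging
import os
from dataclasses import dataclass

# Accepted LOG_LEVEL names mapped to the standard logging levels.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


@dataclass(frozen=True)
class Settings:
    subscription_id: str
    tenant_id: str
    log_level: int


def load_settings(env=None):
    """Read the three documented variables, failing fast on missing IDs."""
    env = os.environ if env is None else env
    missing = [k for k in ("AZURE_SUBSCRIPTION_ID", "AZURE_TENANT_ID") if not env.get(k)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    level_name = env.get("LOG_LEVEL", "INFO").upper()
    if level_name not in LEVELS:
        raise RuntimeError(f"unknown LOG_LEVEL: {level_name}")
    return Settings(env["AZURE_SUBSCRIPTION_ID"], env["AZURE_TENANT_ID"], LEVELS[level_name])
```

Passing a plain dict as `env` makes the loader easy to exercise without touching the real environment.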
- config/policies.json - Define custom policy requirements
- config/notification-settings.json - Configure alerts and notifications
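The schema of `config/policies.json` is not described here, so the following is only a sketch of what custom policy requirements could look like and how a resource might be evaluated against them. The keys `required_tags` and `allowed_locations`, and the function `find_violations`, are assumptions made for illustration.

```python
import json

# Hypothetical contents of config/policies.json; the real schema may differ.
EXAMPLE_POLICIES = json.loads("""
{
  "required_tags": ["owner", "cost-center"],
  "allowed_locations": ["westeurope", "northeurope"]
}
""")


def find_violations(resource, policies):
    """Return human-readable violations for one resource dict."""
    violations = []
    tags = resource.get("tags") or {}
    # Every required tag must be present on the resource.
    for tag in policies.get("required_tags", []):
        if tag not in tags:
            violations.append(f"missing required tag '{tag}'")
    # The location check only applies when an allow-list is configured.
    allowed = policies.get("allowed_locations")
    if allowed and resource.get("location") not in allowed:
        violations.append(f"location '{resource.get('location')}' is not allowed")
    return violations
```

For example, a resource in `eastus` tagged only with `owner` would yield two violations under these example policies: a missing `cost-center` tag and a disallowed location.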
The agent supports the following remediation workflows:
- Automatic Remediation
  - Non-compliant resources are automatically fixed based on predefined rules
  - Logging and audit trail maintained for all changes
- Approval-based Remediation
  - Changes requiring approval trigger notification workflows
  - Approvers can review and authorize via Teams or email
- Manual Remediation
  - Some issues include guided steps for manual resolution
  - Documentation links provided for complex scenarios
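The three workflows can be pictured as a routing decision per violation. The sketch below assumes a simple rule table keyed by violation type; the violation names, `RULES`, `choose_workflow`, and `handle` are all hypothetical and only illustrate the idea of predefined rules with an audit trail.

```python
from enum import Enum


class Workflow(Enum):
    AUTOMATIC = "automatic"
    APPROVAL = "approval"
    MANUAL = "manual"


# Hypothetical rule table: violation type -> remediation workflow.
RULES = {
    "missing_tag": Workflow.AUTOMATIC,
    "public_network_access": Workflow.APPROVAL,
}


def choose_workflow(violation_type, rules=RULES):
    """Violation types without a predefined rule fall back to manual handling."""
    return rules.get(violation_type, Workflow.MANUAL)


def handle(violation_type, audit_log, rules=RULES):
    """Pick a workflow and record the decision in the audit trail."""
    workflow = choose_workflow(violation_type, rules)
    audit_log.append({"violation": violation_type, "workflow": workflow.value})
    return workflow
```

Defaulting unknown violation types to manual remediation is a conservative choice: nothing is changed automatically unless a rule explicitly allows it.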
                                          ┌─────────────────┐
                                          │                 │
                                 ┌────────▶ Azure Resources │
                                 │        │                 │
                                 │        └─────────────────┘
                                 │
┌─────────────────┐     ┌─────────────────┐      ┌─────────────────┐
│                 │     │                 │      │                 │
│  Timer Trigger  ├─────▶     Checker     ├──────▶   Remediator    │
│                 │     │                 │      │                 │
└─────────────────┘     └─────────────────┘      └─────────────────┘
                                 │                        │
                                 │                        │
                                 ▼                        ▼
                        ┌─────────────────┐      ┌─────────────────┐
                        │                 │      │                 │
                        │ Logs & Metrics  │      │  Notifications  │
                        │                 │      │                 │
                        └─────────────────┘      └─────────────────┘
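One possible reading of the flow in the diagram, written as a single timer tick: the checker inspects each resource and logs the result, and failing resources go to the remediator, which reports the outcome. This is a sketch under that assumption; `run_cycle` and its callable parameters are hypothetical and stand in for the real function code.

```python
def run_cycle(resources, check, remediate, log, notify):
    """Run one check-and-remediate pass; return how many resources were remediated."""
    remediated = 0
    for resource in resources:
        violations = check(resource)
        # Every check result is logged, compliant or not.
        log({"resource": resource["name"], "violations": violations})
        if not violations:
            continue
        # Only non-compliant resources reach the remediator.
        outcome = remediate(resource, violations)
        notify({"resource": resource["name"], "outcome": outcome})
        remediated += 1
    return remediated
```

Passing the checker, remediator, logger, and notifier in as callables keeps the loop testable without any Azure dependency.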
- Dashboard - Azure Dashboard template available at infra/dashboard
- Logging - All activity logged to Application Insights
- Reporting - Weekly compliance reports generated automatically
- All credentials stored in Azure Key Vault
- Managed Identities used for service authentication
- Regular security scanning integrated with CI/CD pipelines
- Authentication Failures
  - Verify service principal permissions
  - Check Key Vault access policies
- Remediation Failures
  - Review logs in Application Insights
  - Check resource locks that might prevent changes
- Timeout Issues
  - For large subscriptions, adjust the function timeout in host.json
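In Azure Functions, the timeout is controlled by the `functionTimeout` property of `host.json`, written as an `hh:mm:ss` string. The helper below is a small sketch for adjusting it in place; `set_function_timeout` is a hypothetical name, and the ten-minute default is only an example value. The maximum allowed timeout depends on the hosting plan (the Consumption plan caps it at 10 minutes).

```python
import json
from pathlib import Path


def set_function_timeout(host_json_path, timeout="00:10:00"):
    """Set functionTimeout (hh:mm:ss) in host.json, keeping other settings intact."""
    path = Path(host_json_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["functionTimeout"] = timeout
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

The function app needs to be republished for the new value to take effect.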
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.