This GitHub repository contains a comprehensive tutorial on Site Reliability Engineering (SRE), covering topics such as SLAs, SLOs, SLIs, Chaos Engineering, monitoring, alerting, and much more. It also includes bonus content on SRE best practices. Follow along with the #100daysofSRE challenge and improve your reliability engineering skills.

shantoroy/site-reliability-engineering-101

#100daysofSRE - Site Reliability Engineering Notes (SRE-101)

I have worked as a Site Reliability Engineer (SRE) at Charles Schwab since 2024. Here, I plan to take the #100dayschallenge to note important SRE topics and resources.

I have planned the content for the next 100 days, and I will be publishing blog posts under the hashtag #100daysofSRE. ✌️

Blog Posts

  1. #100daysofSRE (Day 01): Introduction to Site Reliability Engineering
  2. #100daysofSRE (Day 02): History of SRE and its Evolution
  3. #100daysofSRE (Day 03): SLAs, SLOs, and SLIs — understanding the metrics of reliability
  4. #100daysofSRE (Day 04): Chaos Engineering and SRE - Techniques and Tools to Break Things on Purpose
  5. #100daysofSRE (Day 05): Automation Benefits, Techniques, and Tools in SRE
  6. #100daysofSRE (Day 06): Incident Management and Response for Site Reliability Engineers
  7. #100daysofSRE (Day 07): Effective Communication during Incidents for Better Incident Response
  8. #100daysofSRE (Day 08): Root Cause Analysis and Post-Incident Reviews for SRE
  9. #100daysofSRE (Day 09): Monitoring and Observability in SRE
  10. #100daysofSRE (Day 10): Grafana vs Splunk for Monitoring System and Applications
  11. #100daysofSRE (Day 11): Logging and Log Analysis in Site Reliability Engineering- Techniques, Tools, and Best Practices
  12. #100daysofSRE (Day 12): Alerting and Notification Strategies and Best Practices in SRE
  13. #100daysofSRE (Day 13): Capacity Planning and Management in Site Reliability Engineering
  14. #100daysofSRE (Day 14): Load Testing and Stress Testing in Site Reliability Engineering
  15. #100daysofSRE (Day 15): Disaster Recovery Planning and Testing in SRE
  16. #100daysofSRE (Day 16): High Availability and Redundancy Strategies for Data
  17. #100daysofSRE (Day 17): Techniques, Tools, and Best Practices for Performance Optimization and Tuning in Site Reliability Engineering
  18. #100daysofSRE (Day 18): 25 Intermediate-level Linux Commands useful for SysAdmin, DevOps, and SRE
  19. #100daysofSRE (Day 19): Simplifying Log Analysis with Linux Sed Command: Basic and Templates
  20. #100daysofSRE (Day 20): Simplifying Log Analysis with Linux awk Command: Basic and Templates
  21. #100daysofSRE (Day 21): How to use Supervisor to manage a script on Linux
  22. #100daysofSRE (Day 22): Essential /var/log Files for SREs and How to Analyze Them
  23. #100daysofSRE (Day 23): Modernize and Containerize your Applications or Microservices using Docker
  24. #100daysofSRE (Day 24): Writing a Dockerfile – Best Practices & Enhancements
  25. #100daysofSRE (Day 25): Writing a Production-Grade Dockerfile for Legacy Applications
  26. #100daysofSRE (Day 26): Docker Compose - Simplifying Multi-Container Deployments
  27. #100daysofSRE (Day 27): Building a Hacking Lab with Docker Compose
  28. #100daysofSRE (Day 28): Deploying an AI Chatbot with Docker Compose
  29. #100daysofSRE (Day 29): Kubernetes over Docker-compose – Why It’s Better for Production
  30. #100daysofSRE (Day 30): Learn Kubernetes Commands and Operations using Minikube
  31. #100DaysOfSRE (Day 31): How to Write Kubernetes Manifest Files: Kubernetes vs Docker-Compose
  32. #100DaysOfSRE (Day 32): Advanced Kubernetes: Ingress, ConfigMaps, Secrets & Helm
  33. #100DaysOfSRE (Day 33): Monitoring Kubernetes Apps with Prometheus & Grafana
  34. #100DaysOfSRE (Day 34): Automating Kubernetes Deployments with ArgoCD & GitOps
  35. #100DaysOfSRE (Day 35): Kubernetes CI/CD Pipeline with GitHub Actions & ArgoCD
  36. #100DaysOfSRE (Day 36): Kubernetes Helm Charts – Package & Deploy Applications

YouTube Channels for SREs

  1. TechWorld with Nana
  2. Anton Putra
  3. freeCodeCamp.org
  4. Professor Messer
  5. Google Cloud Tech
  6. IBM Technology
  7. ByteByteGo
  8. Fireship
  9. NetworkChuck
  10. Tech With Soleyman
  11. ByteMonk
  12. Christian Lempa
  13. David Ondrej
  14. DevOps Journey
