sre-take-home-exercise-python Assessment - Rahul Sanjay Panchal #2

Open
wants to merge 4 commits into
base: main
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
venv/
9 changes: 9 additions & 0 deletions config/endpoints.yaml
@@ -0,0 +1,9 @@
- name: Example Google
  url: https://www.google.com

- name: Example Post Request
  url: https://httpbin.org/post
  method: POST
  headers:
    Content-Type: application/json
  body: '{"test": "hello"}'
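A note on the `body` value above: it is a YAML string that already contains JSON. Assuming it is later handed to a `json=` request argument (as `main.py` does with `requests.request`), the string would be JSON-encoded a second time and sent as a JSON string literal rather than an object. The stdlib-only sketch below illustrates the difference; decoding with `json.loads` first is one possible adjustment, not something this PR does.

```python
import json

# String value exactly as written in config/endpoints.yaml.
body = '{"test": "hello"}'

# Encoding the string itself (which is what a json= argument does with its
# input) produces a quoted JSON string literal, not a JSON object.
double_encoded = json.dumps(body)
print(double_encoded)

# Decoding first recovers the dictionary, which then encodes as intended.
payload = json.loads(body)
print(json.dumps(payload))
```

Passing the raw string through a `data=` argument instead would be another way to send it unchanged.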
27 changes: 20 additions & 7 deletions main.py
@@ -2,36 +2,49 @@
 import requests
 import time
 from collections import defaultdict
+from urllib.parse import urlparse
+import time as t

 # Function to load configuration from the YAML file
 def load_config(file_path):
     with open(file_path, 'r') as file:
         return yaml.safe_load(file)

 # Function to perform health checks
 def check_health(endpoint):
     url = endpoint['url']
-    method = endpoint.get('method')
+    method = endpoint.get('method', 'GET')
     headers = endpoint.get('headers')
     body = endpoint.get('body')

     try:
-        response = requests.request(method, url, headers=headers, json=body)
-        if 200 <= response.status_code < 300:
+        start = t.time()
+        response = requests.request(method, url, headers=headers, json=body, timeout=5)
+        elapsed_ms = (t.time() - start) * 1000
+
+        print(f"{url} responded in {int(elapsed_ms)}ms with status {response.status_code}")
+
+        if 200 <= response.status_code < 300 and elapsed_ms <= 500:
             return "UP"
         else:
             return "DOWN"
-    except requests.RequestException:
+    except requests.RequestException as e:
+        print(f"Request to {url} failed: {e}")
         return "DOWN"

+# Function to extract domain name from URL (ignoring ports)
+def get_domain(url):
+    parsed_url = urlparse(url)
+    domain = parsed_url.hostname
+    return domain
+
 # Main function to monitor endpoints
 def monitor_endpoints(file_path):
     config = load_config(file_path)
     domain_stats = defaultdict(lambda: {"up": 0, "total": 0})

     while True:
         for endpoint in config:
-            domain = endpoint["url"].split("//")[-1].split("/")[0]
+            domain = get_domain(endpoint["url"])
             result = check_health(endpoint)

             domain_stats[domain]["total"] += 1
@@ -58,4 +71,4 @@ def monitor_endpoints(file_path):
     try:
         monitor_endpoints(config_file)
     except KeyboardInterrupt:
-        print("\nMonitoring stopped by user.")
+        print("\nMonitoring stopped by user.")
Binary file added output/Output.png
91 changes: 91 additions & 0 deletions readme.md
@@ -0,0 +1,91 @@
# Site Reliability Engineering - Endpoint Availability Monitor

This is a command-line tool written in Python to monitor the availability of HTTP endpoints, as part of the Fetch Rewards Site Reliability Engineering take-home exercise.

## 📋 Overview

As a Site Reliability Engineer, it's important to monitor service uptime and build processes that help others identify and respond to incidents. This tool checks HTTP endpoints periodically and reports cumulative availability by domain, helping identify reliability trends over time.

---

## ✅ Features

- Accepts configuration via a YAML file.
- Periodic health checks every 15 seconds.
- Availability calculated **cumulatively** per domain.
- Endpoints are considered **available** only if both conditions hold:
  - HTTP status code is between `200` and `299` (inclusive).
  - Response time is `≤ 500ms`.
- Port numbers in URLs are ignored when grouping by domain.

---

## 🚀 Getting Started

### Prerequisites

- Python 3.7+
- `pip` for managing dependencies
- (Optional but recommended) a virtual environment

### Install Dependencies

```bash
pip install -r requirements.txt
```

or install the two direct dependencies manually:

```bash
pip install requests pyyaml
```

### ✅ Check for endpoints.yaml

If `config/endpoints.yaml` does not exist, create it with contents like the following:

```yaml
- name: Google
  url: https://www.google.com
- name: HTTPBin
  url: https://httpbin.org/status/200
  method: GET
```

### Run the Monitor

```bash
python main.py config/endpoints.yaml
```

### Expected Output

![Monitor Output](output/Output.png)

## 🛠️ Code Changes and Improvements

### 1. Availability Calculation

- **Issue:** The initial code did not calculate the availability cumulatively over time.
- **Solution:** Tracked the number of "UP" results and the total number of checks for each domain across check cycles, and reported availability as the percentage of checks that were "UP".
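
A minimal sketch of cumulative tracking, assuming counters shaped like the `domain_stats` dictionary in `main.py`. The helper names `record_result` and `availability_percent`, and the rounding to a whole percent, are illustrative assumptions rather than code from the tool.

```python
from collections import defaultdict

# Cumulative counters per domain; they are never reset between cycles.
domain_stats = defaultdict(lambda: {"up": 0, "total": 0})

def record_result(domain, result):
    """Count one check for `domain`; `result` is "UP" or "DOWN"."""
    domain_stats[domain]["total"] += 1
    if result == "UP":
        domain_stats[domain]["up"] += 1

def availability_percent(domain):
    """Return cumulative availability rounded to a whole percent (assumed)."""
    stats = domain_stats[domain]
    if stats["total"] == 0:
        return 0  # no checks recorded yet
    return round(100 * stats["up"] / stats["total"])

# Four checks for a hypothetical domain across two cycles: three UP, one DOWN.
for result in ["UP", "UP", "DOWN", "UP"]:
    record_result("example.com", result)

print(f"example.com has {availability_percent('example.com')}% availability")
```

Because the counters persist across cycles, a single failed check lowers the percentage gradually instead of resetting it.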

### 2. Response Time Validation

- **Issue:** There was no check for response time, leading to endpoints potentially being marked as "UP" even if they took longer than 500ms to respond.
- **Solution:** Added a validation step that ensures an endpoint is only considered "UP" if its response time is ≤ 500ms.
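
The rule can be sketched as a pure function plus a small timing helper. The names `classify`, `timed`, and `MAX_LATENCY_MS` are hypothetical, and the sketch uses `time.perf_counter()` as a monotonic clock, which is a swapped-in choice: `main.py` itself measures with `time.time()`.

```python
import time

MAX_LATENCY_MS = 500  # latency threshold from the requirements

def classify(status_code, elapsed_ms):
    """Return "UP" only for a 2xx status delivered within the threshold."""
    if 200 <= status_code < 300 and elapsed_ms <= MAX_LATENCY_MS:
        return "UP"
    return "DOWN"

def timed(func):
    """Call func() and return (its result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = func()
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# Stand-in for a real HTTP call: a callable that just returns a status code.
status, elapsed_ms = timed(lambda: 204)
print(classify(status, elapsed_ms))
```

Keeping the decision in a function that takes plain numbers makes the boundary cases (exactly 500 ms, status 299 versus 300) easy to check without network access.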

### 3. Domain Parsing

- **Issue:** The original string-splitting logic kept the port as part of the domain (for example `example.com:8080`), so the same host on different ports was counted as separate domains.
- **Solution:** Implemented a function that extracts the domain name from the URL while ignoring port numbers.
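
A short comparison of the two approaches from the diff: the earlier string split and the `urlparse`-based `get_domain`. The helper name `split_domain` and the example URLs are hypothetical.

```python
from urllib.parse import urlparse

def split_domain(url):
    """Earlier approach: keeps everything between '//' and the next '/'."""
    return url.split("//")[-1].split("/")[0]

def get_domain(url):
    """urlparse-based approach: hostname only, without the port."""
    return urlparse(url).hostname

urls = [
    "https://example.com:8443/health",
    "https://example.com/status",
]

for url in urls:
    print(split_domain(url), "|", get_domain(url))

# With the split, the two URLs fall into two groups; with hostname, one.
print(len({split_domain(u) for u in urls}), len({get_domain(u) for u in urls}))
```

Note that `hostname` also lowercases the host, so URLs that differ only in letter case are grouped together as well.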

### 4. Error Handling for Failed Requests

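- **Issue:** The initial code caught request failures silently and set no timeout, so a hanging endpoint could stall the check loop and failures left no trace in the output.
- **Solution:** Added a 5-second timeout to each request and logged the exception message before classifying the endpoint as "DOWN".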
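
The failure path can be sketched without a network call by injecting the request as a callable. `safe_check` and `send_request` are hypothetical names. The sketch catches `OSError` to stay within the standard library; `main.py` catches `requests.RequestException`, which derives from `IOError` (an alias of `OSError` in Python 3).

```python
def safe_check(send_request, url):
    """Run send_request() and map any I/O failure to "DOWN".

    send_request is an assumed zero-argument callable that performs the
    HTTP call and returns the response status code.
    """
    try:
        status_code = send_request()
    except OSError as exc:
        # Log the reason so the failure is visible, then classify it.
        print(f"Request to {url} failed: {exc}")
        return "DOWN"
    return "UP" if 200 <= status_code < 300 else "DOWN"

def refused():
    """Simulate a connection failure."""
    raise ConnectionError("connection refused")

def timed_out():
    """Simulate a request that exceeded its timeout."""
    raise TimeoutError("timed out after 5 seconds")

print(safe_check(refused, "https://example.com"))
print(safe_check(timed_out, "https://example.com"))
print(safe_check(lambda: 200, "https://example.com"))
```

Catching only I/O errors, rather than every exception, keeps programming mistakes such as a `KeyError` in the configuration visible instead of reporting them as downtime.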

---

## 📋 Conclusion

These changes ensure the tool meets the provided requirements, including cumulative availability reporting, response time validation, and ignoring port numbers when determining domain availability.
6 changes: 6 additions & 0 deletions requirements.txt
@@ -0,0 +1,6 @@
certifi==2025.1.31
charset-normalizer==3.4.1
idna==3.10
PyYAML==6.0.2
requests==2.32.3
urllib3==2.4.0