feat(SQO): Introduce circuit breaker. #9

Merged 25 commits on Jun 26, 2025

Commits
1a1a0b3  Rename tests for consistent naming across test modules (MoonBoi9001, Jun 23, 2025)
46aee0a  Update test_configuration.py (MoonBoi9001, Jun 24, 2025)
38d02c6  Update requirements to latest versions (MoonBoi9001, Jun 24, 2025)
4e27958  Ruff (MoonBoi9001, Jun 24, 2025)
e2d9e6a  Update requirements.txt (MoonBoi9001, Jun 24, 2025)
0852d5d  Update test_slack_notifications.py (MoonBoi9001, Jun 24, 2025)
707ef75  Update README.md (MoonBoi9001, Jun 24, 2025)
7ccd7df  Update README.md (MoonBoi9001, Jun 24, 2025)
bd1e1f2  Update Dockerfile (MoonBoi9001, Jun 24, 2025)
47e4cbc  Update .gitignore (MoonBoi9001, Jun 24, 2025)
393396e  Update blockchain client for newer web3 library (MoonBoi9001, Jun 24, 2025)
19a9367  Update test_blockchain_client.py (MoonBoi9001, Jun 24, 2025)
6df9983  Update blockchain_client.py (MoonBoi9001, Jun 24, 2025)
38635e1  Ruff (MoonBoi9001, Jun 24, 2025)
899bdaf  update blockchain client and test module (MoonBoi9001, Jun 24, 2025)
f1ea781  add circuit breaker (MoonBoi9001, Jun 24, 2025)
294979a  implement circuit breaker into core module (MoonBoi9001, Jun 24, 2025)
b8fbe43  add mermaid diagram documenting circuit breaker logic (MoonBoi9001, Jun 24, 2025)
c0529da  Ruff (MoonBoi9001, Jun 25, 2025)
a371b30  better documentation around the circuit breaker (MoonBoi9001, Jun 25, 2025)
21760a0  fix failing tests (MoonBoi9001, Jun 25, 2025)
0894ee1  on success notification, make sure we document the successful RPC pro… (MoonBoi9001, Jun 25, 2025)
25b2714  Ruff (MoonBoi9001, Jun 25, 2025)
d417ff0  oracle working (MoonBoi9001, Jun 25, 2025)
5192397  Update .gitignore (MoonBoi9001, Jun 26, 2025)
3 changes: 3 additions & 0 deletions .gitignore
@@ -20,3 +20,6 @@ contracts/contract.abi.json
/src/utils/__pycache__
/src/models/__pycache__
.coverage

+# Ignore docker compose override file
+docker-compose.override.yml
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,7 +1,7 @@
# Dockerfile to create a clean, lightweight Docker Image for the Service Quality Oracle

# Use Python 3.9 slim as the base image for a lightweight container
-FROM python:3.9-slim
+FROM python:3.11-slim

# Add metadata labels
LABEL description="Service Quality Oracle" \
17 changes: 10 additions & 7 deletions README.md
@@ -49,17 +49,21 @@ Please refer to the [ELIGIBILITY_CRITERIA.md](./ELIGIBILITY_CRITERIA.md) file to

The application follows a clear data flow, managed by a daily scheduler:

-1. **Scheduler (`scheduler.py`)**: This is the main entry point. It runs on a schedule (e.g., daily), manages the application lifecycle, and triggers the oracle run. It is also responsible for catching up on any missed runs.
+1. **Scheduler (`scheduler.py`)**: This is the main entry point. It runs on a schedule (e.g., daily), manages the application lifecycle, and triggers the oracle run. It is also responsible for catching up on any missed runs.

-2. **Orchestrator (`service_quality_oracle.py`)**: For each run, this module orchestrates the end-to-end process by coordinating the other components.
+2. **Orchestrator (`service_quality_oracle.py`)**: For each run, this module orchestrates the end-to-end process by coordinating the other components.

-3. **Data Fetching (`bigquery_provider.py`)**: The orchestrator calls this provider to execute a configurable SQL query against Google BigQuery, fetching the raw indexer performance data.
+3. **Data Fetching (`bigquery_provider.py`)**: The orchestrator calls this provider to execute a configurable SQL query against Google BigQuery, fetching the raw indexer performance data.

-4. **Data Processing (`eligibility_pipeline.py`)**: The raw data is passed to this module, which processes it, filters for eligible and ineligible indexers, and generates CSV artifacts for auditing and record-keeping.
+4. **Data Processing (`eligibility_pipeline.py`)**: The raw data is passed to this module, which processes it, filters for eligible and ineligible indexers, and generates CSV artifacts for auditing and record-keeping.

-5. **Blockchain Submission (`blockchain_client.py`)**: The orchestrator takes the final list of eligible indexers and passes it to this client, which handles the complexities of batching, signing, and sending the transaction to the blockchain via RPC providers with built-in failover.
+5. **Blockchain Submission (`blockchain_client.py`)**: The orchestrator takes the final list of eligible indexers and passes it to this client, which handles the complexities of batching, signing, and sending the transaction to the blockchain via RPC providers with built-in failover.

-6. **Notifications (`slack_notifier.py`)**: Throughout the process, status updates (success, failure, warnings) are sent to Slack.
+6. **Notifications (`slack_notifier.py`)**: Throughout the process, status updates (success, failure, warnings) are sent to Slack.
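
The six steps above can be read as one function that wires the stages together. The following is a minimal sketch only: `run_oracle` and its four callable parameters are hypothetical names chosen for illustration, since the real module signatures are not shown in this diff.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def run_oracle(
    fetch_rows: Callable[[], List[Row]],
    split_eligibility: Callable[[List[Row]], Tuple[List[str], List[str]]],
    submit_eligible: Callable[[List[str]], str],
    notify: Callable[[str], None],
) -> str:
    """Hypothetical orchestrator: fetch, process, submit, then notify."""
    try:
        rows = fetch_rows()                             # step 3: data fetching
        eligible, ineligible = split_eligibility(rows)  # step 4: data processing
        tx_hash = submit_eligible(eligible)             # step 5: blockchain submission
    except Exception as exc:
        notify(f"failure: {exc}")                       # step 6: failure notification
        raise
    notify(f"success: {len(eligible)} eligible, {len(ineligible)} ineligible, tx {tx_hash}")
    return tx_hash
```

Passing the stages in as callables keeps the sketch independent of BigQuery, web3, and Slack, so it can be exercised with plain fakes.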

+## Architecture
+
+For a more detailed explanation of key architectural decisions, such as the RPC provider failover and circuit breaker logic, please see the [Technical Design Document](./docs/technical-design.md).

## CI/CD Pipeline

@@ -128,7 +132,6 @@ bandit -r src/
## TODO List (only outstanding TODOs)

### 1. Testing
-- [ ] Create integration tests for the entire pipeline
- [ ] Security review of code and dependencies

### 2. Documentation
2 changes: 1 addition & 1 deletion docker-compose.yml
@@ -51,7 +51,7 @@ services:
memory: 512M

# Restart policy
-restart: unless-stopped
+restart: "on-failure"

# Healthcheck to ensure the container is running
healthcheck:
56 changes: 56 additions & 0 deletions docs/technical-design.md
@@ -0,0 +1,56 @@
# Technical Design & Architecture

This document outlines key architectural decisions and data flows within the Service Quality Oracle.

## RPC Provider Failover and Circuit Breaker Logic

The application is designed to be resilient to transient network issues and RPC provider failures. It uses a multi-layered approach involving internal retries, provider rotation, and an application-level circuit breaker to prevent catastrophic failures and infinite restart loops.
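
The first two layers, internal retries and provider rotation, can be sketched in a few lines. This is a hypothetical illustration, not the code in `blockchain_client.py`: the name `call_with_failover`, its parameters, and the plain loops are assumptions (the project pins `tenacity`, which the real client may use for retries instead). The default of 5 attempts per provider mirrors the "Fails after 5 retries" note in the sequence diagram.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(
    providers: Sequence[str],
    rpc_call: Callable[[str], T],
    retries_per_provider: int = 5,
    base_delay: float = 0.0,
) -> T:
    """Try each provider in order, retrying each a fixed number of times."""
    last_error: Optional[Exception] = None
    for url in providers:
        for attempt in range(1, retries_per_provider + 1):
            try:
                return rpc_call(url)
            except ConnectionError as exc:
                last_error = exc
                # Linear backoff between attempts (no delay when base_delay is 0).
                time.sleep(base_delay * attempt)
        # Retries for this provider are exhausted: rotate to the next one.
    # Every provider failed: surface a single final error to the caller.
    raise ConnectionError(f"All {len(providers)} RPC providers failed") from last_error
```

Raising one final `ConnectionError` is what lets the caller count the whole episode as a single failure, however many individual attempts were made.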

The following diagram illustrates the sequence of events when all RPC providers fail, leading to a single recorded failure by the circuit breaker.

```mermaid
sequenceDiagram
# Setup column titles
participant main_oracle as service_quality_oracle.py
participant blockchain_client as blockchain_client.py
participant circuit_breaker as circuit_breaker.py
participant slack_notifier as slack_notifier.py

# Attempt function call
main_oracle->>blockchain_client: batch_allow_indexers_issuance_eligibility()

# Describe failure loop inside the blockchain_client module
activate blockchain_client
alt RPC Loop (for each provider)

# Attempt RPC call
blockchain_client->>blockchain_client: _execute_rpc_call() with provider A
note right of blockchain_client: Fails after 5 retries

# Log failure
blockchain_client-->>blockchain_client: raises ConnectionError
note right of blockchain_client: Catches error, logs rotation

# Retry RPC call
blockchain_client->>blockchain_client: _execute_rpc_call() with provider B
note right of blockchain_client: Fails after 5 retries

# Log final failure
blockchain_client-->>blockchain_client: raises ConnectionError
note right of blockchain_client: All providers tried and failed
end

# Raise error back to main_oracle and exit the blockchain_client module
blockchain_client-->>main_oracle: raises Final ConnectionError
deactivate blockchain_client

# Take note of the failure in the circuit breaker, which can break the restart loop if triggered enough times in a short duration
main_oracle->>circuit_breaker: record_failure()

# Notify of the RPC failure in slack
main_oracle->>slack_notifier: send_failure_notification()

# Document restart process
note right of main_oracle: sys.exit(1)
note right of main_oracle: Docker will restart. CircuitBreaker can halt via sys.exit(0)
```
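
The last two notes in the diagram rely on exit codes: with `restart: "on-failure"`, Docker restarts the container only after a non-zero exit, so `sys.exit(0)` ends the restart loop. The sketch below shows one way such a breaker could work. It is not the contents of `circuit_breaker.py`, which this diff does not show: the JSON state file, the `max_failures` and `window_seconds` parameters, and the `is_open`, `reset`, and `exit_code_for_run` names are all assumptions. Only `record_failure()` appears in the diagram.

```python
import json
import time
from pathlib import Path


class CircuitBreaker:
    """Hypothetical file-backed breaker: opens after too many recent failures."""

    def __init__(self, state_file, max_failures=3, window_seconds=3600.0):
        # State lives on disk because in-memory state is lost on each restart.
        self.state_file = Path(state_file)
        self.max_failures = max_failures
        self.window_seconds = window_seconds

    def _recent_failures(self, now):
        if not self.state_file.exists():
            return []
        timestamps = json.loads(self.state_file.read_text())
        # Keep only failures that fall inside the sliding window.
        return [t for t in timestamps if now - t < self.window_seconds]

    def record_failure(self, now=None):
        now = time.time() if now is None else now
        failures = self._recent_failures(now) + [now]
        self.state_file.write_text(json.dumps(failures))

    def is_open(self, now=None):
        now = time.time() if now is None else now
        return len(self._recent_failures(now)) >= self.max_failures

    def reset(self):
        self.state_file.unlink(missing_ok=True)


def exit_code_for_run(breaker, run):
    """Map one oracle run to a process exit code."""
    if breaker.is_open():
        return 0  # clean exit: an on-failure restart policy will not restart
    try:
        run()
    except ConnectionError:
        breaker.record_failure()
        return 1  # non-zero exit: Docker restarts the container
    breaker.reset()
    return 0
```

An entry point would then call `sys.exit(exit_code_for_run(breaker, run))`. In a container, the state file would need to sit on a volume that survives restarts for the failure count to accumulate.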
31 changes: 15 additions & 16 deletions requirements.txt
@@ -2,43 +2,42 @@
# Requires Python 3.9+

# Configuration management
-tomli==2.2.1 # TOML support for Python < 3.11
+tomli==2.2.1

# Scheduling and resilience
schedule==1.2.2
pytz==2025.2
tenacity==8.5.0

# Google Cloud BigQuery for data processing
-google-cloud-bigquery==3.26.0
-bigframes==1.42.0
+google-cloud-bigquery==3.34.0
+bigframes==2.8.0

# Data processing and validation
pandas==2.2.3
pandera==0.20.4
-numpy>=2.0.0 # Added as pandas dependency
+numpy==2.3.1

# Blockchain integration - Latest compatible versions
web3==7.12.0
-eth-account>=0.13.0
-eth-typing>=5.2.0
+eth-account==0.13.7
+eth-typing==5.2.1

# GraphQL and subgraph integration (for future subgraph functionality)
gql==3.5.2

# HTTP and API
requests==2.32.3
-aiohttp>=3.9.0 # For async HTTP requests (used by web3)
+aiohttp==3.12.13

# Development/Testing
-pytest>=8.0.0
-pytest-cov>=6.0.0
-pytest-mock>=3.0.0
-pytest-snapshot>=0.9.0
-mypy>=1.0.0
-types-pytz # Type stubs for pytz
-types-requests # Type stubs for requests
+pytest==8.4.1
+pytest-cov==6.2.1
+pytest-mock==3.14.1
+pytest-snapshot==0.9.0
+mypy==1.16.1
+types-pytz==2025.2.0.20250516
+types-requests==2.32.4.20250611

# Linting and formatting
-ruff>=0.6.0
-pip==25.1
+ruff==0.12.0
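
With the base image now on Python 3.11, the standard library ships `tomllib`, which exposes the same `load`/`loads` API as the pinned `tomli` backport. How the project actually imports its TOML parser is not visible in this diff, so the following is only a sketch of a version-tolerant import; `load_config` is a hypothetical helper name.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API for older interpreters


def load_config(text: str) -> dict:
    """Parse a TOML document from a string into a plain dict."""
    return tomllib.loads(text)
```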