A backend service for managing user authentication, organization membership, cluster resource allocation, and priority-based preemptive deployment scheduling.
Features:
- User Authentication: JWT-based authentication with bcrypt password hashing
- Organization Management: Invite code-based organization membership
- Cluster Management: Create and manage clusters with AWS-style resource units
- Deployment Management: Docker-based deployment management with resource allocation
- Priority-based Scheduling: HIGH/MEDIUM/LOW priority with preemptive scheduling
- Resource Optimization: Efficient resource utilization and bin-packing algorithms
- Queue Management: Redis-based persistent deployment queues
Tech stack:
- Framework: FastAPI with async support
- Database: PostgreSQL with SQLAlchemy ORM
- Queue: Redis for deployment scheduling
- Authentication: JWT tokens with bcrypt
- Testing: pytest with async support
Prerequisites:
- Python 3.8+
- PostgreSQL
- Redis Server
- Clone the repository:
  git clone <repository-url>
  cd mlops_backend
- Install dependencies:
  pip install -r requirements.txt
- Set up PostgreSQL:
  # Create the database
  createdb mlops_db
  # Update the connection string in app/config.py, or set an environment variable:
  export DATABASE_URL="postgresql://username:password@localhost/mlops_db"
  # or
  export DATABASE_URL="postgresql://localhost/mlops_db"
- Set up Redis:
  # Start Redis server
  redis-server
  # Or using Docker
  docker run -d -p 6379:6379 redis:alpine
- Configure environment variables (optional): Create a .env file:
  DATABASE_URL=postgresql://localhost/mlops_db
  REDIS_URL=redis://localhost:6379/0
  SECRET_KEY=your-secret-key-here
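For illustration only, here is a minimal sketch of how app/config.py might read these values; the actual module may differ, and the defaults below simply mirror the values documented above.

```python
# Illustrative sketch of app/config.py; not the actual module.
import os

# Fall back to the documented defaults when the environment variables are unset.
DATABASE_URL = os.getenv("DATABASE_URL", "postgresql://localhost/mlops_db")
REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379/0")
SECRET_KEY = os.getenv("SECRET_KEY", "your-secret-key-here")
```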
Run the server:
# Run with auto-reload
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
# Run with gunicorn
gunicorn app.main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
The API will be available at http://localhost:8000
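The server can also be started programmatically; the run.py launcher below is only an illustrative convenience, not a file in the repository.

```python
# run.py (illustrative): programmatic equivalent of the uvicorn command above.
import uvicorn

if __name__ == "__main__":
    uvicorn.run("app.main:app", host="0.0.0.0", port=8000, reload=True)
```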
Once the server is running, visit:
- Interactive API docs: http://localhost:8000/docs
- ReDoc documentation: http://localhost:8000/redoc
Authentication:
- `POST /auth/register` - Register new user
- `POST /auth/login` - Login user
- `POST /auth/join-organization` - Join organization with invite code
- `GET /auth/me` - Get current user info

Organizations:
- `POST /organizations/` - Create organization
- `GET /organizations/me` - Get user's organization

Clusters:
- `POST /clusters/` - Create cluster
- `GET /clusters/` - List clusters
- `GET /clusters/{id}` - Get cluster details
- `PUT /clusters/{id}` - Update cluster
- `DELETE /clusters/{id}` - Delete cluster
- `GET /clusters/{id}/resources` - Get resource usage

Deployments:
- `POST /deployments/` - Create deployment
- `GET /deployments/` - List deployments
- `GET /deployments/{id}` - Get deployment details
- `PUT /deployments/{id}` - Update deployment priority
- `DELETE /deployments/{id}` - Cancel deployment
- `POST /deployments/{id}/start` - Start deployment (simulation)
- `POST /deployments/{id}/complete` - Complete deployment (simulation)
- `GET /deployments/queue/{cluster_id}` - Get deployment queue
- `POST /deployments/queue/{cluster_id}/process` - Process queue
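As a quick illustration of the authentication flow, the Python snippet below registers a user, logs in, and calls a protected endpoint. It assumes the login response contains an access_token field; check the interactive docs for the actual response schema.

```python
# Illustrative client-side auth flow using the requests library.
import requests

BASE = "http://localhost:8000"

# Register a new user
requests.post(f"{BASE}/auth/register", json={
    "username": "admin",
    "email": "admin@company.com",
    "password": "securepassword",
})

# Log in and capture the JWT
login = requests.post(f"{BASE}/auth/login", json={
    "username": "admin",
    "password": "securepassword",
})
token = login.json()["access_token"]  # field name is an assumption; see /docs

# Call a protected endpoint with the Bearer token
me = requests.get(f"{BASE}/auth/me", headers={"Authorization": f"Bearer {token}"})
print(me.json())
```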
Clusters and deployments are sized in AWS-style resource units:
- RAM: Gigabytes (GB) - e.g., 1, 2, 4, 8, 16, 32
- CPU: vCPUs - e.g., 1, 2, 4, 8, 16
- GPU: Count - e.g., 0, 1, 2, 4, 8
Deployments have three priority levels:
- HIGH (1): Highest priority, can preempt lower priority deployments
- MEDIUM (2): Standard priority
- LOW (3): Lowest priority, can be preempted
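These units and priority names appear directly in the API payloads; for example, a create-deployment body looks like the following (it mirrors the curl example further below).

```python
# Example create-deployment payload using the units and priority names above.
deployment_request = {
    "name": "Web App Deployment",
    "cluster_id": 1,
    "docker_image": "nginx:latest",
    "required_ram_gb": 4.0,      # RAM in GB
    "required_cpu_vcpus": 2.0,   # vCPUs
    "required_gpu_count": 0,     # GPU count
    "priority": "HIGH",          # HIGH / MEDIUM / LOW
}
```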
The service implements a Priority-based Preemptive Scheduler:
- Immediate Scheduling: If resources are available, deploy immediately
- Preemption: HIGH priority deployments can preempt MEDIUM/LOW priority ones
- Queueing: Deployments wait in Redis queue when resources unavailable
- Resource Optimization: Minimal preemption set using bin-packing approach
- Automatic Requeuing: Preempted deployments are automatically requeued
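To make the flow concrete, here is a self-contained, in-memory sketch of the scheduling decision. It is illustrative only: the plain dataclasses and the greedy stand-in for the bin-packing step are assumptions, not the service's actual scheduler, which works against SQLAlchemy models and the Redis queue.

```python
# In-memory sketch of the priority-based preemptive scheduling decision.
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class Priority(IntEnum):
    HIGH = 1
    MEDIUM = 2
    LOW = 3


@dataclass
class Deployment:
    name: str
    ram_gb: float
    cpu_vcpus: float
    gpu_count: int
    priority: Priority


@dataclass
class Cluster:
    total_ram_gb: float
    total_cpu_vcpus: float
    total_gpu_count: int
    running: List[Deployment] = field(default_factory=list)

    def free(self):
        return (
            self.total_ram_gb - sum(d.ram_gb for d in self.running),
            self.total_cpu_vcpus - sum(d.cpu_vcpus for d in self.running),
            self.total_gpu_count - sum(d.gpu_count for d in self.running),
        )


def fits(d: Deployment, free) -> bool:
    ram, cpu, gpu = free
    return d.ram_gb <= ram and d.cpu_vcpus <= cpu and d.gpu_count <= gpu


def schedule(d: Deployment, cluster: Cluster, queue: List[Deployment]) -> str:
    # 1. Immediate scheduling when enough resources are free.
    if fits(d, cluster.free()):
        cluster.running.append(d)
        return "RUNNING"

    # 2. Preemption: HIGH priority may evict MEDIUM/LOW deployments.
    if d.priority == Priority.HIGH:
        evicted: List[Deployment] = []
        # Greedy stand-in for the minimal preemption set: evict the
        # lowest-priority, largest deployments first so few are disturbed.
        candidates = sorted(
            (r for r in cluster.running if r.priority > d.priority),
            key=lambda r: (-r.priority, -(r.ram_gb + r.cpu_vcpus + r.gpu_count)),
        )
        for victim in candidates:
            evicted.append(victim)
            cluster.running.remove(victim)
            if fits(d, cluster.free()):
                cluster.running.append(d)
                queue.extend(evicted)  # preempted deployments are requeued
                return "RUNNING"
        cluster.running.extend(evicted)  # not enough capacity: undo evictions

    # 3. Queueing: wait until resources become available.
    queue.append(d)
    return "QUEUED"
```

Real preemption also stops the running container and persists state changes; the sketch only captures the decision logic.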
Run the test suite:
# Run all tests
pytest
# Run specific test file
pytest tests/test_auth.py
# Run with coverage
pytest --cov=app tests/
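As an example of what a test could look like, the hypothetical snippet below exercises the health endpoint with FastAPI's TestClient; the actual tests under tests/ may be structured differently.

```python
# tests/test_health.py (hypothetical example, not an existing test file)
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)


def test_health_check():
    # The /health endpoint is shown in the usage examples below;
    # assume it returns HTTP 200 when the service is up.
    response = client.get("/health")
    assert response.status_code == 200
```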
Example usage with curl:
# Register user
curl -X POST "http://localhost:8000/auth/register" \
-H "Content-Type: application/json" \
-d '{
"username": "admin",
"email": "admin@company.com",
"password": "securepassword"
}'
# Login
curl -X POST "http://localhost:8000/auth/login" \
-H "Content-Type: application/json" \
-d '{
"username": "admin",
"password": "securepassword"
}'
# Create organization
curl -X POST "http://localhost:8000/organizations/" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "My Company"
}'
curl -X POST "http://localhost:8000/clusters/" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "Production Cluster",
"total_ram_gb": 64.0,
"total_cpu_vcpus": 16.0,
"total_gpu_count": 4
}'
curl -X POST "http://localhost:8000/deployments/" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "Web App Deployment",
"cluster_id": 1,
"docker_image": "nginx:latest",
"required_ram_gb": 4.0,
"required_cpu_vcpus": 2.0,
"required_gpu_count": 0,
"priority": "HIGH"
}'
# Health check
curl http://localhost:8000/health
The service follows a layered architecture:
- API Layer: FastAPI routers handling HTTP requests
- Service Layer: Business logic and orchestration
- Model Layer: SQLAlchemy ORM models
- Database Layer: PostgreSQL for persistent data
- Queue Layer: Redis for deployment scheduling
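As a rough illustration of how a request moves through these layers, the sketch below shows the general shape; the class, function, and route names are hypothetical and do not correspond to the actual modules.

```python
# Hypothetical layered-request sketch; names are illustrative only.
from fastapi import APIRouter, Depends

router = APIRouter()  # API layer: turns HTTP requests into service calls


class ClusterRepository:
    """Model/database layer stand-in: would normally query SQLAlchemy models."""

    def totals(self, cluster_id: int) -> dict:
        return {"total_ram_gb": 64.0, "used_ram_gb": 12.0}  # placeholder data


class ClusterService:
    """Service layer: business logic and orchestration."""

    def __init__(self, repo: ClusterRepository):
        self.repo = repo

    def resource_usage(self, cluster_id: int) -> dict:
        usage = self.repo.totals(cluster_id)
        usage["free_ram_gb"] = usage["total_ram_gb"] - usage["used_ram_gb"]
        return usage


def get_service() -> ClusterService:
    return ClusterService(ClusterRepository())


@router.get("/clusters/{cluster_id}/resources")
def get_resources(cluster_id: int, service: ClusterService = Depends(get_service)):
    # The API layer delegates to the service layer, which talks to the model layer.
    return service.resource_usage(cluster_id)
```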
The codebase is designed to be extensible for:
- RBAC (Role-Based Access Control): User roles are already in the model
- Multi-cloud Support: Abstract resource providers
- Advanced Scheduling: Machine learning-based resource prediction
- Monitoring: Integration with Prometheus/Grafana
- Audit Logging: Track all operations for compliance
To contribute:
- Fork the repository
- Create a feature branch
- Make changes with tests
- Submit a pull request
[Add your license here]