End-to-end ML pipeline for bike sharing demand prediction with MLflow, Prefect, Docker, FastAPI, Terraform, and Evidently AI. Includes automated training, deployment, monitoring, and CI/CD on AWS.

Seoul Bike Sharing Prediction - MLOps Zoomcamp Capstone Project

A complete end-to-end machine learning project for predicting bike rental demand in Seoul, implementing industry-standard MLOps practices with cloud deployment and monitoring.

📋 About

This project demonstrates a production-ready MLOps pipeline for predicting Seoul bike sharing demand.

Key Features:

  • 🚀 End-to-end ML pipeline with automated training and deployment
  • 📊 Real-time monitoring with data drift detection and performance tracking
  • ☁️ Cloud-native with AWS infrastructure and Terraform IaC
  • 🔄 CI/CD pipeline with GitHub Actions for automated testing and deployment
  • 📈 Model performance reaching an R² of 0.85-0.94 with LightGBM/XGBoost

🎯 Problem Statement

Predict the number of bikes rented in Seoul based on weather conditions, time features, and seasonal patterns. This project demonstrates a production-ready ML pipeline with automated training, deployment, monitoring, and retraining capabilities.

📊 Data Source

This project uses the Seoul Bike Sharing Demand dataset from the UCI Machine Learning Repository.

Dataset Characteristics:

  • Size: 8,760 instances (1 year of hourly data)
  • Target: Rented Bike Count (integer)
  • License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Citation: Seoul Bike Sharing Demand [Dataset]. (2020). UCI Machine Learning Repository. https://doi.org/10.24432/C5F62R.
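For reference, the raw CSV can be read with the standard library alone. This is a minimal sketch: the filename, encoding, and column names are assumptions based on the UCI file (which is typically not UTF-8 encoded), not code from this project.

```python
import csv
from typing import Any, Dict, List

def parse_row(row: Dict[str, str]) -> Dict[str, Any]:
    """Convert one raw CSV row to typed fields.

    Column names here are assumptions based on the UCI file's headers.
    """
    return {
        "date": row["Date"],                                 # dd/mm/yyyy string
        "rented_bike_count": int(row["Rented Bike Count"]),  # target variable
        "hour": int(row["Hour"]),                            # 0-23
    }

def load_dataset(path: str) -> List[Dict[str, Any]]:
    """Read the full dataset (8,760 hourly rows covering one year)."""
    # The UCI CSV is usually not UTF-8; latin-1 avoids decode errors.
    with open(path, encoding="latin-1") as f:
        return [parse_row(r) for r in csv.DictReader(f)]
```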

⚡️ Architecture & Tech Stack

Cloud Infrastructure (AWS)

  • S3: Artifact storage and model versioning
  • EC2: Training and deployment instances
  • Terraform: Infrastructure as Code (IaC)

MLOps Tools

  • MLflow: Experiment tracking and model registry
  • Prefect: Workflow orchestration and scheduling
  • Evidently: Model monitoring and data drift detection
  • FastAPI: Model serving and REST API

Development & Deployment

  • Docker: Containerization
  • GitHub Actions: CI/CD pipeline
  • Pytest: Comprehensive testing framework
  • Makefile: Build automation and task management

🚀 Key Features

Experiment Tracking & Model Registry

  • Complete MLflow integration for experiment tracking
  • Automated model registration with production staging
  • Model versioning and artifact management

Workflow Orchestration

  • Prefect-based pipeline with task dependencies
  • Automated retraining schedules
  • Parallel model training and evaluation

Model Deployment

  • FastAPI web service with batch prediction
  • Docker containerization
  • Health checks and monitoring endpoints

Comprehensive Monitoring

  • Data quality assessment and data drift detection with Evidently
  • Model performance monitoring

📊 Model Performance

The pipeline trains and evaluates 12+ ML algorithms:

  • Best Model: LightGBM/XGBoost
  • R² Score: 0.85-0.94
  • RMSE: 200-400 bikes
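For context, both reported metrics reduce to a few lines of arithmetic. This is a pure-Python sketch; the project itself presumably uses a library implementation such as scikit-learn's.

```python
import math
from typing import Sequence

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the target (bikes)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 minus residual over total variance."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

An R² of 0.85-0.94 means the model explains 85-94% of the variance in hourly rental counts.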

πŸ› οΈ Installation & Setup

Prerequisites

  • Python 3.8+
  • AWS CLI configured
  • Terraform (for cloud deployment)

1. Clone and Setup Environment

# Clone the repository
git clone <repository-url>
cd mlops

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Configure Environment Variables

# Set up environment variables
export AWS_REGION="eu-north-1"
export S3_BUCKET_NAME="seoul-bike-sharing-aphdinh"
export MLFLOW_TRACKING_URI="sqlite:///mlflow.db"
export MLFLOW_ARTIFACT_URI="s3://seoul-bike-sharing-aphdinh/mlflow-artifacts/"

# Or use the provided script
chmod +x scripts/setup/server-start.sh
./scripts/setup/server-start.sh

3. Deploy AWS Infrastructure (Optional)

# Deploy AWS resources
cd terraform
terraform init
terraform plan
terraform apply
cd ..

🚀 Running the Application

Quick Start with Makefile

# Install dependencies
make install

# Setup environment
make setup

# Train models
make train

# Start API server
make api

# Run tests
make test

Manual Commands

# Option 1: Run with Prefect orchestration
python src/training/train.py prefect

# Option 2: Run core training pipeline
python src/training/train.py core

# Option 3: Create Prefect deployment
python src/training/train.py deploy

# Start FastAPI service
uvicorn src.api.app:app --host 0.0.0.0 --port 8000 --reload

Access Services

Once the server is up, the API is served at http://localhost:8000, and FastAPI's interactive docs are available at http://localhost:8000/docs.

πŸ“ Project Structure

mlops/
├── src/                      # Source code
│   ├── api/                  # FastAPI application
│   ├── training/             # Training pipeline
│   ├── monitoring/           # Model monitoring
│   ├── models/               # Model definitions
│   ├── data/                 # Data processing
│   └── utils/                # Utility functions
├── tests/                    # Test suite
├── config/                   # Configuration files
├── data/                     # Training and monitoring data
├── notebook/                 # Jupyter notebooks (EDA & Models)
├── terraform/                # Infrastructure as Code
├── artifacts/                # Generated models and reports
├── requirements.txt          # Dependencies
├── Makefile                  # Build automation
└── README.md                 # This file

📈 Monitoring & Testing

Test Monitoring System

# Test monitoring functionality
python src/monitoring/test_monitoring.py

# Run integration example
python src/monitoring/integration_example.py

# Run all tests
make test

API Testing

# Test prediction endpoint
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "date": "01/01/2024",
    "hour": 12,
    "temperature_c": 25.0,
    "humidity": 60.0,
    "wind_speed": 2.0,
    "visibility_10m": 2000.0,
    "dew_point_c": 15.0,
    "solar_radiation": 0.5,
    "rainfall_mm": 0.0,
    "snowfall_cm": 0.0,
    "season": "Spring",
    "holiday": "No Holiday",
    "functioning_day": "Yes"
  }'
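The same request can be issued from Python with only the standard library. This is a sketch; the payload fields simply mirror the curl example above.

```python
import json
import urllib.request

def build_prediction_request(payload: dict,
                             base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST /predict request; send it with urllib.request.urlopen."""
    return urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

sample = {
    "date": "01/01/2024", "hour": 12, "temperature_c": 25.0, "humidity": 60.0,
    "wind_speed": 2.0, "visibility_10m": 2000.0, "dew_point_c": 15.0,
    "solar_radiation": 0.5, "rainfall_mm": 0.0, "snowfall_cm": 0.0,
    "season": "Spring", "holiday": "No Holiday", "functioning_day": "Yes",
}
request = build_prediction_request(sample)
# With the API running: urllib.request.urlopen(request) returns the prediction response.
```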

Monitoring Endpoints

# Check monitoring status
curl http://localhost:8000/monitoring/status

# Generate data drift report
curl -X POST http://localhost:8000/monitoring/data-drift

# Generate data quality report
curl -X POST http://localhost:8000/monitoring/data-quality
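Evidently generates these reports internally; as an illustration of the underlying idea, a drift signal can be as simple as a Population Stability Index between a reference window and current traffic. This is a hypothetical helper, not the project's actual code.

```python
import math
from typing import List, Sequence

def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index; values above ~0.2 commonly signal drift."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference

    def proportions(xs: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for x in xs:
            i = min(max(int((x - lo) / width), 0), bins - 1)  # clamp outliers
            counts[i] += 1
        # Smooth so empty bins do not produce log(0).
        return [(c + 1e-6) / (len(xs) + bins * 1e-6) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))
```

For example, last week's `temperature_c` values could be compared against the training distribution.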

🔧 Configuration

Environment Variables

| Variable | Default | Description |
|---|---|---|
| `AWS_REGION` | `eu-north-1` | AWS region |
| `S3_BUCKET_NAME` | `seoul-bike-sharing-aphdinh` | S3 bucket for artifacts |
| `MLFLOW_TRACKING_URI` | `sqlite:///mlflow.db` | MLflow tracking URI |
| `MLFLOW_ARTIFACT_URI` | `s3://...` | S3 artifact storage |
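A small settings loader can mirror this table using only `os.environ`. This is a sketch with the documented defaults; the project may organize its configuration differently.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    aws_region: str
    s3_bucket_name: str
    mlflow_tracking_uri: str
    mlflow_artifact_uri: str

def load_settings() -> Settings:
    """Read configuration from the environment, falling back to documented defaults."""
    return Settings(
        aws_region=os.getenv("AWS_REGION", "eu-north-1"),
        s3_bucket_name=os.getenv("S3_BUCKET_NAME", "seoul-bike-sharing-aphdinh"),
        mlflow_tracking_uri=os.getenv("MLFLOW_TRACKING_URI", "sqlite:///mlflow.db"),
        mlflow_artifact_uri=os.getenv(
            "MLFLOW_ARTIFACT_URI",
            "s3://seoul-bike-sharing-aphdinh/mlflow-artifacts/",
        ),
    )
```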

S3 Bucket Structure

s3://seoul-bike-sharing-aphdinh/
├── data/                    # Training data
├── models/                  # Trained models
├── artifacts/               # MLflow artifacts
└── reports/                 # Monitoring reports
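A tiny helper can keep code consistent with this layout (hypothetical names; the project's real key scheme may differ).

```python
BUCKET = "seoul-bike-sharing-aphdinh"
PREFIXES = {"data", "models", "artifacts", "reports"}  # top-level folders above

def s3_uri(prefix: str, filename: str) -> str:
    """Build a full S3 URI under one of the documented bucket prefixes."""
    if prefix not in PREFIXES:
        raise ValueError(f"unknown bucket prefix: {prefix!r}")
    return f"s3://{BUCKET}/{prefix}/{filename}"
```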

📊 Results & Artifacts

MLflow Experiments

  • Complete experiment history
  • Model comparison metrics
  • Hyperparameter optimization results
  • Artifact versioning

S3 Artifacts

  • Trained models with metadata
  • Feature importance plots
  • Training reports and summaries
  • Monitoring dashboards

🎯 Requirements

| Requirement | Status | Implementation |
|---|---|---|
| Problem Description | ✅ Complete | Seoul bike sharing prediction with clear objectives |
| Cloud Integration | ✅ Complete | AWS S3, EC2, Terraform IaC |
| Experiment Tracking | ✅ Complete | MLflow with model registry |
| Workflow Orchestration | ✅ Complete | Prefect with scheduling |
| Model Deployment | ✅ Complete | FastAPI + Docker |
| Model Monitoring | ✅ Complete | Evidently with alerts |
| Reproducibility | ✅ Complete | `requirements.txt` + setup instructions |
| Best Practices | ✅ Complete | Tests, CI/CD, documentation, Makefile |

🚀 Deployment Options

Local Development

make dev  # Full development setup

Cloud Deployment

make deploy-infra  # Deploy infrastructure
python src/training/train.py prefect  # Run training on cloud
