A complete end-to-end machine learning project for predicting bike rental demand in Seoul, implementing industry-standard MLOps practices with cloud deployment and monitoring.
This project demonstrates a production-ready MLOps pipeline for predicting Seoul bike sharing demand.
Key Features:
- End-to-end ML pipeline with automated training and deployment
- Real-time monitoring with data drift detection and performance tracking
- Cloud-native with AWS infrastructure and Terraform IaC
- CI/CD pipeline with GitHub Actions for automated testing and deployment
- Model performance: R² of 0.85-0.94 with LightGBM/XGBoost
Predict the number of bikes rented in Seoul based on weather conditions, time features, and seasonal patterns. This project demonstrates a production-ready ML pipeline with automated training, deployment, monitoring, and retraining capabilities.
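To make the "time features" mentioned above concrete, here is a minimal sketch that derives a few calendar features from a date string and an hour. It assumes dates are formatted as DD/MM/YYYY; the function name `time_features` and the chosen features (month, weekday, weekend flag, cyclic hour encoding) are hypothetical illustrations, not necessarily what the project's pipeline computes.

```python
import math
from datetime import datetime


def time_features(date_str: str, hour: int) -> dict:
    """Derive simple calendar features from a date string and an hour (0-23)."""
    # Assumption: dates are formatted as DD/MM/YYYY.
    day = datetime.strptime(date_str, "%d/%m/%Y")
    angle = 2 * math.pi * hour / 24
    return {
        "month": day.month,
        "day_of_week": day.weekday(),  # Monday = 0, Sunday = 6
        "is_weekend": day.weekday() >= 5,
        # Cyclic encoding keeps hour 23 numerically close to hour 0.
        "hour_sin": math.sin(angle),
        "hour_cos": math.cos(angle),
    }


features = time_features("01/01/2024", 12)
print(features)
```

The sine/cosine pair is one common way to stop a model from treating midnight and 23:00 as far apart; tree-based models such as LightGBM often work with the raw hour as well.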
This project uses the Seoul Bike Sharing Demand dataset from the UCI Machine Learning Repository.
Dataset Characteristics:
- Size: 8,760 instances (1 year of hourly data)
- Target: Rented Bike Count (integer)
- License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Citation: Seoul Bike Sharing Demand [Dataset]. (2020). UCI Machine Learning Repository. https://doi.org/10.24432/C5F62R.
- S3: Artifact storage and model versioning
- EC2: Training and deployment instances
- Terraform: Infrastructure as Code (IaC)
- MLflow: Experiment tracking and model registry
- Prefect: Workflow orchestration and scheduling
- Evidently: Model monitoring and data drift detection
- FastAPI: Model serving and REST API
- Docker: Containerization
- GitHub Actions: CI/CD pipeline
- Pytest: Comprehensive testing framework
- Makefile: Build automation and task management
- Complete MLflow integration for experiment tracking
- Automated model registration with production staging
- Model versioning and artifact management
- Prefect-based pipeline with task dependencies
- Automated retraining schedules
- Parallel model training and evaluation
- FastAPI web service with batch prediction
- Docker containerization
- Health checks and monitoring endpoints
- Data quality assessment and data drift detection with Evidently
- Model performance monitoring
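The project uses Evidently for drift detection. To illustrate the underlying idea without depending on Evidently's API, the sketch below computes a Population Stability Index (PSI) between a reference sample and a current sample of one numeric feature, using only the standard library. This is a swapped-in, simplified statistic for illustration; the function name, the bin count, and the sample data are assumptions, not the project's monitoring code.

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float], current: Sequence[float], bins: int = 10
) -> float:
    """PSI between two samples, using equal-width bins over the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Made-up temperature-like values, for illustration only.
reference = [float(t % 30) for t in range(300)]
shifted = [v + 10.0 for v in reference]

psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
print(psi_same, psi_shifted)
```

A PSI of zero means the binned distributions match; larger values indicate a stronger shift. Any alerting threshold is a judgment call and should be tuned per feature.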
The pipeline trains and evaluates 12+ ML algorithms:
- Best Model: LightGBM/XGBoost
- R² Score: 0.85-0.94
- RMSE: 200-400 bikes
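For reference, the two metrics quoted above can be computed as in the following sketch, which also picks the candidate with the highest R². It uses only the standard library; the helper names and the toy numbers are made up for illustration and are not results from this project.

```python
import math
from typing import Dict, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the same unit as the target (bikes)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def best_model(y_true: Sequence[float], predictions: Dict[str, Sequence[float]]) -> str:
    """Return the name of the candidate with the highest R²."""
    return max(predictions, key=lambda name: r2_score(y_true, predictions[name]))


# Toy data, not project results.
y_true = [100, 200, 300, 400]
predictions = {
    "model_a": [110, 190, 310, 390],
    "model_b": [150, 150, 350, 350],
}
winner = best_model(y_true, predictions)
print(winner, rmse(y_true, predictions[winner]), r2_score(y_true, predictions[winner]))
```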
- Python 3.8+
- AWS CLI configured
- Terraform (for cloud deployment)
# Clone the repository
git clone <repository-url>
cd mlops
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Set up environment variables
export AWS_REGION="eu-north-1"
export S3_BUCKET_NAME="seoul-bike-sharing-aphdinh"
export MLFLOW_TRACKING_URI="sqlite:///mlflow.db"
export MLFLOW_ARTIFACT_URI="s3://seoul-bike-sharing-aphdinh/mlflow-artifacts/"
# Or use the provided script
chmod +x scripts/setup/server-start.sh
./scripts/setup/server-start.sh
# Deploy AWS resources
cd terraform
terraform init
terraform plan
terraform apply
cd ..
# Install dependencies
make install
# Setup environment
make setup
# Train models
make train
# Start API server
make api
# Run tests
make test
# Option 1: Run with Prefect orchestration
python src/training/train.py prefect
# Option 2: Run core training pipeline
python src/training/train.py core
# Option 3: Create Prefect deployment
python src/training/train.py deploy
# Start FastAPI service
uvicorn src.api.app:app --host 0.0.0.0 --port 8000 --reload
- API Health Check: http://localhost:8000/health
- MLflow UI: http://localhost:5000 (if running)
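Once the API is running, the health endpoint listed above can also be probed from Python. The sketch below is a minimal example under stated assumptions: it checks only for an HTTP 200 status because the response body of `/health` is not documented here, and the `is_healthy` helper and its injectable `opener` parameter are hypothetical names added to make the function easy to test.

```python
from urllib.request import urlopen


def is_healthy(
    url: str = "http://localhost:8000/health",
    opener=urlopen,
    timeout: float = 5.0,
) -> bool:
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and refused connections all derive from OSError.
        return False
```

Calling `is_healthy()` with no arguments targets the local development server; pass a different `url` to probe a deployed instance.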
mlops/
├── src/ # Source code
│   ├── api/ # FastAPI application
│   ├── training/ # Training pipeline
│   ├── monitoring/ # Model monitoring
│   ├── models/ # Model definitions
│   ├── data/ # Data processing
│   └── utils/ # Utility functions
├── tests/ # Test suite
├── config/ # Configuration files
├── data/ # Training and monitoring data
├── notebook/ # Jupyter notebooks (EDA & Models)
├── terraform/ # Infrastructure as Code
├── artifacts/ # Generated models and reports
├── requirements.txt # Dependencies
├── Makefile # Build automation
└── README.md # This file
# Test monitoring functionality
python src/monitoring/test_monitoring.py
# Run integration example
python src/monitoring/integration_example.py
# Run all tests
make test
# Test prediction endpoint
curl -X POST "http://localhost:8000/predict" \
-H "Content-Type: application/json" \
-d '{
"date": "01/01/2024",
"hour": 12,
"temperature_c": 25.0,
"humidity": 60.0,
"wind_speed": 2.0,
"visibility_10m": 2000.0,
"dew_point_c": 15.0,
"solar_radiation": 0.5,
"rainfall_mm": 0.0,
"snowfall_cm": 0.0,
"season": "Spring",
"holiday": "No Holiday",
"functioning_day": "Yes"
}'
# Check monitoring status
curl http://localhost:8000/monitoring/status
# Generate data drift report
curl -X POST http://localhost:8000/monitoring/data-drift
# Generate data quality report
curl -X POST http://localhost:8000/monitoring/data-quality
| Variable | Default | Description |
|---|---|---|
| AWS_REGION | eu-north-1 | AWS region |
| S3_BUCKET_NAME | seoul-bike-sharing-aphdinh | S3 bucket for artifacts |
| MLFLOW_TRACKING_URI | sqlite:///mlflow.db | MLflow tracking URI |
| MLFLOW_ARTIFACT_URI | s3://... | S3 artifact storage |
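The variables in the table above could be read in application code roughly as follows. This is a sketch, not the project's actual configuration module: the `Settings` dataclass and `load_settings` function are hypothetical names, and because the table elides the default for `MLFLOW_ARTIFACT_URI`, the fallback here is an assumption that mirrors the `export` command shown in the setup steps.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    aws_region: str
    s3_bucket_name: str
    mlflow_tracking_uri: str
    mlflow_artifact_uri: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from environment variables, falling back to defaults."""
    bucket = env.get("S3_BUCKET_NAME", "seoul-bike-sharing-aphdinh")
    return Settings(
        aws_region=env.get("AWS_REGION", "eu-north-1"),
        s3_bucket_name=bucket,
        mlflow_tracking_uri=env.get("MLFLOW_TRACKING_URI", "sqlite:///mlflow.db"),
        # Assumption: default derived from the bucket name, matching the setup export.
        mlflow_artifact_uri=env.get(
            "MLFLOW_ARTIFACT_URI", f"s3://{bucket}/mlflow-artifacts/"
        ),
    )


defaults = load_settings({})
print(defaults)
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to exercise in tests without touching the real environment.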
s3://seoul-bike-sharing-aphdinh/
├── data/ # Training data
├── models/ # Trained models
├── artifacts/ # MLflow artifacts
└── reports/ # Monitoring reports
- Complete experiment history
- Model comparison metrics
- Hyperparameter optimization results
- Artifact versioning
- Trained models with metadata
- Feature importance plots
- Training reports and summaries
- Monitoring dashboards
| Requirement | Status | Implementation |
|---|---|---|
| Problem Description | ✅ Complete | Seoul bike sharing prediction with clear objectives |
| Cloud Integration | ✅ Complete | AWS S3, EC2, Terraform IaC |
| Experiment Tracking | ✅ Complete | MLflow with model registry |
| Workflow Orchestration | ✅ Complete | Prefect with scheduling |
| Model Deployment | ✅ Complete | FastAPI + Docker |
| Model Monitoring | ✅ Complete | Evidently with alerts |
| Reproducibility | ✅ Complete | Requirements.txt + setup instructions |
| Best Practices | ✅ Complete | Tests, CI/CD, documentation, Makefile |
make dev # Full development setup
make deploy-infra # Deploy infrastructure
python src/training/train.py prefect # Run training on cloud