This project delivers a full-stack MLOps solution on Google Cloud Platform (GCP) for baby weight prediction, leveraging Vertex AI, Kubeflow Pipelines, and a comprehensive CI/CD framework. The implementation demonstrates enterprise-grade machine learning operations with robust governance, intelligent deployment strategies, and operational excellence.
The solution addresses critical challenges in healthcare prediction by combining:
- Natality Dataset: A comprehensive public dataset from BigQuery containing US birth information from 1969-2008
- Dual Modeling Approach: Parallel BQML and AutoML training for framework diversity and performance optimization
- Intelligent Deployment: Automated model selection, endpoint management, and traffic control
- Interactive Predictions: A modern Streamlit web application with real-time feedback
- End-to-End Automation: Complete CI/CD pipeline for reliable deployments
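As a quick orientation, here is a minimal sketch of exploring the natality dataset referenced above, assuming `google-cloud-bigquery` (with pandas support) is installed and application-default credentials are configured:

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

# Summarize births and average birth weight per year in the public dataset.
sql = """
SELECT year, COUNT(*) AS births, AVG(weight_pounds) AS avg_weight_lbs
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year
"""
df = client.query(sql).to_dataframe()
print(df.head())
```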
The application is deployed and accessible at the following URLs:
- Development Environment: https://baby-weight-predictor-dev-526740145114.us-central1.run.app/
- Production Environment: https://baby-weight-predictor-526740145114.us-central1.run.app/
These URLs remain constant even as new versions are deployed through the CI/CD pipeline.
The end-to-end MLOps pipeline orchestrates the entire machine learning lifecycle, from data extraction through training, deployment, and monitoring; the individual stages are detailed below.
The project includes a fully automated CI/CD pipeline using:
- Infrastructure as Code: Terraform configuration for all GCP resources including IAM, Artifact Registry, and Cloud Run
- Containerization: Docker images with optimized caching stored in Artifact Registry
- Continuous Integration: Cloud Build triggers on GitHub repository changes
- Continuous Deployment: Zero-downtime deployment to Cloud Run
- Environment Separation: Distinct pipelines for development and production environments
- IAM Security: Least-privilege service account permissions for secure deployments
- Monitoring & Observability: Real-time dashboards and alerts for application performance
The CI/CD workflow automatically builds and deploys the Streamlit application when changes are pushed to the repository, ensuring consistent and reliable deployments with full visibility into system health.
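For illustration, the core continuous-deployment step that Cloud Build performs amounts to a `gcloud run deploy`; the sketch below approximates it from Python, with the service name, image path, and region as assumptions rather than the project's actual values:

```python
import subprocess

# Deploy a new revision; Cloud Run shifts traffic to it only once it is
# healthy, which is what makes the rollout zero-downtime.
subprocess.run(
    [
        "gcloud", "run", "deploy", "baby-weight-predictor-dev",  # assumed name
        "--image", "us-central1-docker.pkg.dev/my-project/apps/predictor:latest",
        "--region", "us-central1",
        "--allow-unauthenticated",
    ],
    check=True,
)
```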
- Parallel Processing: Simultaneous training of BigQuery ML and AutoML models
- Framework Diversity: Reduces model failure risk through diverse approaches
- Standardized Metrics: Common evaluation framework for fair comparison
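The BQML half of this parallel training might look like the sketch below; the dataset and model names are hypothetical, and the features are a typical subset of the natality columns:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Train a BQML linear regression to predict birth weight.
# `my_dataset.babyweight_bqml` is a placeholder target for the model.
sql = """
CREATE OR REPLACE MODEL `my_dataset.babyweight_bqml`
OPTIONS (model_type = 'linear_reg', input_label_cols = ['weight_pounds']) AS
SELECT
  weight_pounds,
  is_male,
  mother_age,
  plurality,
  gestation_weeks
FROM `bigquery-public-data.samples.natality`
WHERE weight_pounds IS NOT NULL
"""
client.query(sql).result()  # blocks until training completes
```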
- Automated Comparison: Configurable metrics-based selection (MAE, RMSE, R²)
- Threshold-Based Deployment: Models deploy only when meeting quality thresholds
- Comprehensive Logging: Full transparency for model selection decisions
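A minimal sketch of the threshold-gated selection described above; the metric values and the deployment threshold are invented for illustration, and the comparison assumes a lower-is-better metric such as MAE or RMSE:

```python
# Hypothetical outputs of the two standardized evaluation steps.
bqml_metrics = {"mae": 1.08, "rmse": 1.42, "r2": 0.31}
automl_metrics = {"mae": 1.02, "rmse": 1.35, "r2": 0.36}

MAE_THRESHOLD = 1.2  # assumed quality gate: deploy only below this MAE

def select_model(candidates: dict, metric: str = "mae") -> str | None:
    """Pick the best candidate by a lower-is-better metric; None if none qualify."""
    name, metrics = min(candidates.items(), key=lambda kv: kv[1][metric])
    print(f"Selected {name}: {metric}={metrics[metric]:.3f}")  # selection logging
    return name if metrics[metric] <= MAE_THRESHOLD else None

winner = select_model({"bqml": bqml_metrics, "automl": automl_metrics})
```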
- Endpoint Detection: Checks for existing endpoints before creating new ones
- Resource Conservation: Prevents endpoint proliferation in production
- Simplified Operations: Reduces maintenance overhead for DevOps teams
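With the Vertex AI SDK, the endpoint-reuse check described above is a list-then-create pattern; the project, region, and endpoint name below are assumptions:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")  # assumed

ENDPOINT_NAME = "babyweight-endpoint"  # assumed display name

# Reuse a matching endpoint if one exists; create it only otherwise.
matches = aiplatform.Endpoint.list(filter=f'display_name="{ENDPOINT_NAME}"')
endpoint = matches[0] if matches else aiplatform.Endpoint.create(
    display_name=ENDPOINT_NAME
)
```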
- Version Control: Complete model lineage with metadata tracking
- Governance Support: Compliance documentation for regulatory requirements
- Deployment History: Audit trail of all model deployments
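Registering a model version with lineage metadata might look like this sketch; the artifact path, serving image, and labels are placeholders:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")  # assumed

# Upload artifacts to the Vertex AI Model Registry with metadata labels.
model = aiplatform.Model.upload(
    display_name="babyweight-model",
    artifact_uri="gs://my-bucket/models/babyweight/v3/",  # assumed path
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
    ),
    labels={"framework": "bqml", "pipeline_run": "run-001"},  # lineage tags
)
print(model.resource_name, model.version_id)
```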
- Gradual Rollout: Controlled traffic shifting to new model versions
- Blue/Green Deployment: Support for zero-downtime deployment strategies
- Rollback Capability: Quick recovery options if issues are detected
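A gradual rollout with the Vertex AI SDK can be sketched as below, sending 10% of traffic to the new version; the model ID and machine type are assumptions:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")  # assumed

endpoint = aiplatform.Endpoint.list(filter='display_name="babyweight-endpoint"')[0]
new_model = aiplatform.Model(model_name="1234567890")  # hypothetical model ID

# Canary: route 10% of requests to the new version; 90% stays on the old one.
new_model.deploy(
    endpoint=endpoint,
    machine_type="n1-standard-2",
    traffic_percentage=10,
)
# Rollback option: endpoint.undeploy(deployed_model_id=...) restores prior routing.
```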
- Efficient Caching: Optimized resource usage through Vertex AI Pipelines step caching, which skips unchanged steps on re-runs
- Comprehensive Error Handling: Robust exception management at each stage
- Detailed Logging: Complete observability throughout the pipeline
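Caching is enabled when the compiled pipeline is submitted; a sketch, with the display name, spec path, and bucket as assumptions:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")  # assumed

# Submit the compiled pipeline spec; with caching on, steps whose inputs
# are unchanged from a previous run are skipped rather than re-executed.
job = aiplatform.PipelineJob(
    display_name="babyweight-pipeline",
    template_path="pipeline.json",                 # compiled KFP spec
    pipeline_root="gs://my-bucket/pipeline-root",  # assumed staging bucket
    enable_caching=True,
)
job.submit()
```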
- Real-time Dashboards: Custom Cloud Monitoring dashboards for service health
- Performance Metrics: Tracking of request counts, error rates, and latency
- Alerting System: Proactive notification for service degradation
- Resource Utilization: Monitoring of compute and memory usage
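The same signals the dashboards chart can be read programmatically through the Cloud Monitoring API; a sketch pulling Cloud Run request counts, with the project and service names as assumptions:

```python
import time
from google.cloud import monitoring_v3

PROJECT = "my-gcp-project"  # assumed project ID

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    start_time={"seconds": now - 3600},  # last hour
    end_time={"seconds": now},
)

# Pull request counts for the prediction service's Cloud Run revisions.
series_iter = client.list_time_series(
    name=f"projects/{PROJECT}",
    filter=(
        'metric.type = "run.googleapis.com/request_count" '
        'AND resource.labels.service_name = "baby-weight-predictor"'
    ),
    interval=interval,
    view=monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
)
for series in series_iter:
    for point in series.points:
        print(point.interval.end_time, point.value.int64_value)
```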
The pipeline orchestrates these key stages:
- Data Engineering: Extract source data from BigQuery and prepare for modeling
- Parallel Model Development: Train both BQML and AutoML models concurrently
- Standardized Evaluation: Apply consistent metrics across both model types
- Performance Analysis: Select optimal model based on configurable criteria
- Model Registration: Store model artifacts with complete metadata
- Deployment Orchestration: Manage endpoints and model serving
- Traffic Control: Configure traffic allocation for production models
- Continuous Delivery: Automate deployment via GitHub-triggered Cloud Build
- Proactive Monitoring: Track service health with custom dashboards
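Under the stated design, the stage graph above can be expressed as a Kubeflow Pipelines skeleton; the component bodies here are placeholders, and all names are illustrative:

```python
from kfp import dsl

@dsl.component
def train_bqml() -> str:
    # Placeholder: run the BQML CREATE MODEL statement, return a model URI.
    return "bqml-model"

@dsl.component
def train_automl() -> str:
    # Placeholder: launch an AutoML training job, return a model URI.
    return "automl-model"

@dsl.component
def evaluate_and_select(bqml: str, automl: str) -> str:
    # Placeholder: compare standardized metrics and return the winner.
    return bqml

@dsl.pipeline(name="babyweight-training")
def babyweight_pipeline():
    bqml_task = train_bqml()      # no dependency between these two tasks,
    automl_task = train_automl()  # so they run in parallel
    evaluate_and_select(bqml=bqml_task.output, automl=automl_task.output)
```

Compiling this graph (e.g. `kfp.compiler.Compiler().compile(babyweight_pipeline, "pipeline.json")`) yields the spec consumed by the `PipelineJob` sketch earlier.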
This MLOps solution delivers significant advantages for healthcare organizations:
- Reduced TCO: Optimized resource usage through intelligent endpoint management
- Accelerated Innovation: Faster model iterations with parallel training and caching
- Risk Mitigation: Enhanced reliability through framework diversity and traffic control
- Compliance Support: Comprehensive model registry and versioning for regulatory needs
- Operational Simplification: Automated deployment decisions and endpoint management
- Quality Assurance: Threshold-based deployment ensures performance standards
- System Reliability: Proactive monitoring and alerting minimize downtime
For detailed component information, refer to the following documentation: