This project delivers a full-stack MLOps solution on Google Cloud Platform (GCP) for baby weight prediction, leveraging Vertex AI, Kubeflow Pipelines, and a comprehensive CI/CD framework. The implementation demonstrates enterprise-grade machine learning operations with robust governance, intelligent deployment strategies, and operational excellence.
The solution addresses critical challenges in healthcare prediction by combining:
- Natality Dataset: A comprehensive public dataset from BigQuery containing US birth information from 1969-2008
- Dual Modeling Approach: Parallel BQML and AutoML training for framework diversity and performance optimization
- Intelligent Deployment: Automated model selection, endpoint management, and traffic control
- Interactive Predictions: A modern Streamlit web application with real-time feedback
- End-to-End Automation: Complete CI/CD pipeline for reliable deployments
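As a quick orientation, here is a minimal sketch of exploring the natality dataset referenced above, assuming `google-cloud-bigquery` (with pandas support) is installed and application-default credentials are configured:

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

# Summarize births and average birth weight per year in the public dataset.
sql = """
SELECT year, COUNT(*) AS births, AVG(weight_pounds) AS avg_weight_lbs
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year
"""
df = client.query(sql).to_dataframe()
print(df.head())
```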
The application is deployed and accessible at the following URLs:
- Development Environment: https://baby-weight-predictor-dev-526740145114.us-central1.run.app/
- Production Environment: https://baby-weight-predictor-526740145114.us-central1.run.app/
These URLs remain constant even as new versions are deployed through the CI/CD pipeline.
The end-to-end MLOps pipeline orchestrates the entire machine learning lifecycle, from data extraction through training, deployment, and monitoring; the individual stages are detailed below.
The project includes a fully automated CI/CD pipeline using:
- Infrastructure as Code: Terraform configuration for all GCP resources including IAM, Artifact Registry, and Cloud Run
- Containerization: Docker images with optimized caching stored in Artifact Registry
- Continuous Integration: Cloud Build triggers on GitHub repository changes
- Continuous Deployment: Zero-downtime deployment to Cloud Run
- Environment Separation: Distinct pipelines for development and production environments
- IAM Security: Least-privilege service account permissions for secure deployments
- Monitoring & Observability: Real-time dashboards and alerts for application performance
The CI/CD workflow automatically builds and deploys the Streamlit application when changes are pushed to the repository, ensuring consistent and reliable deployments with full visibility into system health.
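For illustration, the core continuous-deployment step that Cloud Build performs amounts to a `gcloud run deploy`; the sketch below approximates it from Python, with the service name, image path, and region as assumptions rather than the project's actual values:

```python
import subprocess

# Deploy a new revision; Cloud Run shifts traffic to it only once it is
# healthy, which is what makes the rollout zero-downtime.
subprocess.run(
    [
        "gcloud", "run", "deploy", "baby-weight-predictor-dev",  # assumed name
        "--image", "us-central1-docker.pkg.dev/my-project/apps/predictor:latest",
        "--region", "us-central1",
        "--allow-unauthenticated",
    ],
    check=True,
)
```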
- Parallel Processing: Simultaneous training of BigQuery ML and AutoML models
- Framework Diversity: Reduces model failure risk through diverse approaches
- Standardized Metrics: Common evaluation framework for fair comparison
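The BQML half of this parallel training might look like the sketch below; the dataset and model names are hypothetical, and the features are a typical subset of the natality columns:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Train a BQML linear regression to predict birth weight.
# `my_dataset.babyweight_bqml` is a placeholder target for the model.
sql = """
CREATE OR REPLACE MODEL `my_dataset.babyweight_bqml`
OPTIONS (model_type = 'linear_reg', input_label_cols = ['weight_pounds']) AS
SELECT
  weight_pounds,
  is_male,
  mother_age,
  plurality,
  gestation_weeks
FROM `bigquery-public-data.samples.natality`
WHERE weight_pounds IS NOT NULL
"""
client.query(sql).result()  # blocks until training completes
```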
- Automated Comparison: Configurable metrics-based selection (MAE, RMSE, R²)
- Threshold-Based Deployment: Models deploy only when meeting quality thresholds
- Comprehensive Logging: Full transparency for model selection decisions
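A minimal sketch of the threshold-gated selection described above; the metric values and the deployment threshold are invented for illustration, and the comparison assumes a lower-is-better metric such as MAE or RMSE:

```python
# Hypothetical outputs of the two standardized evaluation steps.
bqml_metrics = {"mae": 1.08, "rmse": 1.42, "r2": 0.31}
automl_metrics = {"mae": 1.02, "rmse": 1.35, "r2": 0.36}

MAE_THRESHOLD = 1.2  # assumed quality gate: deploy only below this MAE

def select_model(candidates: dict, metric: str = "mae") -> str | None:
    """Pick the best candidate by a lower-is-better metric; None if none qualify."""
    name, metrics = min(candidates.items(), key=lambda kv: kv[1][metric])
    print(f"Selected {name}: {metric}={metrics[metric]:.3f}")  # selection logging
    return name if metrics[metric] <= MAE_THRESHOLD else None

winner = select_model({"bqml": bqml_metrics, "automl": automl_metrics})
```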
- Endpoint Detection: Checks for existing endpoints before creating new ones
- Resource Conservation: Prevents endpoint proliferation in production
- Simplified Operations: Reduces maintenance overhead for DevOps teams
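With the Vertex AI SDK, the endpoint-reuse check described above is a list-then-create pattern; the project, region, and endpoint name below are assumptions:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")  # assumed

ENDPOINT_NAME = "babyweight-endpoint"  # assumed display name

# Reuse a matching endpoint if one exists; create it only otherwise.
matches = aiplatform.Endpoint.list(filter=f'display_name="{ENDPOINT_NAME}"')
endpoint = matches[0] if matches else aiplatform.Endpoint.create(
    display_name=ENDPOINT_NAME
)
```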
- Version Control: Complete model lineage with metadata tracking
- Governance Support: Compliance documentation for regulatory requirements
- Deployment History: Audit trail of all model deployments
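Registering a model version with lineage metadata might look like this sketch; the artifact path, serving image, and labels are placeholders:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")  # assumed

# Upload artifacts to the Vertex AI Model Registry with metadata labels.
model = aiplatform.Model.upload(
    display_name="babyweight-model",
    artifact_uri="gs://my-bucket/models/babyweight/v3/",  # assumed path
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
    ),
    labels={"framework": "bqml", "pipeline_run": "run-001"},  # lineage tags
)
print(model.resource_name, model.version_id)
```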
- Gradual Rollout: Controlled traffic shifting to new model versions
- Blue/Green Deployment: Support for zero-downtime deployment strategies
- Rollback Capability: Quick recovery options if issues are detected
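A gradual rollout with the Vertex AI SDK can be sketched as below, sending 10% of traffic to the new version; the model ID and machine type are assumptions:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")  # assumed

endpoint = aiplatform.Endpoint.list(filter='display_name="babyweight-endpoint"')[0]
new_model = aiplatform.Model(model_name="1234567890")  # hypothetical model ID

# Canary: route 10% of requests to the new version; 90% stays on the old one.
new_model.deploy(
    endpoint=endpoint,
    machine_type="n1-standard-2",
    traffic_percentage=10,
)
# Rollback option: endpoint.undeploy(deployed_model_id=...) restores prior routing.
```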
- Efficient Caching: Optimized resource usage through Vertex AI Pipelines step caching, which skips unchanged steps on re-runs
- Comprehensive Error Handling: Robust exception management at each stage
- Detailed Logging: Complete observability throughout the pipeline
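Caching is enabled when the compiled pipeline is submitted; a sketch, with the display name, spec path, and bucket as assumptions:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")  # assumed

# Submit the compiled pipeline spec; with caching on, steps whose inputs
# are unchanged from a previous run are skipped rather than re-executed.
job = aiplatform.PipelineJob(
    display_name="babyweight-pipeline",
    template_path="pipeline.json",                 # compiled KFP spec
    pipeline_root="gs://my-bucket/pipeline-root",  # assumed staging bucket
    enable_caching=True,
)
job.submit()
```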
- Real-time Dashboards: Custom Cloud Monitoring dashboards for service health
- Performance Metrics: Tracking of request counts, error rates, and latency
- Alerting System: Proactive notification for service degradation
- Resource Utilization: Monitoring of compute and memory usage
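The same signals the dashboards chart can be read programmatically through the Cloud Monitoring API; a sketch pulling Cloud Run request counts, with the project and service names as assumptions:

```python
import time
from google.cloud import monitoring_v3

PROJECT = "my-gcp-project"  # assumed project ID

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    start_time={"seconds": now - 3600},  # last hour
    end_time={"seconds": now},
)

# Pull request counts for the prediction service's Cloud Run revisions.
series_iter = client.list_time_series(
    name=f"projects/{PROJECT}",
    filter=(
        'metric.type = "run.googleapis.com/request_count" '
        'AND resource.labels.service_name = "baby-weight-predictor"'
    ),
    interval=interval,
    view=monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
)
for series in series_iter:
    for point in series.points:
        print(point.interval.end_time, point.value.int64_value)
```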
The pipeline orchestrates these key stages:
- Data Engineering: Extract source data from BigQuery and prepare for modeling
- Parallel Model Development: Train both BQML and AutoML models concurrently
- Standardized Evaluation: Apply consistent metrics across both model types
- Performance Analysis: Select optimal model based on configurable criteria
- Model Registration: Store model artifacts with complete metadata
- Deployment Orchestration: Manage endpoints and model serving
- Traffic Control: Configure traffic allocation for production models
- Continuous Delivery: Automate deployment via GitHub-triggered Cloud Build
- Proactive Monitoring: Track service health with custom dashboards
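Under the stated design, the stage graph above can be expressed as a Kubeflow Pipelines skeleton; the component bodies here are placeholders, and all names are illustrative:

```python
from kfp import dsl

@dsl.component
def train_bqml() -> str:
    # Placeholder: run the BQML CREATE MODEL statement, return a model URI.
    return "bqml-model"

@dsl.component
def train_automl() -> str:
    # Placeholder: launch an AutoML training job, return a model URI.
    return "automl-model"

@dsl.component
def evaluate_and_select(bqml: str, automl: str) -> str:
    # Placeholder: compare standardized metrics and return the winner.
    return bqml

@dsl.pipeline(name="babyweight-training")
def babyweight_pipeline():
    bqml_task = train_bqml()      # no dependency between these two tasks,
    automl_task = train_automl()  # so they run in parallel
    evaluate_and_select(bqml=bqml_task.output, automl=automl_task.output)
```

Compiling this graph (e.g. `kfp.compiler.Compiler().compile(babyweight_pipeline, "pipeline.json")`) yields the spec consumed by the `PipelineJob` sketch earlier.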
This MLOps solution delivers significant advantages for healthcare organizations:
- Reduced TCO: Optimized resource usage through intelligent endpoint management
- Accelerated Innovation: Faster model iterations with parallel training and caching
- Risk Mitigation: Enhanced reliability through framework diversity and traffic control
- Compliance Support: Comprehensive model registry and versioning for regulatory needs
- Operational Simplification: Automated deployment decisions and endpoint management
- Quality Assurance: Threshold-based deployment ensures performance standards
- System Reliability: Proactive monitoring and alerting minimize downtime
For detailed component information, refer to the following documentation: