# vehicle-mlops

An end-to-end MLOps pipeline for vehicle data: automated data ingestion from MongoDB, EDA & feature engineering, model training & evaluation, model storage on AWS S3, and CI/CD deployment to an EC2-hosted web app.
## Table of Contents

- Features
- Tech Stack
- Prerequisites
- Local Setup
- MongoDB Atlas Setup
- Logging, Exceptions & Notebooks
- Data Ingestion
- Data Validation, Transformation & Training
- AWS & Model Evaluation
- Prediction Pipeline & Web App
- CI/CD & Deployment
- Project Structure
- Contributing
- License
## Features

- Automated project template
- Local package management via `setup.py` & `pyproject.toml`
- Conda-based virtual environment
- MongoDB Atlas ingestion & management
- Structured logging & custom exception handling
- Exploratory Data Analysis & Feature Engineering notebooks
- Data validation, transformation, model training & evaluation
- AWS S3 model registry & Model Pusher
- Dockerized microservice & GitHub Actions CI/CD
- Self-hosted runner on AWS EC2 for continuous deployment
## Tech Stack

| Layer | Tools & Libraries |
|---|---|
| Language | Python 3.10 |
| Environment | Conda, virtualenv |
| Data Store | MongoDB Atlas |
| Logging & Exceptions | `logging`, custom exception classes |
| Data Science | Pandas, NumPy, Scikit-learn |
| Cloud | AWS S3, IAM, EC2 |
| CI/CD | GitHub Actions, Docker, AWS ECR |
| Web Framework | Flask (via `app.py`), Jinja2 templates |
## Prerequisites

- Conda
- Python 3.10
- MongoDB Atlas account
- AWS account (us-east-1)
- Git & GitHub repository
## Local Setup

1. Generate the project scaffold:

   ```bash
   python template.py
   ```

2. Configure packaging:
   - Edit `setup.py` & `pyproject.toml` to import local packages.
   - See `crashcourse.txt` for details.

3. Create & activate the Conda environment:

   ```bash
   conda create -n vehicle python=3.10 -y
   conda activate vehicle
   ```

4. Add dependencies to `requirements.txt`, then install:

   ```bash
   pip install -r requirements.txt
   pip list  # verify local packages
   ```
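For reference, a minimal `pyproject.toml` that makes the local `src` packages installable could look like the sketch below (the project name, version, and setuptools backend are assumptions; see `crashcourse.txt` for the repo's own guidance):

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "vehicle-mlops"
version = "0.1.0"
requires-python = ">=3.10"

[tool.setuptools.packages.find]
include = ["src*"]
```

With this in place, `pip install -e .` installs the local packages in editable mode, so `import src...` works from anywhere in the environment.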
## MongoDB Atlas Setup

1. Sign up on MongoDB Atlas & create a Project.
2. Create a Cluster: choose M0 (Free), keep the defaults → Deploy.
3. Configure a Database User with a username/password.
4. In Network Access, allow `0.0.0.0/0`.
5. Get the Connection String → Drivers → Python 3.6+ → copy it & replace `<password>`.
6. Inside `notebook/`, create `mongoDB_demo.ipynb` (kernel: `vehicle`).
7. Add your dataset to `notebook/` and push it to MongoDB from the notebook.
8. Verify the data in the Atlas UI under Browse Collections.
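The notebook push can be sketched as follows (a sketch, not the repo's exact code: the helper names and database/collection arguments are assumptions, and `MONGODB_URL` must hold your Atlas connection string):

```python
import os

import pandas as pd


def df_to_records(df: pd.DataFrame) -> list:
    """Convert a DataFrame into the list-of-dicts shape insert_many expects."""
    return df.to_dict(orient="records")


def push_to_mongo(csv_path: str, database_name: str, collection_name: str) -> int:
    """Load a CSV from notebook/ and insert its rows into an Atlas collection."""
    from pymongo import MongoClient  # lazy import: only needed for the actual push

    client = MongoClient(os.environ["MONGODB_URL"])
    records = df_to_records(pd.read_csv(csv_path))
    result = client[database_name][collection_name].insert_many(records)
    return len(result.inserted_ids)
```

After running it, the inserted documents should appear under Browse Collections in the Atlas UI.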
## Logging, Exceptions & Notebooks

1. Implement `logger.py`; test it via `demo.py`.
2. Implement `exception.py`; test it via `demo.py`.
3. Add the EDA & Feature Engineering notebooks under `notebook/`.
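The custom exception in `exception.py` typically captures the file and line where the error occurred, so logs pinpoint failures. A minimal sketch (the class name `MyException` is an assumption):

```python
import sys


def error_message_detail(error: Exception, error_detail) -> str:
    """Build a message containing the originating file and line number."""
    _, _, exc_tb = error_detail.exc_info()
    file_name = exc_tb.tb_frame.f_code.co_filename
    return f"Error occurred in [{file_name}] at line [{exc_tb.tb_lineno}]: {error}"


class MyException(Exception):
    """Wraps any exception with file/line context; raise it inside an except block."""

    def __init__(self, error: Exception, error_detail=sys):
        super().__init__(str(error))
        self.message = error_message_detail(error, error_detail)

    def __str__(self) -> str:
        return self.message
```

Usage: `try: ... except Exception as e: raise MyException(e) from e` — the wrapped message then lands in the structured logs from `logger.py`.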
## Data Ingestion

1. Define constants in `src/constants/__init__.py`.
2. Configure the MongoDB connection in `src/configuration/mongo_db_connections.py`.
3. In `src/data_access/proj1_data.py`, fetch the MongoDB data and transform it into a DataFrame.
4. Define `DataIngestionConfig` & `DataIngestionArtifact` in `src/entity/`.
5. Implement the ingestion logic in `src/components/data_ingestion.py`.
6. Hook it into the training pipeline; run `demo.py` (ensure the `MONGODB_URL` env var is set).
```bash
# Bash
export MONGODB_URL="mongodb+srv://<username>:<password>@cluster0.mongodb.net/<db>?retryWrites=true&w=majority"
echo $MONGODB_URL
```

```powershell
# PowerShell
$env:MONGODB_URL = "mongodb+srv://<username>:<password>…"
echo $env:MONGODB_URL
```
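The fetch in `proj1_data.py` usually boils down to reading a collection and dropping MongoDB's auto-generated `_id` column. A sketch under assumed names (`MONGODB_URL` is read from the environment as set above):

```python
import os

import pandas as pd


def records_to_dataframe(records: list) -> pd.DataFrame:
    """MongoDB documents -> DataFrame, dropping the auto-generated _id field."""
    df = pd.DataFrame(records)
    if "_id" in df.columns:
        df = df.drop(columns=["_id"])
    return df


def export_collection_as_dataframe(database_name: str, collection_name: str) -> pd.DataFrame:
    """Pull every document from an Atlas collection into a DataFrame."""
    from pymongo import MongoClient  # lazy import: only needed for a live fetch

    client = MongoClient(os.environ["MONGODB_URL"])
    return records_to_dataframe(list(client[database_name][collection_name].find()))
```

The ingestion component can then split this DataFrame into train/test artifacts per `DataIngestionConfig`.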
## Data Validation, Transformation & Training

1. Fill in `src/utils/main_utils.py` & update `config/schema.yaml` with the validation rules.
2. Build the Data Validation component (mirroring the ingestion workflow).
3. Build the Data Transformation component (add an `estimator.py` entity).
4. Build the Model Trainer (extend `estimator.py` with training classes).
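The core check in the Data Validation component can be as small as comparing the ingested DataFrame's columns against the list read from `config/schema.yaml` (function names here are assumptions; the schema is passed in as a plain list rather than parsed from YAML in this sketch):

```python
import pandas as pd


def validate_number_of_columns(df: pd.DataFrame, expected_columns: list) -> bool:
    """True only when the DataFrame carries exactly the schema's columns."""
    return sorted(df.columns) == sorted(expected_columns)


def missing_columns(df: pd.DataFrame, expected_columns: list) -> list:
    """Columns required by the schema but absent from the DataFrame."""
    return sorted(set(expected_columns) - set(df.columns))
```

A failed check should surface through the custom exception class so the pipeline stops with a clear message rather than training on malformed data.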
## AWS & Model Evaluation

1. In the AWS console (region `us-east-1`), create IAM user `firstproj` with AdministratorAccess → download the CSV access keys.
2. Set the AWS env vars:

   ```bash
   export AWS_ACCESS_KEY_ID="…"
   export AWS_SECRET_ACCESS_KEY="…"
   ```

3. Add to `src/constants/__init__.py`:

   ```python
   MODEL_EVALUATION_CHANGED_THRESHOLD_SCORE: float = 0.02
   MODEL_BUCKET_NAME = "my-model-mlopsproj"
   MODEL_PUSHER_S3_KEY = "model-registry"
   ```

4. Create the S3 bucket `my-model-mlopsproj` (us-east-1, public access blocked).
5. Implement the S3 connectors in `src/configuration/aws_connection.py` & `src/aws_storage/s3_estimator.py`.
6. Develop the Model Evaluation & Model Pusher modules.
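The S3 side of `s3_estimator.py` can be sketched with `boto3` (the function names are assumptions, not the repo's actual API; `upload_file`/`download_file` are standard boto3 S3 client methods, and credentials come from the env vars set above):

```python
MODEL_BUCKET_NAME = "my-model-mlopsproj"   # from src/constants/__init__.py
MODEL_PUSHER_S3_KEY = "model-registry"     # from src/constants/__init__.py


def model_s3_key(model_file: str = "model.pkl") -> str:
    """Key under which the Model Pusher stores the trained model."""
    return f"{MODEL_PUSHER_S3_KEY}/{model_file}"


def push_model(local_path: str) -> None:
    """Upload a trained model file to the S3 model registry."""
    import boto3  # lazy import: only needed for live S3 calls

    boto3.client("s3").upload_file(local_path, MODEL_BUCKET_NAME, model_s3_key())


def pull_model(local_path: str) -> None:
    """Download the registry model for evaluation against the new candidate."""
    import boto3

    boto3.client("s3").download_file(MODEL_BUCKET_NAME, model_s3_key(), local_path)
```

Model Evaluation can then pull the registry model, score both models, and only let the pusher upload when the new score beats the old one by `MODEL_EVALUATION_CHANGED_THRESHOLD_SCORE`.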
## Prediction Pipeline & Web App

1. Scaffold the prediction pipeline & `app.py` (Flask).
2. Add the `static/` and `template/` directories for front-end assets and HTML.
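A minimal sketch of `app.py` (the route paths come from this README; the handler bodies are placeholders where the real app renders templates and invokes the prediction/training pipelines):

```python
from flask import Flask

app = Flask(__name__)


@app.route("/")
def index():
    # Real app: render a form template from template/ and serve predictions
    return "Vehicle Data Prediction"


@app.route("/training")
def training():
    # Real app: kick off the training pipeline and report its status
    return "Training successful!"


if __name__ == "__main__":
    # Port 5080 matches the EC2 security-group rule used for deployment
    app.run(host="0.0.0.0", port=5080)
```

Run it locally with `python app.py` and browse to `http://localhost:5080`.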
## CI/CD & Deployment

1. Write the `Dockerfile` & `.dockerignore`.
2. Configure GitHub Actions in `.github/workflows/aws.yaml`.
3. Create IAM user `usvisa-user` for ECR & deployment.
4. Create the ECR repository `vehicleproj`; note its URI.
5. Launch an EC2 instance (Ubuntu 24.04, t2.medium) with its security group open on port 5080.
6. Install Docker on the EC2 instance:

   ```bash
   sudo apt-get update -y && sudo apt-get upgrade -y
   curl -fsSL https://get.docker.com -o get-docker.sh
   sudo sh get-docker.sh
   sudo usermod -aG docker ubuntu && newgrp docker
   ```

7. Configure a self-hosted runner on the EC2 instance for GitHub Actions.
8. Add GitHub Secrets under Settings → Secrets:
   - `AWS_ACCESS_KEY_ID`
   - `AWS_SECRET_ACCESS_KEY`
   - `AWS_DEFAULT_REGION`
   - `ECR_REPO`
9. On every push to `main`, CI/CD builds the Docker image, pushes it to ECR, and deploys it to EC2.
10. Access the app at `http://<EC2_PUBLIC_IP>:5080`; the training endpoint is available at `/training`.
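The `Dockerfile` can start from a slim Python base. A sketch under assumptions (`python:3.10-slim` matches the project's Python version, and `app.py` listens on port 5080 as described above):

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Install dependencies first so Docker caches this layer across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Port the Flask app serves on, matching the EC2 security-group rule
EXPOSE 5080

CMD ["python", "app.py"]
```

The GitHub Actions workflow then builds this image, tags it with the ECR URI, and the self-hosted runner on EC2 pulls and runs it.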
## Project Structure

```text
vehicle-mlops/
├── notebook/
│   ├── mongoDB_demo.ipynb
│   └── eda_feature_engg.ipynb
├── requirements.txt
├── setup.py
├── pyproject.toml
├── template.py
├── src/
│   ├── constants/
│   │   └── __init__.py
│   ├── configuration/
│   │   ├── mongo_db_connections.py
│   │   └── aws_connection.py
│   ├── data_access/
│   ├── components/
│   ├── entity/
│   ├── utils/
│   └── aws_storage/
├── demo.py
├── app.py
├── Dockerfile
├── .github/
│   └── workflows/aws.yaml
└── .gitignore
```
## Contributing

1. Fork the repo & create a feature branch.
2. Make your changes & add tests.
3. Submit a Pull Request.

Please adhere to the Code of Conduct.
## License

This project is licensed under the MIT License. See `LICENSE` for details.