Welcome to the Vehicle Detection MLOps Project, a full-stack, end-to-end machine learning project architected with MLOps best practices. This project showcases the integration of robust data pipelines, MongoDB, AWS services, CI/CD, Docker, and deployment on EC2, all production-ready and optimized for scaling.
```
vehicle-mlops-project/
│
├── src/                  # Core ML codebase
├── notebook/             # EDA and MongoDB notebooks
├── templates/            # Frontend HTML files
├── static/               # CSS, JS files for UI
├── .github/workflows/    # GitHub Actions for CI/CD
├── setup.py              # Local package setup
├── pyproject.toml        # Build system config
├── requirements.txt      # Python dependencies
├── dockerfile            # Docker image build file
├── app.py                # FastAPI/Flask app entrypoint
├── README.md             # You’re here!
└── ...
```
- **Create Project Template:** Run `template.py` to auto-generate the folder structure (a minimal sketch follows this list).
- **Package Configuration:** `setup.py` and `pyproject.toml` enable importing local modules across components.
- **Environment Setup:**

  ```bash
  conda create -n vehicle python=3.10 -y
  conda activate vehicle
  pip install -r requirements.txt
  pip list  # Verify local packages
  ```

- **MongoDB Atlas:** Set up MongoDB Atlas, create a cluster and user, and whitelist the IP `0.0.0.0/0`.
- **Connection String:** Obtain your connection string and set the environment variable:

  ```bash
  export MONGODB_URL="mongodb+srv://<username>:<password>@cluster.mongodb.net"
  ```

- **Data Ingestion:** Ingest data from the notebook into MongoDB Atlas and verify it from the MongoDB UI (a quick connection check is sketched after this list).
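A minimal sketch of what `template.py` could look like, assuming it simply creates empty placeholder files for the layout shown above (the real script may differ):

```python
# template.py -- hypothetical sketch: create the project skeleton shown above
from pathlib import Path

# Assumed layout; adjust to match the actual project structure
FILES = [
    "src/__init__.py",
    "notebook/.gitkeep",
    "templates/.gitkeep",
    "static/.gitkeep",
    ".github/workflows/.gitkeep",
    "app.py",
    "requirements.txt",
]

for relative_path in FILES:
    path = Path(relative_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create intermediate folders
    path.touch(exist_ok=True)                       # create an empty placeholder file
    print(f"created {path}")
```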
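For reference, a local package is usually made importable with a small `setup.py` along these lines; the actual metadata and package discovery in this repo may differ:

```python
# Minimal setup.py sketch so the src modules can be imported across components
# (typically installed in editable mode, e.g. via `pip install -e .`).
from setuptools import find_packages, setup

setup(
    name="vehicle-mlops-project",  # assumed package name
    version="0.0.1",
    packages=find_packages(),
)
```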
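Once `MONGODB_URL` is exported, a quick connectivity check might look like this (assumes the standard `pymongo` driver; the database names listed are whatever the notebook ingested):

```python
# Hypothetical check that MONGODB_URL is set and the Atlas cluster is reachable
import os

from pymongo import MongoClient

mongo_url = os.environ["MONGODB_URL"]   # set via `export MONGODB_URL=...`
client = MongoClient(mongo_url)

client.admin.command("ping")            # raises if the cluster is unreachable
print(client.list_database_names())     # verify the ingested database appears
```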
Each module follows a clean separation of concerns and is fully testable.
- **Logging & Exception Handling**
  - Centralized logging with `logger.py` and custom error classes in `exception.py` (a rough sketch of this pattern follows this list).
- **Data Ingestion**
  - Pulls structured data from MongoDB and stores it as a DataFrame.
  - Defined via `DataIngestionConfig` and `DataIngestionArtifact` (the config/artifact pattern is sketched after this list).
- **Data Validation**
  - Utilizes the schema from `config/schema.yaml` for robust checks.
- **Data Transformation**
  - Feature engineering and transformation pipelines using custom estimators.
- **Model Training**
  - Encapsulated in `ModelTrainerConfig` and `ModelTrainerArtifact`.
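As a rough illustration of the centralized logging and custom exception pattern, not the project's exact `logger.py`/`exception.py` code:

```python
# Minimal sketch: a shared logger plus a custom exception that records context.
import logging
import sys

logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s] %(levelname)s %(name)s - %(message)s",
)
logger = logging.getLogger("vehicle_mlops")


class VehicleException(Exception):
    """Custom error that captures the line number of the original failure."""

    def __init__(self, message: str, error: Exception):
        super().__init__(message)
        _, _, tb = sys.exc_info()                 # populated when raised inside `except`
        lineno = tb.tb_lineno if tb else None
        self.message = f"{message}: {error} (line {lineno})"

    def __str__(self) -> str:
        return self.message
```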
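The config/artifact pattern used by the ingestion and training components (`DataIngestionConfig` → `DataIngestionArtifact`, `ModelTrainerConfig` → `ModelTrainerArtifact`) generally looks like the following; the field names here are assumptions for illustration:

```python
# Sketch of the config/artifact pattern: a config object describes where a stage
# reads and writes, and an artifact object records what the stage produced.
from dataclasses import dataclass


@dataclass
class DataIngestionConfig:
    collection_name: str = "vehicle_data"            # assumed MongoDB collection
    train_file_path: str = "artifacts/data/train.csv"
    test_file_path: str = "artifacts/data/test.csv"
    train_test_split_ratio: float = 0.2


@dataclass
class DataIngestionArtifact:
    train_file_path: str
    test_file_path: str


def run_data_ingestion(config: DataIngestionConfig) -> DataIngestionArtifact:
    """Pull data from MongoDB, split it, and report the output paths."""
    # ... export the collection to a DataFrame, split, write CSVs ...
    return DataIngestionArtifact(
        train_file_path=config.train_file_path,
        test_file_path=config.test_file_path,
    )
```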
- **IAM Setup:** Created users with `AdministratorAccess`, managed via secure environment variables.
- **S3 Buckets:** Used to store model artifacts:

  ```python
  MODEL_BUCKET_NAME = "my-model-mlopsproj"
  MODEL_PUSHER_S3_KEY = "model-registry"
  ```

- **AWS S3 Access:** Fully integrated using `boto3` in `aws_connection.py` and `s3_estimator.py` (see the helper sketch after this list).
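Hypothetical helpers in the spirit of `aws_connection.py` / `s3_estimator.py`, showing how `boto3` could push and pull model artifacts against the bucket above (function names are illustrative, not the project's exact API):

```python
# Sketch: upload a trained model to the S3 model registry and fetch it back.
import boto3

MODEL_BUCKET_NAME = "my-model-mlopsproj"
MODEL_PUSHER_S3_KEY = "model-registry"

s3 = boto3.client("s3")  # credentials come from the AWS_* environment variables


def push_model(local_path: str, model_name: str = "model.pkl") -> str:
    """Upload a serialized model under the model-registry prefix."""
    key = f"{MODEL_PUSHER_S3_KEY}/{model_name}"
    s3.upload_file(local_path, MODEL_BUCKET_NAME, key)
    return key


def pull_model(key: str, local_path: str) -> str:
    """Download a model from S3 so it can be loaded for prediction."""
    s3.download_file(MODEL_BUCKET_NAME, key, local_path)
    return local_path
```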
- Model evaluation checks whether the performance improvement exceeds `0.02` (a minimal version of this gate is sketched below).
- The model is pushed to S3 if evaluation passes.
- Supports retraining and pushing updated versions.
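A minimal sketch of that evaluation gate, assuming a "higher is better" metric such as F1 or accuracy:

```python
# The new model is pushed only when it beats the currently deployed model
# by more than the 0.02 threshold; if no model exists yet, accept the first one.
CHANGED_THRESHOLD_SCORE = 0.02


def is_model_accepted(new_score: float, production_score: float | None) -> bool:
    if production_score is None:
        return True
    return (new_score - production_score) > CHANGED_THRESHOLD_SCORE


if __name__ == "__main__":
    print(is_model_accepted(new_score=0.91, production_score=0.88))  # True
    print(is_model_accepted(new_score=0.89, production_score=0.88))  # False
```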
`app.py` serves as the main API using Flask/FastAPI.

- Routes (a minimal sketch follows below):
  - `/` – UI to upload image/data
  - `/predict` – Returns prediction
  - `/train` – Triggers model training
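A bare-bones version of those routes, shown here with Flask (the actual `app.py` may use FastAPI instead, and `index.html` in `templates/` plus the handler bodies are placeholders):

```python
# Minimal sketch of the /, /predict, and /train routes described above.
from flask import Flask, render_template, request

app = Flask(__name__)


@app.route("/", methods=["GET"])
def index():
    # UI to upload image/data
    return render_template("index.html")


@app.route("/predict", methods=["POST"])
def predict():
    # Run the loaded model on the submitted payload and return the prediction
    payload = request.form.to_dict()
    return {"prediction": "placeholder", "input": payload}


@app.route("/train", methods=["GET"])
def train():
    # Kick off the training pipeline
    return {"status": "training started"}


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5080)
```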
- Dockerized Project
- Self-hosted GitHub Runner on EC2
- CI/CD Pipeline
  - On every commit: Docker image is built → pushed to ECR → deployed to EC2
| Key | Purpose |
|---|---|
| `AWS_ACCESS_KEY_ID` | IAM user key |
| `AWS_SECRET_ACCESS_KEY` | IAM secret |
| `AWS_DEFAULT_REGION` | AWS region (`us-east-1`) |
| `ECR_REPO` | ECR repo URI |
- **Instance Setup**
  - Ubuntu 24.04 | t2.medium | 30 GB storage
  - Docker installed
  - Port 5080 opened for traffic
- **App Access**
  - Visit: `http://<EC2_PUBLIC_IP>:5080`
| Category | Stack |
|---|---|
| Language | Python |
| ML Framework | Scikit-learn |
| DB | MongoDB Atlas |
| Cloud | AWS (S3, IAM, EC2, ECR) |
| Orchestration | GitHub Actions |
| Containerization | Docker |
| Deployment | EC2 |
| Web | Flask/FastAPI |
| DevOps | CI/CD, Self-hosted runner |
- Model versioning with MLflow
- Automated monitoring with Prometheus + Grafana
- Drift detection and scheduled retraining
Pull requests are welcome. For major changes, please open an issue first.