End-to-End Machine Learning Project

📘 Overview

This repository contains a modular, end-to-end Machine Learning project built from scratch, packaged with Docker, and deployed on AWS and Azure.

The project covers the complete lifecycle of an ML application, from data ingestion and transformation through model training to a Flask inference API and cloud deployment. It follows standard software engineering practices such as centralized logging, custom exception handling, and modular, config-driven components.

🧱 Project Architecture

project-root/
│
├── artifacts/ # Stored artifacts and serialized models
│ ├── data.csv
│ ├── train.csv
│ ├── test.csv
│ └── preprocessor.pkl
│
├── data/
│ └── stud.csv # Raw dataset
│
├── notebooks/ # Jupyter notebooks for experiments
│ ├── 01.eda.ipynb
│ └── 02.modeling.ipynb
│
├── src/
│ ├── components/ # Data and model pipeline components
│ │ ├── data_ingestion.py
│ │ ├── data_transformation.py
│ │ └── model_trainer.py
│ │
│ ├── pipeline/ # End-to-end pipeline orchestration
│ │ └── training_pipeline.py
│ │
│ ├── exception.py # Custom exception handler
│ ├── logger.py # Logging configuration
│ ├── utils.py # Helper functions
│ └── __init__.py
│
├── app.py # Flask API for model inference
├── Dockerfile # Docker container configuration
├── setup.py # Project packaging setup
├── requirements.txt # Python dependencies
├── LICENSE
└── README.md
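The `artifacts/` folder holds serialized objects such as `preprocessor.pkl`. Below is a minimal sketch of the save/load helpers that `utils.py` is described as providing, assuming plain `pickle`; the names `save_object` and `load_object` are hypothetical and may not match the actual module.

```python
import os
import pickle


def save_object(file_path, obj):
    """Pickle obj to file_path, creating parent directories if needed (hypothetical helper)."""
    dir_name = os.path.dirname(file_path)
    if dir_name:
        os.makedirs(dir_name, exist_ok=True)
    with open(file_path, "wb") as f:
        pickle.dump(obj, f)


def load_object(file_path):
    """Load a pickled object from file_path (hypothetical helper)."""
    with open(file_path, "rb") as f:
        return pickle.load(f)
```

Only unpickle files you created yourself: `pickle.load` can execute arbitrary code embedded in an untrusted file.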

🧠 Features

🔄 End-to-End ML Pipeline: Data ingestion → transformation → model training → evaluation
🧹 Automated Preprocessing: Handles missing values, encoding, and scaling
🧪 Exploratory Analysis: Jupyter notebooks for EDA and feature engineering
🪵 Logging System: Records the start, end, and error states of every pipeline stage (logger.py)
⚡ Custom Exception Handling: Graceful error capture and debugging via CustomException
🧰 Utility Functions: Common operations abstracted into utils.py
🌐 Flask REST API: Exposes prediction endpoints for integration
🐳 Docker Containerization: Consistent, portable environment for deployment
☁️ Cloud Deployment: Hosted on AWS EC2 and Azure Web App
🧩 Configurable Setup: Uses setup.py and requirements.txt for easy installation
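The automated preprocessing listed above (missing values, encoding, scaling) can be sketched with scikit-learn's `ColumnTransformer`. This is an illustration under assumptions, not the project's `data_transformation.py`: the imputation strategies are choices made here, and the function name `build_preprocessor` and any column names you pass to it are hypothetical.

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(numerical_cols, categorical_cols):
    """Impute and scale numeric columns; impute and one-hot encode categorical ones."""
    numeric = Pipeline([
        ("imputer", SimpleImputer(strategy="median")),         # fill gaps with the median
        ("scaler", StandardScaler()),                          # zero mean, unit variance
    ])
    categorical = Pipeline([
        ("imputer", SimpleImputer(strategy="most_frequent")),  # fill gaps with the mode
        ("one_hot", OneHotEncoder(handle_unknown="ignore")),   # unseen categories -> all zeros
    ])
    return ColumnTransformer(
        [("num", numeric, numerical_cols), ("cat", categorical, categorical_cols)],
        sparse_threshold=0,  # always return a dense array
    )
```

Fit it on the training split only, then reuse the fitted object (for example, pickled as `artifacts/preprocessor.pkl`) to transform the test data and incoming prediction requests, so no test information leaks into the fitted statistics.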

🧩 Key Components

Module	Description
`data_ingestion.py`	Loads and validates data from source, splits into train/test sets
`data_transformation.py`	Cleans data, performs feature engineering, and builds preprocessing pipeline
`model_trainer.py`	Trains, tunes, and evaluates ML models; saves best model to `artifacts/`
`logger.py`	Centralized logging utility for all pipeline steps
`exception.py`	Custom error-handling framework with detailed traceback logging
`utils.py`	Helper utilities (e.g., file handling, model saving/loading)
`app.py`	Flask application for model inference
`Dockerfile`	Defines environment for containerized deployment
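`model_trainer.py` is described as training several models and keeping the best one. One common way to do that is to fit each candidate and compare R² on the held-out test set, as in this sketch. The candidate models, the R² metric, and the name `select_best_model` are assumptions, and hyperparameter tuning (for example with `GridSearchCV`) is left out for brevity.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor


def select_best_model(models, X_train, y_train, X_test, y_test):
    """Fit every candidate, score it by R^2 on the test split, and return the best."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    best_name = max(scores, key=scores.get)
    return best_name, models[best_name], scores


# Hypothetical candidate set; the project's actual model list is not documented here.
CANDIDATES = {
    "LinearRegression": LinearRegression(),
    "DecisionTree": DecisionTreeRegressor(random_state=42),
}
```

The returned estimator is what would then be serialized into `artifacts/`; comparing on a held-out split rather than training data keeps an overfitting model such as an unpruned tree from winning by default.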

🚀 Getting Started

1️⃣ Clone the Repository

git clone https://github.com/<your-username>/<repo-name>.git
cd <repo-name>

2️⃣ Create and Activate a Virtual Environment

python -m venv venv

source venv/bin/activate      # macOS/Linux

venv\Scripts\activate         # Windows

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Run the ML Pipeline

python -m src.pipeline.training_pipeline
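The ingestion stage that this pipeline starts with can be sketched as follows, assuming pandas and scikit-learn's `train_test_split`. The function name `ingest`, the 80/20 split, and the fixed random seed are illustrative assumptions; the output file names follow the `data.csv`, `train.csv`, and `test.csv` entries listed under `artifacts/`.

```python
import os

import pandas as pd
from sklearn.model_selection import train_test_split


def ingest(raw_path, artifacts_dir="artifacts", test_size=0.2, random_state=42):
    """Copy the raw CSV into artifacts_dir and split it into train/test CSVs."""
    os.makedirs(artifacts_dir, exist_ok=True)
    df = pd.read_csv(raw_path)
    df.to_csv(os.path.join(artifacts_dir, "data.csv"), index=False)  # raw snapshot

    train_df, test_df = train_test_split(df, test_size=test_size, random_state=random_state)
    train_path = os.path.join(artifacts_dir, "train.csv")
    test_path = os.path.join(artifacts_dir, "test.csv")
    train_df.to_csv(train_path, index=False)
    test_df.to_csv(test_path, index=False)
    return train_path, test_path
```

For example, `train_path, test_path = ingest("data/stud.csv")` would write all three CSV files from the raw dataset; the fixed `random_state` keeps the split reproducible between runs.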

5️⃣ Start the Flask API

python app.py

Your API will be available at:

👉 http://127.0.0.1:5000/predict
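A minimal sketch of what a `/predict` route could look like in Flask. It assumes the endpoint accepts a JSON object of feature values and delegates to a `predict_fn` callable, a hypothetical hook standing in for the loaded preprocessor and model; the actual `app.py` may use a different request format, such as an HTML form.

```python
from flask import Flask, jsonify, request


def create_app(predict_fn):
    """Build a Flask app whose /predict route delegates to predict_fn (hypothetical hook)."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True)  # None if the body is not valid JSON
        if not isinstance(payload, dict):
            return jsonify(error="Expected a JSON object of feature values"), 400
        try:
            prediction = predict_fn(payload)
        except (KeyError, ValueError) as exc:  # missing or malformed features
            return jsonify(error=str(exc)), 400
        return jsonify(prediction=prediction)

    return app
```

To serve it, call `create_app(your_predict_fn).run(host="0.0.0.0", port=5000)`. Binding to `0.0.0.0` instead of the default `127.0.0.1` matters inside a Docker container, because the published port cannot reach a server that listens only on the container's loopback interface. A real `predict_fn` should return plain Python values (for example `float(model.predict(df)[0])`), since `jsonify` cannot serialize NumPy arrays.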

🐳 Running with Docker

Build Docker Image

docker build -t rentify-ml-app .

Run Docker Container

docker run -p 5000:5000 rentify-ml-app

Access the API at: 👉 http://localhost:5000/predict
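Once the container is running, any HTTP client can call the endpoint. This standard-library sketch assumes `/predict` accepts a JSON body of feature values and returns JSON (the README does not specify the request format); the helper names `build_predict_request` and `predict` and the example feature names are hypothetical.

```python
import json
import urllib.request


def build_predict_request(base_url, features):
    """Build a POST request with a JSON body for the /predict endpoint."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, features, timeout=10):
    """Send the request and decode the JSON response (requires a running server)."""
    request = build_predict_request(base_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `predict("http://localhost:5000", {"reading_score": 72, "writing_score": 74})` would post those (hypothetical) features to the container started above.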

☁️ Deployment

AWS EC2

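Launch an EC2 instance and allow inbound traffic on port 5000 in its security group
Install Docker on the instance and pull your image from Docker Hub (push it there from your machine first with docker push)
Run the container with port 5000 published (docker run -p 5000:5000 <image>)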

Azure Web App

Create a Web App for Containers
Push your Docker image to Azure Container Registry (ACR)
Deploy the container to your Web App

🧾 Logging & Monitoring

Logs are automatically generated and stored in the logs/ directory.
Each major pipeline component logs start, end, and error states.
Critical exceptions are captured by CustomException and written to both console and log files.
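A minimal sketch of how `logger.py` and `CustomException` could work together, using only the standard library: each run writes a timestamped file under `logs/` and echoes to the console, and the exception records the file and line where the original error was raised. The function name `get_logger`, the log format, and the file-naming scheme are assumptions, not the project's actual code.

```python
import logging
import os
import sys
from datetime import datetime


def get_logger(name="ml_app", log_dir="logs"):
    """Return a logger writing to a timestamped file in log_dir and to the console."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        os.makedirs(log_dir, exist_ok=True)
        log_file = os.path.join(log_dir, f"{datetime.now():%Y_%m_%d_%H_%M_%S}.log")
        formatter = logging.Formatter("[%(asctime)s] %(name)s - %(levelname)s - %(message)s")
        for handler in (logging.FileHandler(log_file), logging.StreamHandler()):
            handler.setFormatter(formatter)
            logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


class CustomException(Exception):
    """Wraps an error with the file and line where it was originally raised."""

    def __init__(self, error):
        _, _, tb = sys.exc_info()  # traceback of the exception currently being handled
        if tb is not None:
            while tb.tb_next is not None:  # walk down to the frame that raised
                tb = tb.tb_next
            location = f"{tb.tb_frame.f_code.co_filename}, line {tb.tb_lineno}"
        else:
            location = "unknown location"
        self.detail = f"Error in {location}: {error}"
        super().__init__(self.detail)
```

Inside a pipeline step, the pattern would be `except Exception as exc: err = CustomException(exc); logger.error(err); raise err from exc`, so the message lands in both the console and the log file while the original traceback stays attached.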

🧰 Tech Stack

Category	Tools / Libraries
Language	Python 3.10+
Data Handling	Pandas, NumPy
ML Framework	Scikit-learn
API Framework	Flask
Containerization	Docker
Deployment	AWS EC2, Azure Web App
Standard library	logging, os, sys, pickle

📈 Future Enhancements

Model versioning with MLflow
Front-end interface (Streamlit or React)
API authentication and rate limiting

👨‍💻 Author

Derrick Nyongesa
B.Sc. Electrical & Electronics Engineering | Data Scientist
📧 derricknyongesa0.email@gmail.com
🌐 LinkedIn | GitHub
