chatterjeesaurabh/End-to-End-Machine-Learning-Pipeline-Project
End-to-End Machine Learning Prediction Pipeline

An end-to-end machine learning project on Diamond Price Prediction with MLflow, DVC, Airflow, Docker, Flask and Azure ACR.

Problem Statement

You are hired by Gem Stones Co. Ltd, a cubic zirconia manufacturer, to predict the prices of stones based on a dataset of 27,000 samples. The company aims to distinguish higher- and lower-profit stones in order to optimize its profit share. Your task is to develop a model that predicts stone prices and to determine the top 5 attributes most important for accurate predictions.
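The "top 5 attributes" part of the task can be approached with a tree-based feature ranking. A minimal sketch, assuming a random-forest model and synthetic stand-in data (the column names mirror the dataset, but the values and the price formula here are illustrative, not the project's actual data or model):

```python
# Hedged sketch: ranking price predictors with a random forest.
# The data below is synthetic; only the column names match the dataset.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame({
    "carat":   rng.uniform(0.2, 2.0, n),
    "cut":     rng.integers(0, 5, n).astype(float),
    "color":   rng.integers(0, 7, n).astype(float),
    "clarity": rng.integers(0, 8, n).astype(float),
    "depth":   rng.uniform(55, 70, n),
    "table":   rng.uniform(50, 70, n),
    "x":       rng.uniform(3, 9, n),
    "y":       rng.uniform(3, 9, n),
    "z":       rng.uniform(2, 6, n),
})
# Synthetic price dominated by carat, as is typical for gemstone data.
price = 5000 * X["carat"] + 100 * X["clarity"] + rng.normal(0, 200, n)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, price)
top5 = (pd.Series(model.feature_importances_, index=X.columns)
          .sort_values(ascending=False)
          .head(5))
print(top5)
```

On the real dataset the same `feature_importances_` ranking, taken over the actual trained model, yields the requested top 5 attributes.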

Objective

Build an end-to-end pipeline that retrieves raw data, automatically applies data transformations, and trains machine learning models, with automated evaluation and deployment of the model and pipeline to the cloud.

Dataset

The data is taken from Kaggle. Over 193,573 data points are present with the following features: Carat, Cut, Color, Clarity, Depth, Table, x, y, z and Price.
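Cut, Color and Clarity are ordered categorical features, so one plausible transformation step is an ordinal encoding. A minimal sketch, where the grade orderings are assumptions (standard diamond grading scales), not necessarily the project's actual encoding:

```python
# Hedged sketch: ordinal encoding for the categorical columns.
# The grade orders below are assumptions based on standard grading scales.
import pandas as pd

CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOR_ORDER = ["D", "E", "F", "G", "H", "I", "J"]
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

def encode(df: pd.DataFrame) -> pd.DataFrame:
    """Map each ordered category to its rank in the grading scale."""
    out = df.copy()
    for col, order in [("cut", CUT_ORDER), ("color", COLOR_ORDER),
                       ("clarity", CLARITY_ORDER)]:
        out[col] = out[col].map({v: i for i, v in enumerate(order)})
    return out

sample = pd.DataFrame({"cut": ["Ideal", "Fair"],
                       "color": ["E", "J"],
                       "clarity": ["VS1", "I1"]})
print(encode(sample))
```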

Tools & Technologies

Architecture


Pipeline architecture

Exploratory Data Analysis

The exploratory data analysis and modeling are done in the experiments.ipynb notebook.

Experiment Tracking


Model experiment tracking using MLflow

Web App


Streamlit web app

Setup

Setup development environment

bash init_setup.sh

Activate environment

source activate ./env

Install project as local package

python setup.py install

Run complete pipeline with DVC versioning

dvc repro
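`dvc repro` executes the stage graph defined in `dvc.yaml`. A hedged sketch of what that file might look like for this pipeline (the stage names, script paths and output paths are assumptions):

```yaml
stages:
  data_ingestion:
    cmd: python src/data_ingestion.py
    outs:
      - data/raw
  data_transformation:
    cmd: python src/data_transformation.py
    deps:
      - data/raw
    outs:
      - data/processed
  model_training:
    cmd: python src/model_training.py
    deps:
      - data/processed
    outs:
      - models/model.pkl
```

DVC hashes each stage's `deps` and `outs`, so a repeated `dvc repro` re-runs only the stages whose inputs changed.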

MLflow experiment tracking

mlflow ui

Pushing project docker image to Azure Container Registry

docker build -t <registry_name>.azurecr.io/<image_name>:latest .
docker login <registry_name>.azurecr.io
# enter <user_name> and <password> at the prompts
docker push <registry_name>.azurecr.io/<image_name>:latest

Run Airflow

  • First create the /airflow/dags/ folder and add your pipeline DAG Python file.
  • Set the /airflow folder as the default home for Airflow by adding this environment variable to the .env file: AIRFLOW_HOME=/airflow.
airflow db init
airflow users create -e <email> -f <first_name> -l <last_name> -p <password> -u <username> -r Admin
nohup airflow scheduler & airflow webserver

Run Flask and Streamlit app

  • Flask app
python app.py
  • Streamlit app
streamlit run streamlit_app.py
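The Flask app exposes the trained model behind an HTTP endpoint. A hedged sketch of the shape such an app takes (the route, feature names and the dummy predictor are assumptions standing in for the project's trained pipeline):

```python
# Hedged sketch: a minimal Flask prediction endpoint.
# The route, payload fields and dummy predictor are assumptions;
# the real app would load the trained model artifact instead.
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_price(features: dict) -> float:
    # Placeholder for model.predict() on the trained pipeline.
    return round(4000.0 * float(features["carat"]), 2)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()
    return jsonify({"price": predict_price(features)})

# To serve locally: app.run(host="0.0.0.0", port=5000)
```

A client would then POST JSON such as `{"carat": 1.0}` to `/predict` and receive the predicted price back as JSON.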

Contributions

Saurabh Chatterjee
MTech, Signal Processing and Machine Learning
Indian Institute of Technology (IIT) Kharagpur
