Predicting-Diabetes-Model

Diabetes is a prevalent health concern affecting individuals worldwide. Early identification of individuals at risk is crucial for implementing preventive measures and promoting better health outcomes. This project aims to develop a predictive model for diabetes risk assessment using a dataset containing relevant health parameters.

Problem Statement

Many people around the world have diabetes, and we need a way to find out who might get it early on. Right now, there aren't easy tools for regular people or doctors to check someone's risk of getting diabetes using important health information. So, we want to make a simple and reliable computer program to do just that – help people and doctors know the risk of diabetes early, so they can take action and plan personalized ways to stay healthy.

Dataset

gender: Gender of the individual (categorical: Female/Male).
age: Age of the individual.
hypertension: Whether the individual has hypertension (binary: 0 for No, 1 for Yes).
heart_disease: Whether the individual has heart disease (binary: 0 for No, 1 for Yes).
smoking_history: Smoking history of the individual (categorical: never/current/No Info).
bmi: Body Mass Index (numeric).
hba1c_level: HbA1c level, a measure of average blood glucose over a few months (numeric).
blood_glucose_level: Fasting blood glucose level (numeric).
diabetes: Target variable indicating the presence of diabetes (binary: 0 for No, 1 for Yes).

EDA

Explored dataset structure and types, summarized numerical and categorical features, assessed missing data, analyzed relationships between features, and examined class distribution for insights into potential imbalances.

Model Training

Model Training with Decision Tree:

A Decision Tree model was trained to predict diabetes based on features like age, BMI, and medical history. The scikit-learn library facilitated both model training and hyperparameter tuning through a grid search. Hyperparameters, including tree depth and minimum samples per leaf, were optimized to enhance the model's predictive performance.

Model Training with Logistic Regression:

Employing the scikit-learn toolkit, we trained a Logistic Regression model on a dataset comprising features like age, BMI, and medical history for diabetes prediction. Parameter tuning involved exploring values of C (inverse of regularization strength) including 1, 0.1, 0.01, and 10. The optimal parameter was identified as C = 10, and this tuned model, showcasing superior performance, was selected for deployment. AUC-ROC curve analysis was instrumental in making this determination.

Environment Management

I used pipenv for the virtual environment.

pip install pipenv

To replicate the environment, on your command line, use

pipenv install numpy scikit-learn==0.24.2 flask waitress

For the required versions for libraries, use requirements.txt

$pip install -r requirements.txt

Containerization

Use this command for container building.

$Docker build -t 'container_name'

Use this container running

$Docker run -it --rm -p 9696:9696 'container_name'

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
Dockerfile.txt		Dockerfile.txt
Pipfile.lock		Pipfile.lock
Pipfile.txt		Pipfile.txt
Predicting_Diabetes_.ipynb		Predicting_Diabetes_.ipynb
README.md		README.md
diabetes_prediction_dataset.csv		diabetes_prediction_dataset.csv
model_C=10.bin		model_C=10.bin
ping.py		ping.py
predict.py		predict.py
predict_test.py		predict_test.py
requirements.txt		requirements.txt
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Predicting-Diabetes-Model

Problem Statement

Dataset

EDA

Model Training

Model Training with Decision Tree:

Model Training with Logistic Regression:

Environment Management

Containerization

About

Uh oh!

Releases

Packages

Uh oh!

Languages

DigitalSuYee/Predicting-Diabetes-Model

Folders and files

Latest commit

History

Repository files navigation

Predicting-Diabetes-Model

Problem Statement

Dataset

EDA

Model Training

Model Training with Decision Tree:

Model Training with Logistic Regression:

Environment Management

Containerization

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages