🧠 Pima Diabetes Prediction

This repository demonstrates an end-to-end machine learning pipeline for binary classification on the Pima Indians Diabetes dataset. It trains a tuned neural network and an XGBoost classifier, and covers data preprocessing, exploratory data analysis, outlier handling, and visualization.

🔍 Problem Statement

Diabetes is a chronic condition with significant public health impact. The goal is to accurately classify whether a patient has diabetes based on medical measurements.

📊 Dataset

Source: UCI Machine Learning Repository
Observations: 768 patients
Features:
- Pregnancies
- Glucose
- BloodPressure
- SkinThickness
- Insulin
- BMI
- DiabetesPedigreeFunction
- Age
- Outcome (target variable: 1 for diabetic, 0 for non-diabetic)
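As a rough sketch of how the columns above might be read and checked, the snippet below parses a CSV with the listed header and counts zeros in the measurements where a zero cannot be a real reading. The sample rows are made up for illustration (they are not taken from the dataset), and both the choice of zero-as-missing columns and the helper names `load_rows` and `count_zeros` are assumptions, not part of this repository.

```python
import csv
import io

# Column names as listed above; Outcome is the target variable.
COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]

# Assumption: a zero in these columns is physiologically impossible
# and therefore stands in for a missing measurement.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]

# Illustrative rows only; these values are invented for the example.
SAMPLE_CSV = "\n".join([
    ",".join(COLUMNS),
    "2,120,70,30,0,28.5,0.45,35,0",
    "5,0,64,0,0,31.2,0.30,41,1",
    "1,95,0,22,88,0.0,0.22,24,0",
])


def load_rows(handle):
    """Parse CSV text into a list of dicts with float values."""
    reader = csv.DictReader(handle)
    absent = [name for name in COLUMNS if name not in (reader.fieldnames or [])]
    if absent:
        raise ValueError(f"missing columns: {absent}")
    return [{name: float(row[name]) for name in COLUMNS} for row in reader]


def count_zeros(rows, columns):
    """Count zero entries per column, as a quick missing-value check."""
    return {name: sum(1 for row in rows if row[name] == 0) for name in columns}


rows = load_rows(io.StringIO(SAMPLE_CSV))
zero_counts = count_zeros(rows, ZERO_AS_MISSING)
print(zero_counts)
```

To run the same check on the real file, pass an open file handle to `load_rows` instead of the in-memory sample.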

💡 Key Features of the Project

✔️ Exploratory data analysis with distribution plots and correlation matrix
✔️ Outlier detection using the IQR method
✔️ Missing/impossible values imputed with medians
✔️ Data standardization
✔️ Deep Learning (Neural Network with keras-tuner for optimization)
✔️ Tree-Based Model (XGBoost) for benchmarking
✔️ ROC/AUC comparisons and confusion matrix visualizations
✔️ Fully modular and reproducible codebase
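The preprocessing steps listed above (median imputation, IQR-based outlier handling, standardization) can be sketched for a single feature column as follows. This is a minimal illustration using only the Python standard library, not the code in `src`. The function names are hypothetical, the 1.5 multiplier is the conventional default rather than a value confirmed by this project, and clipping outliers to the IQR fences is an assumption: the project may drop or otherwise treat them instead.

```python
from statistics import mean, median, pstdev, quantiles


def impute_zeros_with_median(values):
    """Replace zeros (treated as missing) with the median of the non-zero values."""
    observed = [v for v in values if v != 0]
    fill = median(observed)
    return [fill if v == 0 else v for v in values]


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, k=1.5):
    """Clip values outside the IQR fences (assumed handling; removal is another option)."""
    low, high = iqr_bounds(values, k)
    return [min(max(v, low), high) for v in values]


def standardize(values):
    """Rescale to zero mean and unit variance using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up insulin readings; zeros stand in for missing measurements.
insulin = [0, 80, 90, 100, 110, 0, 600]
imputed = impute_zeros_with_median(insulin)
bounds = iqr_bounds(imputed)
clipped = clip_outliers(imputed)
scaled = standardize(clipped)
print(imputed, bounds, clipped)
```

In a train/test setting, the median, the fences, and the mean and standard deviation would normally be computed on the training split only and then reused on the test split, so that no test information leaks into preprocessing.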

🧪 Models Implemented

| Model | Library | Notes |
| --- | --- | --- |
| Neural Network | TensorFlow/Keras | Tuned using `keras-tuner` (RandomSearch) |
| XGBoost | XGBoost | Strong tree-based benchmark |
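To compare two classifiers such as the ones above, the ROC AUC and a confusion matrix can be computed from each model's predicted probabilities. The sketch below does this in plain Python so it runs without the project's dependencies; in practice a metrics library would likely be used. The labels and scores are invented for illustration and are not results from this project, and `model_a_scores` and `model_b_scores` are hypothetical names.

```python
def confusion_matrix(y_true, y_prob, threshold=0.5):
    """Count TN, FP, FN, TP after thresholding predicted probabilities."""
    counts = {"tn": 0, "fp": 0, "fn": 0, "tp": 0}
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if truth == 1:
            counts["tp" if predicted == 1 else "fn"] += 1
        else:
            counts["fp" if predicted == 1 else "tn"] += 1
    return counts


def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in the number
    of samples, which is acceptable for a dataset of a few hundred rows.
    """
    positives = [p for t, p in zip(y_true, y_prob) if t == 1]
    negatives = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented labels and scores, for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
model_a_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
model_b_scores = [0.3, 0.6, 0.4, 0.9, 0.5, 0.55]

auc_a = roc_auc(y_true, model_a_scores)
auc_b = roc_auc(y_true, model_b_scores)
cm_a = confusion_matrix(y_true, model_a_scores)
print(round(auc_a, 3), round(auc_b, 3), cm_a)
```

AUC is threshold-independent, while the confusion matrix depends on the chosen cutoff (0.5 here), so the two views complement each other when benchmarking models.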

🚀 How to Run

1. Clone the repository

git clone https://github.com/yourusername/pima-diabetes-prediction.git
cd pima-diabetes-prediction

2. Install dependencies

pip install -r requirements.txt

📊 Model Evaluation Results

🔗 Correlation Matrix

📈 Feature Distributions

📉 Accuracy and Loss Over Epochs

📉 ROC Curve Comparison

Repository: AlyssonAlvesPinto/pima-diabetes-prediction
