Welcome to the Diabetes Prediction Project, a machine learning and deep learning system that predicts the likelihood of diabetes. This repository contains the code, data, and documentation for the project, which was completed as coursework at IUST University (4031 semester).
The goal of this project is to leverage machine learning and deep learning models to predict diabetes outcomes using real-world clinical data. Various models are trained, evaluated, and optimized to identify the most accurate and efficient predictor.
- Data cleaning and preprocessing
- Training multiple machine learning models
- Hyperparameter tuning for performance optimization
- Comparison of classical models with deep learning approaches
- Ensemble techniques for improved accuracy
- Visualizations and detailed analysis of model performance
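
As a rough illustration of the model-comparison step, the sketch below cross-validates several classical classifiers on the same data. It is not the project's actual code: the synthetic dataset from `make_classification`, the `candidates` dictionary, and all model settings are assumptions chosen only to keep the example self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in for the clinical data (8 numeric features, binary label).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)

# Hypothetical candidate set; scale-sensitive models are wrapped with a scaler.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Score every model with the same 5-fold cross-validation so results are comparable.
cv_accuracy = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_accuracy[name] = scores.mean()
    print(f"{name:20s} mean accuracy = {scores.mean():.3f}")
```

Putting the scaler inside each pipeline means it is refit on every training fold, which avoids leaking statistics from the validation fold into preprocessing.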
- Data Preprocessing: Cleaned, scaled, and encoded data for efficient training.
- Model Comparison: Logistic Regression, Random Forest, SVM, KNN, and Neural Networks.
- Optimization: Hyperparameter tuning using GridSearchCV.
- Ensemble Techniques: Gradient Boosting, AdaBoost, and Random Forest Ensembles.
- Evaluation Metrics: Accuracy, AUC-ROC, Precision, Recall, F1-score, Confusion Matrix.
- Visualizations: ROC curves, heatmaps, and decision boundaries.
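
The tuning and evaluation steps listed above might look roughly like the following sketch, which runs `GridSearchCV` over a scaled Gradient Boosting pipeline and reports accuracy and AUC-ROC on a held-out split. The synthetic data, the split ratio, and the values in `param_grid` are assumptions for illustration and are not taken from the project's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for the clinical data.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", GradientBoostingClassifier(random_state=0)),
])

# Hypothetical search space; the "model__" prefix targets the pipeline step.
param_grid = {
    "model__n_estimators": [50, 100],
    "model__learning_rate": [0.05, 0.1],
    "model__max_depth": [2, 3],
}

# 3-fold cross-validated grid search, selecting by AUC-ROC.
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=3)
search.fit(X_train, y_train)

# Evaluate the refit best estimator on the untouched test split.
y_pred = search.predict(X_test)
y_score = search.predict_proba(X_test)[:, 1]
test_accuracy = accuracy_score(y_test, y_pred)
test_auc = roc_auc_score(y_test, y_score)

print("best params:", search.best_params_)
print(f"accuracy={test_accuracy:.3f}  auc_roc={test_auc:.3f}")
```

The numbers printed here describe the synthetic data only and say nothing about the project's reported results.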
The Gradient Boosting model achieved the best performance:
- Accuracy: 84%
- AUC-ROC: 89%
Deep Learning models also performed well:
- Accuracy: 82%
- AUC-ROC: 87%
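
For readers unfamiliar with the two reported metrics, here is a minimal pure-Python sketch of how accuracy, precision, recall, F1, and AUC-ROC can be computed from labels and predicted scores. The function names and the toy labels and scores are hypothetical and unrelated to the project's data; in practice `sklearn.metrics` provides equivalent functions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 derived from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc_roc(y_true, y_score):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # threshold at 0.5

metrics = classification_metrics(y_true, y_pred)
auc = auc_roc(y_true, y_score)
print(metrics["accuracy"])  # 0.75
print(auc)                  # 0.9375
```

Accuracy depends on the chosen threshold, while AUC-ROC summarizes ranking quality across all thresholds, which is why the two figures are reported side by side.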
pip install -r requirements.txt
python main.py
- Use the notebooks for detailed exploratory analysis and model training.
- The `main.py` script provides an end-to-end pipeline for training and evaluation.
- Access visual outputs in the `outputs` directory.
- Programming Language: Python
- Libraries: Scikit-learn, Pandas, NumPy, Matplotlib, Seaborn, TensorFlow/Keras
- Development Environment: Jupyter Notebooks, VS Code
This project was developed as part of the Machine Learning Course at IUST University (4031 semester). The project aimed to provide practical exposure to predictive modeling, machine learning algorithms, and optimization techniques.
Contributions, issues, and feature requests are welcome! Feel free to fork this repository and submit pull requests.
This project is licensed under the MIT License. See the `LICENSE` file for details.