This repository provides a complete pipeline for non-invasive blood glucose estimation using Photoplethysmography (PPG) signals. It includes data preprocessing, feature extraction, machine learning model training, and result visualization to support research and development in biomedical signal analysis and diabetes screening.

Spidy104/PPG_DIABETES_CLASSIFICATION

🌟 PPG Blood Glucose Diabetes Classification

Welcome to the PPG Blood Glucose Diabetes Classification project! 🎉
This repository provides a pipeline for estimating blood glucose levels from Photoplethysmography (PPG) signals.
It combines signal processing, machine learning, and visualization to support research on non-invasive diabetes screening. 🩺💡


📋 Project Overview

This project uses PPG, an optical technique for capturing blood volume changes, to predict blood glucose levels non-invasively.

The modular framework integrates signal engineering and machine learning, making it suitable for experimentation and real-world use.

✨ Key Features

  • 📊 Data Processing: Preprocess raw PPG segments and extract physiological features
  • 🤖 Machine Learning Models: Random Forest, Gradient Boosting, SVM, LightGBM, Logistic Regression, and ensemble methods (Stacking and Voting Classifiers)
  • 📈 Visualizations: ROC curves, confusion matrices, and feature importance plots
  • 🧪 Evaluation: Subject-wise StratifiedGroupKFold cross-validation, so that no subject appears in both training and test folds (prevents data leakage)

📂 Project Structure

PPG_Blood_Glucose_JB_Implementation/
├── datasets/
│   ├── ppg_bagging_tree_features.csv
│   ├── ppg_specific_features.csv
│   ├── processed_metadata.csv
│   ├── PPG-BP.xlsx
│   └── 0_subject/
├── models/
│   ├── random_forest.pkl
│   ├── svm.pkl
│   └── ...
├── outputs/
│   ├── confusion_matrix_randomforest.png
│   ├── roc_randomforest.png
│   └── ...
├── src/
│   ├── excel_handling.py
│   ├── data_preprocessing.py
│   ├── train_models.py
│   └── evaluate.py
├── requirements.txt
└── README.md

🚀 Getting Started

Let's get you up and running in no time!

✅ Prerequisites

  • Python 3.8+ 🐍
  • pip (Python package manager)
  • Git 📦

🛠 Setup Instructions

1. Clone the Repository

git clone https://github.com/Spidy104/PPG_DIABETES_CLASSIFICATION
cd PPG_DIABETES_CLASSIFICATION

2. Set Up a Virtual Environment (Recommended)

python -m venv venv

3. Activate the Virtual Environment

  • On macOS/Linux:
    source venv/bin/activate
  • On Windows:
    venv\Scripts\activate

4. Install Dependencies

pip install -r requirements.txt

🧠 Usage Guide

📁 Step 1: Prepare the Data

Ensure the datasets/ folder contains:

  • ppg_bagging_tree_features.csv: Features extracted from PPG signals using bagging tree methods, used for model training.
  • ppg_specific_features.csv: Domain-specific physiological features derived from PPG signals.
  • processed_metadata.csv: Metadata for each sample, such as subject IDs, timestamps, and labels (e.g., glucose levels).
  • PPG-BP.xlsx: Raw and reference data, including PPG signals and corresponding blood pressure/glucose measurements.
  • 0_subject/: Directory of raw PPG signal files, organized per subject for preprocessing.
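
Before running the pipeline, it can help to confirm that these entries are actually in place. The sketch below is one way to do that with the standard library. The function name `missing_dataset_entries` is hypothetical and not part of the repository; the entry names are copied from the list above.

```python
from pathlib import Path

# Entry names copied from the list above; a trailing "/" marks a directory.
EXPECTED_ENTRIES = [
    "ppg_bagging_tree_features.csv",
    "ppg_specific_features.csv",
    "processed_metadata.csv",
    "PPG-BP.xlsx",
    "0_subject/",
]


def missing_dataset_entries(datasets_dir="datasets"):
    """Return the expected entries that are absent from datasets_dir."""
    root = Path(datasets_dir)
    missing = []
    for name in EXPECTED_ENTRIES:
        path = root / name.rstrip("/")
        # Directories and regular files are checked with the matching predicate.
        present = path.is_dir() if name.endswith("/") else path.is_file()
        if not present:
            missing.append(name)
    return missing


print("Missing entries:", missing_dataset_entries() or "none")
```

Run from the repository root, this prints the names that still need to be added to datasets/, or "none" when everything is present.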

⚙️ Step 2: Process and Preprocess Data

2.1 Process Excel Metadata

python src/excel_handling.py

2.2 Preprocess Raw Data

python src/data_preprocessing.py

2.3 Feature Extraction

python src/feature_extraction.py
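
The three commands above can also be chained so that a failure in one step stops the rest. This is a small sketch under the assumption that each script runs without arguments from the repository root, as the commands suggest. The names `PREPROCESSING_STEPS` and `run_steps` are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys

# Script paths as they appear in steps 2.1 to 2.3 above.
PREPROCESSING_STEPS = [
    "src/excel_handling.py",
    "src/data_preprocessing.py",
    "src/feature_extraction.py",
]


def run_steps(steps, runner=subprocess.run):
    """Run each script with the current interpreter, stopping at the first failure."""
    for script in steps:
        print(f"Running {script} ...")
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a later step never runs on top of a failed earlier one.
        runner([sys.executable, script], check=True)
```

Calling `run_steps(PREPROCESSING_STEPS)` from the repository root runs the steps in order. Using `sys.executable` keeps the scripts inside the active virtual environment; the `runner` parameter exists only so the function can be exercised without launching real processes.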

🏋️ Step 3: Train the Models

Run the following command to train all machine learning models:

python src/train_models.py

This will generate model files in the models/ directory:

models/
├── random_forest.pkl
├── svm.pkl
├── gradient_boosting.pkl
├── lightgbm.pkl
├── logistic_regression.pkl
├── stacking_classifier.pkl
└── voting_classifier.pkl
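
To reuse the saved models outside the training script, they can be loaded back from disk. The sketch below assumes the .pkl files were written with Python's standard pickle module, which the README does not state; if train_models.py wrote them with joblib instead, `joblib.load` would be the call to use. The helper name `load_models` is hypothetical.

```python
import pickle
from pathlib import Path


def load_models(models_dir="models"):
    """Load every .pkl file in models_dir into a {file stem: object} dict."""
    models = {}
    for path in sorted(Path(models_dir).glob("*.pkl")):
        with path.open("rb") as fh:
            # Unpickling can execute arbitrary code; only load files you trust.
            models[path.stem] = pickle.load(fh)
    return models
```

With the files listed above, `load_models()["random_forest"]` would return the stored Random Forest object. Loading scikit-learn or LightGBM models this way requires those libraries to be installed, ideally in the same versions used for training.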

📊 Step 4: Evaluate and Visualize Results

Evaluate the trained models and generate visualizations by executing the evaluation notebook (this requires Jupyter):

jupyter nbconvert --to notebook --execute src/second_model.ipynb

Results and plots will be saved in the outputs/ directory:

outputs/
├── gradient_boosting.jpg
├── LightGBM.jpg
├── Stacking_Classifier.jpg
├── Voting_Classifier.jpg
├── Model_performance.jpg
└── ROC_curves.jpg

✅ Example Outputs

Here's a sneak peek at the insights you'll get:

🧾 Classification Report

Below are the classification reports for each model (Model 2):

Random Forest

Class         Precision  Recall  F1-Score  Support
0                 0.830   0.990     0.900      181
1                 0.000   0.000     0.000       38
Accuracy                            0.820      219
Macro Avg         0.410   0.500     0.450      219
Weighted Avg      0.680   0.820     0.750      219

SVM

Class         Precision  Recall  F1-Score  Support
0                 0.830   1.000     0.910      181
1                 0.000   0.000     0.000       38
Accuracy                            0.830      219
Macro Avg         0.410   0.500     0.450      219
Weighted Avg      0.680   0.830     0.750      219

Gradient Boosting

Class         Precision  Recall  F1-Score  Support
0                 0.830   0.980     0.900      181
1                 0.330   0.050     0.090       38
Accuracy                            0.820      219
Macro Avg         0.580   0.520     0.490      219
Weighted Avg      0.740   0.820     0.760      219

LightGBM

Class         Precision  Recall  F1-Score  Support
0                 0.830   1.000     0.910      181
1                 0.000   0.000     0.000       38
Accuracy                            0.830      219
Macro Avg         0.410   0.500     0.450      219
Weighted Avg      0.680   0.830     0.750      219

Stacking Classifier

Class         Precision  Recall  F1-Score  Support
0                 0.840   0.970     0.900      181
1                 0.440   0.110     0.170       38
Accuracy                            0.820      219
Macro Avg         0.640   0.540     0.540      219
Weighted Avg      0.770   0.820     0.770      219

Voting Classifier

Class         Precision  Recall  F1-Score  Support
0                 0.830   0.990     0.900      181
1                 0.000   0.000     0.000       38
Accuracy                            0.820      219
Macro Avg         0.410   0.500     0.450      219
Weighted Avg      0.680   0.820     0.750      219
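
The Macro Avg and Weighted Avg rows follow directly from the per-class rows. As a worked example, the sketch below recomputes them for the Stacking Classifier from the rounded values in its report. It also computes the accuracy of always predicting class 0, a useful reference point given the 181-to-38 class split. The variable and function names are illustrative only.

```python
# Per-class rows from the Stacking Classifier report above:
# class -> (precision, recall, f1, support)
rows = {
    0: (0.84, 0.97, 0.90, 181),
    1: (0.44, 0.11, 0.17, 38),
}

total = sum(row[3] for row in rows.values())  # total number of test samples


def macro_avg(metric_index):
    """Unweighted mean of one metric across classes."""
    return sum(row[metric_index] for row in rows.values()) / len(rows)


def weighted_avg(metric_index):
    """Mean of one metric across classes, weighted by class support."""
    return sum(row[metric_index] * row[3] for row in rows.values()) / total


for name, idx in [("precision", 0), ("recall", 1), ("f1", 2)]:
    print(f"{name:9s} macro={macro_avg(idx):.3f} weighted={weighted_avg(idx):.3f}")

# Accuracy of a trivial model that always predicts the majority class (0).
majority_baseline = max(row[3] for row in rows.values()) / total
print(f"majority-class baseline accuracy = {majority_baseline:.3f}")
```

The recomputed averages agree with the report up to rounding. The majority-class baseline is 181/219, about 0.83, so the reported accuracies of 0.82 to 0.83 are best read alongside the class 1 recall of each model.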

📊 Gradient Boosting Performance

[Image: Gradient Boosting Performance]

🧮 LightGBM Confusion Matrix

[Image: LightGBM Confusion Matrix]

🤖 Stacking Classifier Results

[Image: Stacking Classifier Results]

🗳️ Voting Classifier Results

[Image: Voting Classifier Results]

📈 Model Performance Comparison

[Image: Model Performance Comparison]

🏅 ROC Curves for All Models

[Image: ROC Curves]

