Skip to content

This project focuses on predicting the likelihood of heart disease using machine learning techniques. The dataset includes medical features like age, blood pressure, cholesterol, and heart rate. The model uses algorithms like CatBoost and Random Forest to predict the presence of heart disease, assisting early diagnosis.

Notifications You must be signed in to change notification settings

mehdighelich1379/Heart_Disease

Repository files navigation

🫀 AI Heart Disease Prediction

🔗 Live Demo

You can try the app live here:
🌐 Click to open Heart Disease App

📸 Demo

App Screenshot App Screenshot

📂 Project Structure

Heart_Disease/
│
├── images/                  # Project images for README & UI
│   ├── Demo1.png
│   ├── Demo2.png
│   └── heart_image.jpg
│
├── models/                  # Saved model files
│   └── catboost_model.txt
│
├── notebooks/               # Jupyter notebooks for experiments
│   ├── catboost_info/
│   ├── Final_Heart_Model.ipynb
│   ├── heart_pipeline_model.pkl
│   └── pipeline.ipynb
│
├── scale/
│   └── scaler.txt
│
├── src/                     # Source code
│   ├── data/
│   │   └── preprocess.py    # Preprocessing pipeline
│   ├── metrics/
│   │   └── evaluate.py      # Evaluation utilities
│   └── models/
│       └── train_model.py   # Model training script
│
├── app.py                   # Streamlit web application
├── main.py                  # Entry point for running training/evaluation
├── heart_pipeline_model.pkl # Saved trained pipeline
├── pipeline_heart.pkl       # Alternative saved pipeline
├── requirements.txt         # Python dependencies
└── README.md                # Project documentation


📊 Dataset Overview
We used the heart.csv dataset containing medical attributes to predict heart disease.

Features:

age – Age of the patient

sex – Gender (0=female, 1=male)

cp – Chest pain type (0–3)

trestbps – Resting blood pressure

chol – Serum cholesterol (mg/dl)

fbs – Fasting blood sugar > 120 mg/dl (0/1)

restecg – Resting ECG results (0–2)

thalach – Max heart rate achieved

exang – Exercise-induced angina (0/1)

oldpeak – ST depression induced by exercise

slope – Slope of the peak exercise ST segment

ca – Number of major vessels (0–4)

thal – Thalassemia type (0–3)

target – Presence of heart disease (1=yes, 0=no)

🚀 Project Workflow
Exploratory Data Analysis (EDA)

Analyzed dataset distributions and feature correlations.

No missing values detected.

Data Visualization

Plotted histograms and density plots for key medical indicators.

Data Preprocessing

Applied StandardScaler for scaling numerical features.

Removed noise using Local Outlier Factor (LOF).

Balanced classes using SMOTETomek.

Model Training

CatBoost Classifier as the final production model.

Evaluation Metrics

Accuracy, Recall, F1-Score, Confusion Matrix, ROC Curve.

Feature Importance

Most influential: age, thalach, chol.

Deployment

Built a Streamlit app for user-friendly predictions.

Input form for patient details.

Multi-language (Persian/English) support.

Detailed "Doctor AI" analysis based on medical thresholds.

📈 Model Performance (Final CatBoost Model)
Classification Report:


              precision    recall  f1-score   support
0             0.89        0.86      0.88       29
1             0.88        0.91      0.89       32

accuracy                          0.89       61
macro avg     0.89        0.88      0.89       61
weighted avg  0.89        0.89      0.89       61
Confusion Matrix:


TN = 25   FP = 4
FN = 3    TP = 29


💡 How to Run Locally

# Clone repository
git clone https://github.com/yourusername/Heart_Disease.git
cd Heart_Disease

# Install dependencies
pip install -r requirements.txt

# Run the Streamlit app
streamlit run app.py
🧠 Model Used
CatBoost Classifier (Final model in production pipeline)

✨ Features of the Web App
🌐 Persian/English language switch

🖼 Medical image-based UI

📊 Doctor AI detailed analysis

🎲 Random patient data generator

📉 Probability-based risk scoring

About

This project focuses on predicting the likelihood of heart disease using machine learning techniques. The dataset includes medical features like age, blood pressure, cholesterol, and heart rate. The model uses algorithms like CatBoost and Random Forest to predict the presence of heart disease, assisting early diagnosis.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages