This project builds a logistic regression model that predicts whether a patient has heart disease. It uses a dataset of patients' medical attributes and aims to support early diagnosis from clinical data.

Objectives:
- Predict the likelihood of heart disease in a patient
- Use logistic regression as a baseline classification model
- Analyze important health features contributing to the prediction

The dataset typically includes the following attributes:
- Age
- Sex
- Chest pain type
- Resting blood pressure
- Cholesterol
- Fasting blood sugar
- Resting ECG results
- Maximum heart rate achieved
- Exercise-induced angina
- ST depression induced by exercise
- Slope of the peak exercise ST segment
- Number of major vessels colored by fluoroscopy
- Thalassemia
- Target (0 = no disease, 1 = disease)
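A minimal sketch for loading the data and running the first exploration checks (missing values, class balance). It assumes the data is a CSV named `heart.csv` using the column names of the common UCI/Kaggle version of this dataset (`age`, `sex`, `cp`, ..., `target`); both the filename and the column names are assumptions, so adjust them to the copy in this repository.

```python
import pandas as pd

# Assumed UCI/Kaggle-style column names; the repository's CSV may differ.
EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]


def load_heart_data(path="heart.csv"):
    """Load the dataset (hypothetical default path) and check the expected columns exist."""
    df = pd.read_csv(path)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df


def summarize(df):
    """Return missing-value counts per column and the count of each target class."""
    return {
        "missing_per_column": df.isna().sum().to_dict(),
        "target_counts": df["target"].value_counts().sort_index().to_dict(),
    }
```

For example, `summarize(load_heart_data())` shows whether any column needs imputation and whether the two classes are roughly balanced; `df.describe()` gives the per-feature distributions.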

Tools and libraries:
- Python
- Pandas, NumPy – data handling
- Matplotlib, Seaborn – data visualization
- Scikit-learn – modeling and evaluation

Workflow:
- Data Loading & Exploration
  - Check for missing values and examine feature distributions.
- Data Preprocessing
  - Encode categorical variables such as chest pain type and thalassemia.
  - Scale features if needed; regularized logistic regression is sensitive to feature scale.
- Modeling
  - Train a logistic regression model.
  - Evaluate performance using accuracy, precision, recall, and F1-score.
- Visualization
  - Plot the confusion matrix, ROC curve, and feature correlations.
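The preprocessing, modeling, and evaluation steps above can be sketched end to end with scikit-learn. This is a minimal sketch, not the notebook's actual code: it assumes a DataFrame with UCI-style column names, and the split of columns into categorical and numeric is an assumption. Encoding and scaling sit inside a `Pipeline`, so they are fit on the training split only and the test split stays untouched.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed UCI-style column names and an assumed categorical/numeric split.
CATEGORICAL = ["cp", "restecg", "slope", "ca", "thal"]
NUMERIC = ["age", "sex", "trestbps", "chol", "fbs", "thalach", "exang", "oldpeak"]


def build_model():
    """One-hot encode categorical columns, standardize numeric ones, then apply logistic regression."""
    preprocess = ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("num", StandardScaler(), NUMERIC),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


def train_and_evaluate(df, target="target", test_size=0.2, random_state=42):
    """Stratified train/test split, fit on the training part, score the held-out part."""
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
    model = build_model().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1 (disease)
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_prob),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }
    return model, metrics, X_test, y_test
```

For the plots, scikit-learn 1.0 and later provide `ConfusionMatrixDisplay.from_estimator(model, X_test, y_test)` and `RocCurveDisplay.from_estimator(model, X_test, y_test)` in `sklearn.metrics`, which work directly with the returned pipeline and held-out split.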

Getting started:
- Clone the repository and enter its directory: `git clone https://github.com/your-username/heart-disease-prediction.git`, then `cd heart-disease-prediction`
- Install dependencies: `pip install -r requirements.txt`
- Launch Jupyter: `jupyter notebook`
- Open `Heart Disease Prediction Using Logistic Regression.ipynb` and follow along.

Results:
- Logistic Regression achieved reasonable performance as a baseline classifier.
- Key indicators included chest pain type, cholesterol levels, and maximum heart rate.

Possible improvements:
- Try other models, such as Random Forest, XGBoost, or neural networks.
- Perform hyperparameter tuning with GridSearchCV.
- Deploy the model using Flask or Streamlit for interactive predictions.
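A minimal sketch of hyperparameter tuning with `GridSearchCV`, assuming the features are already numerically encoded (`X` as a 2-D array or DataFrame, `y` as 0/1 labels). The grid over `C` (inverse regularization strength) is a hypothetical starting point. Keeping the scaler inside the pipeline means it is refit on each training fold, so the validation folds do not leak into preprocessing.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def tune_logistic_regression(X, y, cv_splits=5, random_state=0):
    """Grid-search C with stratified cross-validation, scoring by ROC AUC."""
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    # Hypothetical grid; widen or narrow it for the actual dataset.
    param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
    cv = StratifiedKFold(n_splits=cv_splits, shuffle=True, random_state=random_state)
    search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=cv)
    search.fit(X, y)
    return search
```

After fitting, `search.best_params_` and `search.best_score_` report the chosen `C` and its mean cross-validated ROC AUC, and `search.best_estimator_` is the pipeline refit on all of `X`.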
This project is licensed under the MIT License.