🩺 Multiple Disease Prediction App: a Streamlit-powered tool for predicting diabetes, heart disease, and Parkinson's disease using machine learning models, created from scratch. Features include pre-filled sample data, a performance metrics display, and an easy-to-use interface.

Bloodwingv2/Multiple_Disease_Prediction

Multiple Disease Prediction Project ⚕️💉

This project leverages Streamlit to create a web-based application that predicts the likelihood of Diabetes, Heart Disease, and Parkinson's Disease based on user-provided health data.

🟢 Deployed at: https://hiremeplsthx.streamlit.app/

Features

✅ Predicts multiple diseases using trained ML models
✅ Displays model performance metrics (Accuracy, Precision, Recall, F1 Score)
✅ Includes sample data for quick testing
✅ Unified prediction interface with all three diseases accessible via the left sidebar
✅ User-friendly interface powered by Streamlit

Installation

  1. Clone this repository:
    git clone https://github.com/Bloodwingv2/Multiple_Disease_Prediction.git
    cd Multiple_Disease_Prediction

⚙️ Create and Activate a Conda Environment

⚠️ Recommended: Creating a separate Conda environment helps isolate dependencies. However, you can skip this step if you're comfortable using global packages.

conda create -n disease_prediction_env python=3.10
conda activate disease_prediction_env

  2. Install the required dependencies:
    pip install -r requirements.txt
  3. Run the application:
    streamlit run app.py

Usage

  1. Select the desired disease prediction option from the sidebar.
  2. Enter your health details in the provided input fields.
  3. Click the Predict button to view the prediction result and model metrics.

✨ Key Enhancements

  • ➕ Added "Healthy" and "Non-Healthy" buttons to simplify testing with pre-existing values.
  • 🔽 Improved usability by integrating multiple diseases under a unified sidebar menu.
  • 📊 Stored model performance metrics (accuracy, precision, etc.) in JSON format for easy data handling and streamlined updates.
  • 📝 Utilized Markdown in Streamlit for a clearer and more informative presentation of results.
  • 🧹 Enhanced code readability with well-structured comments and Markdown descriptions in Jupyter Notebook.
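
The JSON metrics and Markdown presentation mentioned in this list could fit together roughly as follows. This is a minimal sketch rather than the app's actual code: the helper name `metrics_to_markdown` is hypothetical, the numbers copy the ones in this README's training snippet, and in the app the resulting string would presumably be passed to Streamlit's `st.markdown`.

```python
import json


def metrics_to_markdown(metrics: dict) -> str:
    """Render a metrics dict as a two-column Markdown table (hypothetical helper)."""
    lines = ["| Metric | Score |", "| --- | --- |"]
    for name, value in metrics.items():
        label = name.replace("_", " ").title()  # "f1_score" -> "F1 Score"
        lines.append(f"| {label} | {value:.2%} |")
    return "\n".join(lines)


# Round-trip through JSON text with the same shape as the saved metrics file.
raw = json.dumps({"accuracy": 0.89, "f1_score": 0.87, "recall": 0.85, "precision": 0.88})
metrics = json.loads(raw)
table = metrics_to_markdown(metrics)
print(table)
```

Keeping the formatting in a plain function means the metrics file can be regenerated after retraining without touching the display code.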

🔍 Future Enhancements

  • 🔧 Perform a comprehensive code review to improve stability and performance.
  • 📈 Experiment with models such as Random Forest, SVM, and XGBoost for enhanced prediction accuracy.
  • 🎨 Improve the UI design to provide a better user experience.

🗂️ Project Structure (refer to the Jupyter Notebook and run each cell in order)

The project follows a structured development pipeline:

  1. Environment Setup ⚙️
  2. Dataset Acquisition 📄
  3. Data Preprocessing 🧪
  4. Model Training and Saving 🧠
  5. Streamlit Deployment 🌐
  6. Enhancements and Testing ✅

📄 Dataset Acquisition

📚 Datasets Used

  • Pima Indians Diabetes Database (Kaggle)
  • Indian Parkinson's Patient Records (Kaggle)
  • Parkinson's Disease Dataset (Kaggle)

📥 Loading the Datasets

import pandas as pd

# Load the datasets
diabetes = pd.read_csv("data/diabetes.csv")
heart = pd.read_csv("data/heart.csv")
parkinsons = pd.read_csv("data/parkinsons.csv")

💾 Saving Processed Data for Future Use

# Save processed data
diabetes.to_csv("data/diabetes_cleaned.csv", index=False)
heart.to_csv("data/heart_cleaned.csv", index=False)
parkinsons.to_csv("data/parkinsons_cleaned.csv", index=False)

🧪 Data Preprocessing

  • 🩺 Addressed missing values, outliers, and scaling issues for improved model performance.
  • 🔄 Utilized StandardScaler to put every feature on a comparable scale, so that features with large numeric ranges do not dominate the model.

from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training features (X_train is assumed to come from an earlier split)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
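
To make the scaling step concrete, here is a small self-contained sketch. It assumes a train/test split made with scikit-learn's `train_test_split`, and it uses synthetic data in place of the real CSVs, so the split ratio, seeds, and feature values are illustrative assumptions. The scaler is fitted on the training rows only and then reused for the test rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a feature matrix; the real features come from the CSV files.
rng = np.random.default_rng(0)
X = rng.normal(loc=[100.0, 30.0], scale=[20.0, 5.0], size=(200, 2))
y = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training rows only
X_test_scaled = scaler.transform(X_test)        # reuse those statistics, no refitting

# Training columns now have mean ~0 and standard deviation ~1.
print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```

Fitting on the training split alone keeps information about the test rows out of the preprocessing step.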

🧠 Train & Save the Models

🤖 Model Selection

  • Implemented Logistic Regression for its simplicity and effectiveness.
  • 🔍 Future plans include exploring Random Forest, SVM, and XGBoost.

💾 Training & Saving Models with Scalers

from sklearn.linear_model import LogisticRegression
import joblib
import json

# Train and save the model
model = LogisticRegression()
model.fit(X_train_scaled, y_train)

# Save model and scaler
joblib.dump({'model': model, 'scaler': scaler}, 'models/diabetes_model.pkl')

# Save performance metrics
metrics = {
    'accuracy': 0.89,
    'f1_score': 0.87,
    'recall': 0.85,
    'precision': 0.88
}

with open("metrics.json", "w") as f:
    json.dump(metrics, f)
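
The snippet above writes fixed numbers into the metrics dictionary. One way to derive them from a held-out split, and to load the saved model and scaler bundle again for prediction, is sketched below. It runs on synthetic data from `make_classification` and writes to a temporary folder, so the scores it prints say nothing about the real models, and the split settings are assumptions.

```python
import json
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data standing in for one of the disease datasets.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)

# Compute the metrics on the held-out split instead of typing them in.
y_pred = model.predict(scaler.transform(X_test))
metrics = {
    "accuracy": float(accuracy_score(y_test, y_pred)),
    "f1_score": float(f1_score(y_test, y_pred)),
    "recall": float(recall_score(y_test, y_pred)),
    "precision": float(precision_score(y_test, y_pred)),
}

with tempfile.TemporaryDirectory() as tmp:
    bundle_path = Path(tmp) / "diabetes_model.pkl"
    joblib.dump({"model": model, "scaler": scaler}, bundle_path)
    (Path(tmp) / "metrics.json").write_text(json.dumps(metrics, indent=2))

    # Later, for example inside the app: load the bundle and scale before predicting.
    bundle = joblib.load(bundle_path)
    reloaded_pred = bundle["model"].predict(bundle["scaler"].transform(X_test))

print(metrics)
```

Saving the scaler next to the model means the app applies exactly the transformation the model was trained with.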

🌐 Streamlit Deployment

🧩 Key Features

  • 🟢 "Fill Healthy Values" and 🔴 "Fill Diabetic Values" buttons simplify testing.
  • 🔽 Implemented dropdown menus for binary options for improved user experience.
  • 🔄 Utilized session states to manage dynamic inputs efficiently.
  • ✅ Models are integrated with scalers to ensure accurate predictions in Streamlit.
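
The preset buttons and session-state handling described above might be wired together along these lines. This is a sketch under assumptions, not the code in `app.py`: the field names, the preset numbers (illustrative only, not clinical reference values), and the helpers `apply_preset` and `render_diabetes_form` are all hypothetical.

```python
from collections.abc import MutableMapping

# Hypothetical presets: illustrative numbers only, not clinical reference values.
PRESETS = {
    "healthy": {"Glucose": 95.0, "BMI": 22.5, "Age": 30.0},
    "diabetic": {"Glucose": 180.0, "BMI": 34.0, "Age": 55.0},
}


def apply_preset(state: MutableMapping, name: str) -> None:
    """Copy a preset into the state mapping, using the same keys as the widgets."""
    for key, value in PRESETS[name].items():
        state[key] = value


def render_diabetes_form() -> None:
    """Draw the buttons and inputs; call this from a script started with `streamlit run`."""
    import streamlit as st  # imported here so apply_preset stays usable without Streamlit

    # Give every input a starting value before a widget with that key is created.
    for key in PRESETS["healthy"]:
        if key not in st.session_state:
            st.session_state[key] = 0.0

    def fill(name: str) -> None:
        apply_preset(st.session_state, name)

    # on_click callbacks run before the script reruns, so the inputs show the new values.
    st.button("Fill Healthy Values", on_click=fill, args=("healthy",))
    st.button("Fill Diabetic Values", on_click=fill, args=("diabetic",))

    for key in PRESETS["healthy"]:
        st.number_input(key, min_value=0.0, key=key)
```

Because each `st.number_input` shares its `key` with an entry in `st.session_state`, writing to that entry in the button callback is enough to refill the form on the next rerun.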

▢️ Running the Application

streamlit run app.py

Visit the live app here: https://hiremeplsthx.streamlit.app/


⭐ Contributing

If you find this project helpful, consider giving it a ⭐ and sharing your thoughts! Suggestions and improvements are welcome. 😊


📜 License

This project is licensed under the MIT License.

👨‍💻 Developed By

Mirang Bhandari
