This project leverages Streamlit to create a web-based application that predicts the likelihood of Diabetes, Heart Disease, and Parkinson's Disease based on user-provided health data.
Deployed at: https://hiremeplsthx.streamlit.app/
- Predicts multiple diseases using trained ML models
- Displays model performance metrics (Accuracy, Precision, Recall, F1 Score)
- Includes sample data for quick testing
- Unified prediction interface with all three diseases accessible via the left sidebar
- User-friendly interface powered by Streamlit
- Clone this repository:
git clone https://github.com/your-username/your-repo-name.git
cd your-repo-name
- Create and activate a Conda environment:
conda create -n disease_prediction_env python=3.10
conda activate disease_prediction_env
- Install the required dependencies:
pip install -r requirements.txt
- Run the application:
streamlit run app.py
- Select the desired disease prediction option from the sidebar.
- Enter your health details in the provided input fields.
- Click the Predict button to view the prediction result and model metrics.
- Added "Healthy" and "Non-Healthy" buttons to simplify testing with pre-existing values.
- Improved usability by integrating multiple diseases under a unified sidebar menu.
- Stored model performance metrics (accuracy, precision, etc.) in JSON format for easy data handling and streamlined updates.
- Used Markdown in Streamlit for a clearer and more informative presentation of results.
- Improved code readability with well-structured comments and Markdown descriptions in the Jupyter Notebook.
- Perform a comprehensive code review to improve stability and performance.
- Experiment with models such as Random Forest, SVM, and XGBoost for enhanced prediction accuracy.
- Improve the UI design to provide a better user experience.
The project follows a structured development pipeline:
- Environment Setup
- Dataset Acquisition
- Data Preprocessing
- Model Training and Saving
- Streamlit Deployment
- Enhancements and Testing
- Pima Indians Diabetes Database (Kaggle)
- Indian Parkinson's Patient Records (Kaggle)
- Parkinson's Disease Dataset (Kaggle)
import pandas as pd

# Load the datasets
diabetes = pd.read_csv("data/diabetes.csv")
heart = pd.read_csv("data/heart.csv")
parkinsons = pd.read_csv("data/parkinsons.csv")
# Save processed data
diabetes.to_csv("data/diabetes_cleaned.csv", index=False)
heart.to_csv("data/heart_cleaned.csv", index=False)
parkinsons.to_csv("data/parkinsons_cleaned.csv", index=False)
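The cleaning performed between loading and saving is not shown here. As a rough, dependency-free sketch of two common steps (median imputation and IQR-based outlier clipping), something like the following could apply to a single column. The helper names, the use of `0` as a missing-value marker, and the sample readings are all assumptions for illustration, not the project's actual preprocessing.

```python
from statistics import median, quantiles


def impute_median(values, missing=0):
    """Replace entries equal to `missing` with the median of the other values."""
    observed = [v for v in values if v != missing]
    fill = median(observed)
    return [fill if v == missing else v for v in values]


def clip_iqr(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Hypothetical glucose readings; 0 is assumed to mark a missing measurement.
glucose = [148, 85, 0, 89, 137, 116, 78, 400]
cleaned = clip_iqr(impute_median(glucose))
```

With pandas, the same idea would typically be expressed per column on the DataFrame (for example with `replace`, `fillna`, and `clip`) before calling `to_csv`.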
- Addressed missing values, outliers, and scaling issues for improved model performance.
- Used StandardScaler for consistent feature scaling to prevent skewed predictions.
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training features only (X_train comes from an earlier train/test split)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
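To make the fit/transform distinction concrete, here is a minimal pure-Python sketch of standardization for one feature column. It mirrors what `StandardScaler` computes (mean and population standard deviation) but is not the library's implementation; the function names and sample values are hypothetical, and it does not handle zero-variance columns.

```python
from statistics import fmean, pstdev


def fit_scaler(column):
    """Learn the mean and population standard deviation of a training column."""
    return fmean(column), pstdev(column)


def transform(column, mean, std):
    """Standardize values using previously learned statistics."""
    return [(x - mean) / std for x in column]


train = [1.0, 2.0, 3.0]  # hypothetical training values
test = [2.0, 4.0]        # hypothetical unseen values

mean, std = fit_scaler(train)
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)  # reuses the training statistics
```

In scikit-learn terms, this corresponds to calling `scaler.fit_transform(X_train)` once and `scaler.transform(...)` on any later data, including user input in the app, so that new values are scaled with the statistics the model was trained on.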
- Implemented Logistic Regression for its simplicity and effectiveness.
- Future plans include exploring Random Forest, SVM, and XGBoost.
from sklearn.linear_model import LogisticRegression
import joblib
import json
# Train and save the model
model = LogisticRegression()
model.fit(X_train_scaled, y_train)
# Save model and scaler
joblib.dump({'model': model, 'scaler': scaler}, 'models/diabetes_model.pkl')
# Save performance metrics
metrics = {
    'accuracy': 0.89,
    'f1_score': 0.87,
    'recall': 0.85,
    'precision': 0.88
}
with open("metrics.json", "w") as f:
    json.dump(metrics, f)
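The metric values above are written as literals. As one possible way to derive them from predictions instead, the sketch below computes the same four metrics from binary labels using only the standard library. The function name and the sample labels are hypothetical; scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` cover the same calculations.

```python
import json


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
    }


# Hypothetical labels, for illustration only.
y_test = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]

computed = binary_metrics(y_test, y_pred)
computed_json = json.dumps(computed, indent=2)  # same keys as metrics.json
```

The resulting dictionary uses the same keys as the `metrics` dictionary above, so it could be passed to `json.dump` in its place.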
- "Fill Healthy Values" and "Fill Diabetic Values" buttons simplify testing.
- Implemented dropdown menus for binary options for improved user experience.
- Used session state to manage dynamic inputs efficiently.
- Models are integrated with scalers to ensure accurate predictions in Streamlit.
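The points above can be sketched roughly as two small helpers: one copies preset values into a dict-like state, the other scales the stored inputs and asks the model for a prediction. This is an illustrative sketch, not the code in `app.py`; the helper names, feature names, and preset values are hypothetical placeholders. The `bundle` argument is assumed to have the `{'model': ..., 'scaler': ...}` layout saved with `joblib.dump` earlier.

```python
def apply_preset(state, preset):
    """Copy preset values into a dict-like state (for example st.session_state)."""
    for key, value in preset.items():
        state[key] = value


def predict_from_state(bundle, state, feature_order):
    """Scale the inputs held in `state` and return the model's prediction."""
    # Build a single-sample, two-dimensional input in the training column order.
    row = [[state[name] for name in feature_order]]
    scaled = bundle["scaler"].transform(row)
    return bundle["model"].predict(scaled)[0]


# Hypothetical feature names and placeholder values, not taken from the app.
FEATURES = ["glucose", "bmi", "age"]
HEALTHY_PRESET = {"glucose": 90, "bmi": 22.0, "age": 30}
```

In a Streamlit app, `apply_preset(st.session_state, HEALTHY_PRESET)` would usually run from a button's `on_click` callback, so the values are in place before the keyed input widgets are created on the next rerun. The bundle itself could be obtained with `joblib.load('models/diabetes_model.pkl')`.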
streamlit run app.py
Visit the live app here: https://hiremeplsthx.streamlit.app/
If you find this project helpful, consider giving it a star and sharing your thoughts! Suggestions and improvements are welcome.
This project is licensed under the MIT License.
Mirang Bhandari