A deep learning-based project designed to classify diseases based on symptoms. This repository includes scripts for preprocessing datasets, creating optimal data representations, and training a deep neural network model to predict diseases.
- Overview
- Dataset
- Features
- Requirements
- Installation
- Usage
- Files and Scripts
- Model
- Future Work
- Contributing
- License
Disease Classifier is a tool designed to assist in identifying potential diseases based on a user’s symptoms. The system leverages a deep neural network trained on preprocessed symptom-disease data to provide predictions. This project is intended for educational and exploratory purposes.
The project includes the following datasets:
- new_df.csv: A refined dataset prepared for machine learning.
- dataset.csv: The original dataset containing disease-symptom relationships.
- final_df.csv: The processed dataset used for training the classifier.
- Symptom-severity.csv: Details the severity of each symptom.
- symptom_Description.csv: Contains descriptions of symptoms.
- symptom_precaution.csv: Lists precautions to take for each disease.
The creating an optimal dataset.ipynb notebook processes these datasets to produce final_df.csv, an optimized representation of the data used for training the classifier.
- Preprocesses and cleans the raw datasets by standardizing symptom names and removing inconsistencies.
- Builds a deep learning model with TensorFlow and Keras to classify diseases.
- Outputs predictions from symptom input, with the top prediction highlighted.
The project requires the following Python libraries:
- pandas
- numpy
- scikit-learn
- tensorflow
- keras
- joblib
- Clone this repository:
git clone https://github.com/your-username/disease-classifier.git
- Navigate to the project directory:
cd disease-classifier
- Install dependencies manually using pip:
pip install pandas numpy scikit-learn tensorflow keras joblib
Run the creating an optimal dataset.ipynb notebook to preprocess the data and generate final_df.csv. The notebook:
- Cleans the original dataset by standardizing and encoding symptom names.
- Saves the processed dataset for training the model.
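The cleaning and encoding steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (Disease, Symptom_1, Symptom_2), the sample rows, and the standardize helper are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical stand-in for dataset.csv: one disease column plus a few
# symptom columns with inconsistent spacing and case. The real column
# names and values may differ.
raw = pd.DataFrame({
    "Disease": ["Flu", "Allergy", "Flu"],
    "Symptom_1": [" fever", "sneezing ", "cough"],
    "Symptom_2": ["Cough", None, " high fever"],
})


def standardize(name):
    """Trim, lower-case, and join words with underscores; keep missing values as None."""
    if pd.isna(name):
        return None
    return "_".join(str(name).strip().lower().split())


symptom_cols = [c for c in raw.columns if c.startswith("Symptom")]
cleaned = raw.copy()
for col in symptom_cols:
    cleaned[col] = cleaned[col].map(standardize)

# Sorted vocabulary of every distinct symptom name.
vocabulary = sorted({s for col in symptom_cols for s in cleaned[col].dropna()})

# One row per record, one 0/1 column per symptom.
encoded = pd.DataFrame(0, index=cleaned.index, columns=vocabulary)
for col in symptom_cols:
    for row, symptom in cleaned[col].dropna().items():
        encoded.loc[row, symptom] = 1

final_df = pd.concat([cleaned[["Disease"]], encoded], axis=1)
print(final_df)

# To persist the result, something like this would work:
# final_df.to_csv("final_df.csv", index=False)
```

Encoding each symptom as its own 0/1 column gives every record a fixed-length vector, which is the kind of input a dense network expects.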
Execute the disease_clf_deepLearning.ipynb notebook to:
- Load the preprocessed dataset.
- Train a deep learning model on symptom-disease data.
- Save the trained model for later use (optional).
Modify the prediction block in disease_clf_deepLearning.ipynb to test new symptom inputs. The model outputs the most probable disease based on the given symptoms.
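A prediction block along these lines might look like the sketch below. The shortened symptom and disease lists, the encode_symptoms and top_prediction helpers, and the made-up scores are assumptions for illustration; the notebook's own variable names and label order may differ.

```python
import numpy as np

# Hypothetical, shortened vocabularies. The real model uses 131 symptom
# features and 41 disease classes, in the column order of final_df.csv.
symptoms = ["cough", "fever", "headache", "sneezing"]
diseases = ["Allergy", "Common Cold", "Flu"]


def encode_symptoms(observed, vocabulary):
    """Return a (1, n_features) 0/1 array; unknown names raise ValueError."""
    index = {name: i for i, name in enumerate(vocabulary)}
    unknown = [s for s in observed if s not in index]
    if unknown:
        raise ValueError(f"Unknown symptoms: {unknown}")
    vector = np.zeros((1, len(vocabulary)), dtype=np.float32)
    for s in observed:
        vector[0, index[s]] = 1.0
    return vector


def top_prediction(probabilities, labels):
    """Return the label and score of the highest-scoring class."""
    best = int(np.argmax(probabilities[0]))
    return labels[best], float(probabilities[0][best])


x = encode_symptoms(["fever", "cough"], symptoms)

# With a trained Keras model this would be: probabilities = model.predict(x)
# Made-up scores are used here so the example runs without a model.
probabilities = np.array([[0.05, 0.30, 0.90]])

label, score = top_prediction(probabilities, diseases)
print(label, round(score, 2))
```

Rejecting unknown symptom names early avoids silently predicting from an all-zero input when a name is misspelled.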
- creating an optimal dataset.ipynb: Preprocesses datasets to create final_df.csv.
- disease_clf_deepLearning.ipynb: Builds and trains a deep learning model, and makes predictions.
- Symptom-severity.csv: Provides information on the severity of each symptom.
- dataset.csv: The original dataset containing disease-symptom relationships.
- final_df.csv: The processed dataset used for training the classifier.
- symptom_Description.csv: Contains descriptions of symptoms.
- symptom_precaution.csv: Lists precautions for each disease.
The neural network model consists of:
- Input Layer: Accepts 131 symptom features.
- Hidden Layers:
  - Dense layer with 132 nodes and sigmoid activation.
  - Dense layer with 50 nodes and ReLU activation.
  - Dense layer with 17 nodes and ReLU activation.
- Output Layer: Dense layer with 41 nodes and sigmoid activation, one node per disease.
The model is compiled with:
- Optimizer: Adam
- Loss Function: Binary cross-entropy
- Evaluation Metric: Accuracy
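The architecture and compile settings listed above could be expressed in Keras roughly as follows. This is a sketch reconstructed from the description, not a copy of the notebook; the build_model function and its parameter names are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(n_symptoms=131, n_diseases=41):
    """Build and compile a dense network matching the layer sizes listed above."""
    model = keras.Sequential([
        keras.Input(shape=(n_symptoms,)),                 # 131 symptom features
        layers.Dense(132, activation="sigmoid"),          # hidden layer 1
        layers.Dense(50, activation="relu"),              # hidden layer 2
        layers.Dense(17, activation="relu"),              # hidden layer 3
        layers.Dense(n_diseases, activation="sigmoid"),   # one node per disease
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_model()
model.summary()
```

With a sigmoid output and binary cross-entropy, the training targets need to be 0/1 arrays of shape (n_samples, 41), for example one-hot encoded disease labels.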
- Improve model accuracy by experimenting with hyperparameters and architectures.
- Add support for multi-class disease predictions.
- Develop a user-friendly interface (e.g., web or mobile app) for interacting with the classifier.
Contributions are welcome! Please open an issue or submit a pull request for any bugs, improvements, or feature requests.