Added thyroid prediction ML code #1211

Open: wants to merge 1 commit into `master`
3 changes: 3 additions & 0 deletions thyroid prediction/.idea/.gitignore

4 changes: 4 additions & 0 deletions thyroid prediction/.idea/misc.xml

8 changes: 8 additions & 0 deletions thyroid prediction/.idea/modules.xml

8 changes: 8 additions & 0 deletions thyroid prediction/.idea/thyroid prediction.iml

2,081 changes: 2,081 additions & 0 deletions thyroid prediction/.ipynb_checkpoints/Thyroid-checkpoint.ipynb

70 changes: 70 additions & 0 deletions thyroid prediction/.ipynb_checkpoints/readme-checkpoint.md
@@ -0,0 +1,70 @@
# **Thyroid Disease Detection using Machine Learning**

## **GOAL**
The goal of this project is to develop a machine learning model that accurately predicts the recurrence of well-differentiated thyroid cancer. The project covers **data preprocessing, feature selection, and classification modeling** to identify the best approach for thyroid disease detection.

The dataset can be downloaded from [Kaggle](https://www.kaggle.com/sudhanshu8897/thyroid-disease-dataset).
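A minimal sketch of loading the downloaded CSV with pandas. The helper `load_thyroid_data`, the file name, the target column name, and the use of `?` as the missing-value marker are all hypothetical assumptions; check them against the actual Kaggle file before use.

```python
import pandas as pd


def load_thyroid_data(path, target_col, missing_markers=("?",)):
    """Load the thyroid CSV and return features X and target y.

    Hypothetical helper: `target_col` and `missing_markers` are assumptions
    and must be checked against the real file.
    """
    # Parse placeholder strings such as "?" as NaN (assumed encoding).
    df = pd.read_csv(path, na_values=list(missing_markers))
    # Exact duplicate rows add no information and can leak across a train/test split.
    df = df.drop_duplicates()
    # Rows without a label cannot be used for supervised training;
    # missing feature values are kept here and imputed later.
    df = df.dropna(subset=[target_col])
    y = df[target_col]
    X = df.drop(columns=[target_col])
    return X, y


# Usage (hypothetical file and column names):
# X, y = load_thyroid_data("thyroid.csv", target_col="target")
```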

---

## **MODELS USED**
- Logistic Regression
- Support Vector Machine (SVM)
- Decision Trees
- Random Forest
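The four models above map directly onto scikit-learn estimators. The sketch below is one way to set them up and get a quick cross-validated baseline before any feature selection. The helper names, the hyperparameters (`max_iter`, `n_estimators`), and the synthetic `make_classification` data are illustrative assumptions, not the project's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(random_state=42):
    """Return the four classifiers with illustrative (assumed) hyperparameters."""
    return {
        # Logistic regression and SVMs are sensitive to feature scale, so standardize first.
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        # Tree-based models split on per-feature thresholds and do not need scaling.
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=random_state),
    }


def baseline_scores(X, y, cv=5):
    """Mean cross-validated accuracy for each model, before any feature selection."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in build_models().items()
    }


if __name__ == "__main__":
    # Synthetic stand-in data; replace with the real thyroid features and labels.
    X, y = make_classification(n_samples=300, n_features=10, random_state=0)
    for name, acc in baseline_scores(X, y).items():
        print(f"{name}: {acc:.3f}")
```

Wrapping the scale-sensitive models in a pipeline keeps the scaler's statistics inside each cross-validation fold.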

---

## **LIBRARIES NEEDED**
- numpy
- pandas
- seaborn
- matplotlib
- scikit-learn

---

## **STEPS FOLLOWED**

1. Import libraries
2. Load the dataset
3. Visualize the data
4. Preprocess the data (handle missing values and outliers, scale features)
5. Split the data into training and testing sets
6. Apply feature selection and dimensionality reduction techniques:
   - Principal Component Analysis (PCA)
   - Mutual Information
   - Recursive Feature Elimination with Cross-Validation (RFECV)
7. Train models using different classification algorithms
8. Evaluate models using accuracy, precision, recall, F1-score, and AUC-ROC
9. Compare models based on performance and training time
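The preprocessing, splitting, feature selection, training, and evaluation steps above can be sketched as a single scikit-learn `Pipeline`, so that imputation, scaling, and feature selection are fitted on the training split only. This is a hedged illustration, not the notebook's code: the helper names `make_selector` and `evaluate`, keeping `k=8` features, median imputation, an 80/20 stratified split, and a binary target encoded as 0/1 are all assumptions. Outlier handling is omitted.

```python
import time
from functools import partial

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFECV, SelectKBest, mutual_info_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score, roc_auc_score,
)
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def make_selector(name, k=8, random_state=0):
    """Build one of the three feature selection / reduction steps (k is assumed)."""
    if name == "PCA":
        # Projects onto k principal components (feature extraction, not selection).
        return PCA(n_components=k, random_state=random_state)
    if name == "MutualInfo":
        # Keeps the k original features with the highest mutual information with y.
        mi = partial(mutual_info_classif, random_state=random_state)
        return SelectKBest(score_func=mi, k=k)
    if name == "RFECV":
        # Recursively drops the weakest feature, choosing how many to keep by cross-validation.
        return RFECV(LogisticRegression(max_iter=1000), cv=StratifiedKFold(5), scoring="accuracy")
    raise ValueError(f"unknown selector: {name}")


def evaluate(selector_name, classifier, X, y, random_state=42):
    """Split, fit the full pipeline on the training set, and score it on the test set."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    pipe = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill missing feature values
        ("scale", StandardScaler()),  # scaling statistics come from training data only
        ("select", make_selector(selector_name)),
        ("clf", classifier),
    ])
    start = time.perf_counter()
    pipe.fit(X_train, y_train)
    train_seconds = time.perf_counter() - start

    y_pred = pipe.predict(X_test)
    # AUC-ROC needs a continuous score: probabilities if available, else the decision function.
    if hasattr(pipe, "predict_proba"):
        y_score = pipe.predict_proba(X_test)[:, 1]
    else:
        y_score = pipe.decision_function(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
        "train_seconds": train_seconds,
    }


if __name__ == "__main__":
    # Synthetic stand-in data; replace with the real thyroid features and labels.
    X, y = make_classification(n_samples=400, n_features=12, n_informative=5, random_state=0)
    for name in ("PCA", "MutualInfo", "RFECV"):
        print(name, evaluate(name, LogisticRegression(max_iter=1000), X, y))
```

Fitting the scaler and selector inside the pipeline avoids leaking test-set statistics into training, which a scale-then-split order would do.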

---

## **CONCLUSION: Best Accuracy of Each Model**

By using Logistic Regression:
```
Accuracy: 95.41%
Feature Selection Technique: PCA
```

By using Random Forest:
```
Accuracy: 98.26%
Feature Selection Technique: MutualInfo
```

By using Support Vector Machine (SVM):
```
Accuracy: 95.11%
Feature Selection Technique: PCA
```

By using Decision Trees:
```
Accuracy: 93.91%
Feature Selection Technique: MutualInfo
```
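The results above report, for each model, the feature selection technique that gave the highest accuracy. A sketch of how such a per-model comparison could be produced: every model is paired with every technique and the mean cross-validated accuracy is compared. The function name `best_selector_per_model`, `k=8`, 5-fold cross-validation, and the synthetic data are assumptions; RFECV is left out only to keep the sketch short, and the synthetic run will not reproduce the figures above.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def best_selector_per_model(X, y, k=8, cv=5):
    """For each model, return (best technique, mean CV accuracy). Hypothetical helper."""
    models = {
        "Logistic Regression": lambda: LogisticRegression(max_iter=1000),
        "Random Forest": lambda: RandomForestClassifier(random_state=0),
        "SVM": lambda: SVC(),
        "Decision Trees": lambda: DecisionTreeClassifier(random_state=0),
    }
    selectors = {
        "PCA": lambda: PCA(n_components=k),
        "MutualInfo": lambda: SelectKBest(mutual_info_classif, k=k),
    }
    best = {}
    for model_name, make_model in models.items():
        for sel_name, make_sel in selectors.items():
            # Scaling and selection sit inside the pipeline, so each CV fold refits them.
            pipe = make_pipeline(StandardScaler(), make_sel(), make_model())
            acc = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy").mean()
            if model_name not in best or acc > best[model_name][1]:
                best[model_name] = (sel_name, acc)
    return best


if __name__ == "__main__":
    # Synthetic stand-in data; replace with the real thyroid features and labels.
    X, y = make_classification(n_samples=300, n_features=12, n_informative=5, random_state=0)
    for model_name, (sel_name, acc) in best_selector_per_model(X, y).items():
        print(f"{model_name}: {acc:.2%} (technique: {sel_name})")
```

Comparing cross-validated means rather than a single test split makes the ranking less sensitive to one favorable split.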

---