A machine learning project developed to predict the recurrence of well-differentiated thyroid cancer using patient clinical and pathological data. This project aims to assist healthcare professionals in making data-driven decisions through predictive analytics.
This healthcare analytics project leverages classification models to identify high-risk patients using 15 years of retrospective data. It includes:
- End-to-end pipeline covering EDA, preprocessing, modeling, and deployment
- Interactive Power BI dashboard
- Ready-to-use model prediction interface
Key features:

- Exploratory Data Analysis (EDA) on demographics, tumor characteristics, and risk factors
- Data preprocessing: missing value handling, encoding categorical features, and scaling numerical features
- Model building using:
  - Logistic Regression
  - Support Vector Classifier
  - Random Forest (Best Model: 98.7% Accuracy)
  - Gradient Boosting, Bagging, and SGD Classifier
- Hyperparameter tuning for Random Forest
- Model deployment: save/load with `joblib` and a single-prediction interface
- Power BI Dashboard for visual analytics
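The preprocessing, model-building, and tuning steps listed above could be chained roughly as follows. This is a minimal sketch rather than the project's notebook code: the synthetic data, the column names (`Age`, `Gender`, `Stage`, `Pathology`, `Focality`, `Recurred`), the mode-imputation strategy, and the parameter grid are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical synthetic stand-in for the clinical dataset (column names assumed).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(15, 82, n),
    "Gender": rng.choice(["F", "M"], n),
    "Stage": rng.choice(["I", "II", "III", "IV"], n),
    "Pathology": rng.choice(["Papillary", "Follicular", "Other"], n),
    "Focality": rng.choice(["Uni-Focal", "Multi-Focal"], n),
})
# Introduce a few missing values so the imputation step has something to do.
df.loc[rng.choice(n, size=10, replace=False), "Focality"] = np.nan
# Hypothetical target: recurrence mostly follows advanced stage, plus 10% label noise.
df["Recurred"] = np.where(
    df["Stage"].isin(["III", "IV"]) ^ (rng.random(n) < 0.1), "Yes", "No")

categorical = ["Gender", "Stage", "Pathology", "Focality"]
numeric = ["Age"]

preprocess = ColumnTransformer([
    # Fill missing categories with the most frequent value, then one-hot encode.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
    # Rescale Age to the [0, 1] range.
    ("num", MinMaxScaler(), numeric),
])

pipeline = Pipeline([
    ("prep", preprocess),
    ("rf", RandomForestClassifier(random_state=42)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df[categorical + numeric], df["Recurred"],
    test_size=0.2, stratify=df["Recurred"], random_state=42)

# Illustrative grid only; the project's actual search space is not documented here.
search = GridSearchCV(
    pipeline,
    param_grid={"rf__n_estimators": [100, 200], "rf__max_depth": [None, 5]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)
print(f"Held-out accuracy: {search.score(X_test, y_test):.3f}")
```

Fitting the imputer, encoder, and scaler inside the pipeline means each cross-validation fold learns them from its own training portion only, so no information from the validation fold leaks into preprocessing.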
Model accuracy comparison:

| Model | Accuracy |
|---|---|
| Logistic Regression | 97.4% |
| SGD Classifier | 94.8% |
| Support Vector Classifier (SVC) | 96.1% |
| Random Forest (Best) | 98.7% |
| Gradient Boosting | 97.4% |
| Bagging Classifier | 97.4% |
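The accuracy percentages can be translated into patient counts. This back-of-the-envelope sketch rests on one loud assumption: a held-out test set of 77 patients. The README does not state the test-set size; 77 is used only because it is a size at which every percentage in the table corresponds to a whole number of correctly classified patients.

```python
# ASSUMPTION: a held-out test set of 77 patients. This size is not stated in the
# README; it is simply a size consistent with every percentage in the table.
TEST_SIZE = 77

reported_accuracy = {
    "Logistic Regression": 97.4,
    "SGD Classifier": 94.8,
    "Support Vector Classifier (SVC)": 96.1,
    "Random Forest (Best)": 98.7,
    "Gradient Boosting": 97.4,
    "Bagging Classifier": 97.4,
}

misclassified = {}
for name, pct in reported_accuracy.items():
    # Accuracy = correct / total, so correct = accuracy * total (rounded to a whole patient).
    correct = round(pct / 100 * TEST_SIZE)
    # Sanity check: the count must reproduce the reported percentage to one decimal.
    assert abs(round(100 * correct / TEST_SIZE, 1) - pct) < 1e-9
    misclassified[name] = TEST_SIZE - correct
    print(f"{name:32s} {correct}/{TEST_SIZE} correct, {misclassified[name]} misclassified")
```

Under that assumption, Random Forest misclassifies one test patient while Logistic Regression, Gradient Boosting, and Bagging each misclassify two, so the gap between the best model and the runners-up would amount to a single case.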
- Best Model: Random Forest
- Accuracy: 98.7%
- Precision, Recall, and F1-score: all high
- Top Predictive Features: Stage, Pathology Type, Age, Focality, Radiotherapy History
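Precision, recall, F1-score, and a feature ranking like the one above can be computed with scikit-learn. The sketch below uses hypothetical synthetic data and column names. One detail worth noting: after one-hot encoding, a Random Forest reports one importance per encoded column, so ranking an original feature such as Stage requires combining its encoded columns. Summing them, as done here, is one common convention and an assumption, not necessarily the project's method.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical synthetic data; recurrence is driven mostly by Stage here.
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "Age": rng.integers(15, 82, n),
    "Stage": rng.choice(["I", "II", "III", "IV"], n),
    "Pathology": rng.choice(["Papillary", "Follicular", "Other"], n),
    "Focality": rng.choice(["Uni-Focal", "Multi-Focal"], n),
})
y = np.where(df["Stage"].isin(["III", "IV"]) ^ (rng.random(n) < 0.05), "Yes", "No")

categorical = ["Stage", "Pathology", "Focality"]
X_train_df, X_test_df, y_train, y_test = train_test_split(
    df, y, test_size=0.2, stratify=y, random_state=0)

encoder = OneHotEncoder(handle_unknown="ignore").fit(X_train_df[categorical])
scaler = MinMaxScaler().fit(X_train_df[["Age"]])

def transform(frame):
    # One-hot columns first (in `categorical` order), then the scaled Age column.
    return np.hstack([encoder.transform(frame[categorical]).toarray(),
                      scaler.transform(frame[["Age"]])])

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(transform(X_train_df), y_train)

# Per-class precision, recall, and F1-score on the held-out split.
y_pred = rf.predict(transform(X_test_df))
print(classification_report(y_test, y_pred))

# Sum the importances of each feature's one-hot columns back into one score.
importances = {}
start = 0
for feature, cats in zip(categorical, encoder.categories_):
    importances[feature] = rf.feature_importances_[start:start + len(cats)].sum()
    start += len(cats)
importances["Age"] = rf.feature_importances_[start]

for feature, score in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{feature:10s} {score:.3f}")
```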
This project includes a Power BI dashboard that provides interactive visualizations of the dataset. It helps users easily explore patterns related to thyroid cancer recurrence.
📌 Interactive dashboard includes:
- Demographics & Patient History
- Clinical Examination & Diagnosis
- Cancer Staging & Treatment Response
- Recurrence heatmaps and slicers by age, gender, pathology, etc.
This predictive model could be integrated into Clinical Decision Support Systems (CDSS) to support healthcare delivery in the following ways:
- 🎯 Identifying high-risk patients for follow-up
- 🧩 Personalizing treatment strategies
- 🏥 Optimizing hospital resources
This module allows clinicians or users to input a single patient's clinical profile and receive a prediction on whether thyroid cancer is likely to recur.
📥 Example Input Format:
⚙️ How It Works
- Input is received in dictionary format.
- Categorical values are encoded using `OneHotEncoder`.
- Numerical values (Age) are scaled using `MinMaxScaler`.
- Features are combined into a prediction-ready format.
- The Random Forest model predicts:
  - ✅ Prediction (`Yes`/`No` for recurrence)
  - 📊 Probability score (confidence level)
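The steps above can be sketched as a single function. This is a minimal, hypothetical version: the name `predict_recurrence`, the feature columns, and the tiny synthetic training set are placeholders, and a real deployment would load the encoder, scaler, and model fitted in the notebook instead of fitting them inline.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

CATEGORICAL = ["Gender", "Stage", "Focality"]  # hypothetical feature names
NUMERIC = ["Age"]

# Hypothetical training data so the sketch runs on its own.
rng = np.random.default_rng(2)
n = 200
train = pd.DataFrame({
    "Age": rng.integers(15, 82, n),
    "Gender": rng.choice(["F", "M"], n),
    "Stage": rng.choice(["I", "II", "III", "IV"], n),
    "Focality": rng.choice(["Uni-Focal", "Multi-Focal"], n),
})
labels = np.where(train["Stage"].isin(["III", "IV"]), "Yes", "No")

encoder = OneHotEncoder(handle_unknown="ignore").fit(train[CATEGORICAL])
scaler = MinMaxScaler().fit(train[NUMERIC])

def to_features(frame):
    # Encode categoricals, scale Age, and concatenate into one feature matrix.
    return np.hstack([encoder.transform(frame[CATEGORICAL]).toarray(),
                      scaler.transform(frame[NUMERIC])])

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(to_features(train), labels)

def predict_recurrence(patient: dict) -> tuple[str, float]:
    """Return ("Yes"/"No", probability of that prediction) for one patient."""
    row = pd.DataFrame([patient])        # dictionary -> one-row DataFrame
    X = to_features(row)                 # encode, scale, combine
    proba = model.predict_proba(X)[0]    # ordered as model.classes_
    best = int(np.argmax(proba))
    return str(model.classes_[best]), float(proba[best])

label, prob = predict_recurrence(
    {"Age": 34, "Gender": "F", "Stage": "I", "Focality": "Uni-Focal"})
print(f"Recurrence predicted: {label} (probability {prob:.1%})")
```

The probability returned here is the forest's `predict_proba` score for the predicted class, i.e. the per-tree class probabilities averaged across trees. It is a model score, not a calibrated or statistical confidence level.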
🧠 Prediction Function
📌 Output
This means the model predicts that the patient is unlikely to experience recurrence, with a predicted probability of 97.9% for that outcome.
- Python (Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, Plotly)
- Power BI for visualization
- Jupyter Notebook
- `joblib` for model serialization
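Saving and reloading with `joblib`, as in the deployment step, might look like the sketch below. The file name and the choice to bundle the fitted scaler with the model in one dictionary are assumptions, not the project's documented layout. Note that joblib files, like pickles, should only be loaded from trusted sources, and a model saved under one scikit-learn version is not guaranteed to load correctly under another.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler

# Hypothetical numeric data; a real bundle would also include the fitted OneHotEncoder.
rng = np.random.default_rng(3)
X = rng.random((100, 4))
y = (X[:, 0] > 0.5).astype(int)

scaler = MinMaxScaler().fit(X)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(scaler.transform(X), y)

# Persist preprocessing and model together so that load-time predictions
# apply exactly the same transformations as training did.
path = os.path.join(tempfile.mkdtemp(), "thyroid_rf.joblib")  # hypothetical filename
joblib.dump({"scaler": scaler, "model": model}, path)

bundle = joblib.load(path)
restored = bundle["model"].predict(bundle["scaler"].transform(X))
print("Predictions match:", bool((restored == model.predict(scaler.transform(X))).all()))
```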
A detailed project report is included in `Thyroid_Disease_Prediction_Report.pdf`, covering methodology, model performance, results, and recommendations.