This project focuses on predicting the histologic stage of liver cirrhosis based on various medical and clinical indicators of a patient. Using a dataset from a Mayo Clinic study on primary biliary cirrhosis (PBC) conducted between 1974 and 1984, machine learning models were trained to classify patients into one of three stages of liver damage:
- Stage 1 → Mild
- Stage 2 → Moderate
- Stage 3 → Severe
- Dataset Source:
  - Mayo Clinic study on Primary Biliary Cirrhosis (PBC).
  - Period: 1974–1984.
- Target Variable:
  - Stage of cirrhosis (1, 2, or 3).
- Features Used: The dataset includes medical and biochemical attributes such as:
  - `N_Days`: Duration from registration to death, transplant, or study end
  - `Status`: Patient status: C (censored), CL (censored due to liver transplant), D (death)
  - `Drug`: Drug administered (D-penicillamine or placebo)
  - `Age`: Age in days
  - `Sex`: M or F
  - `Ascites`, `Hepatomegaly`, `Spiders`: Presence of medical symptoms (Y/N)
  - `Edema`: Level of edema severity (N/S/Y)
  - Biochemical values: `Bilirubin` (mg/dl), `Cholesterol` (mg/dl), `Albumin` (gm/dl), `Copper` (ug/day), `Alk_Phos` (U/l), `SGOT` (U/ml), `Triglycerides` (mg/dl), `Platelets` (per 1000 ml), `Prothrombin` time (sec)
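To make the feature layout concrete, here is a minimal sketch of one patient record as a Python dataclass. The class name `PatientRecord`, the helper `age_years`, the validation rules, and all example values are illustrative assumptions and are not taken from the project code; only the field names follow the columns listed above.

```python
from dataclasses import dataclass

DAYS_PER_YEAR = 365.25  # assumption: average year length for the conversion


@dataclass
class PatientRecord:
    """Hypothetical container for one dataset row (target Stage excluded)."""
    N_Days: int
    Status: str          # "C", "CL", or "D"
    Drug: str            # "D-penicillamine" or "Placebo"
    Age: int             # age in days, as stored in the dataset
    Sex: str             # "M" or "F"
    Ascites: str         # "Y" or "N"
    Hepatomegaly: str    # "Y" or "N"
    Spiders: str         # "Y" or "N"
    Edema: str           # "N", "S", or "Y"
    Bilirubin: float
    Cholesterol: float
    Albumin: float
    Copper: float
    Alk_Phos: float
    SGOT: float
    Triglycerides: float
    Platelets: float
    Prothrombin: float

    def __post_init__(self) -> None:
        # Reject category codes outside the sets described in the feature list.
        if self.Status not in {"C", "CL", "D"}:
            raise ValueError(f"unexpected Status: {self.Status!r}")
        if self.Edema not in {"N", "S", "Y"}:
            raise ValueError(f"unexpected Edema: {self.Edema!r}")

    def age_years(self) -> float:
        """Convert the stored age in days to years."""
        return self.Age / DAYS_PER_YEAR


# Made-up values for illustration only; not a real patient.
example = PatientRecord(
    N_Days=2000, Status="C", Drug="Placebo", Age=14610, Sex="F",
    Ascites="N", Hepatomegaly="Y", Spiders="N", Edema="N",
    Bilirubin=1.2, Cholesterol=250.0, Albumin=3.6, Copper=60.0,
    Alk_Phos=1200.0, SGOT=90.0, Triglycerides=110.0,
    Platelets=250.0, Prothrombin=10.5,
)
print(round(example.age_years(), 1))  # 14610 / 365.25 = 40.0
```

Because `Age` is stored in days, converting it to years as above can make model inputs easier to sanity-check by eye.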
- Converted categorical features (e.g., `Sex`, `Drug`, `Edema`) to numerical format.
- Handled missing data using appropriate imputation strategies.
- Normalized numerical features for improved model performance.
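The three preprocessing steps above can be sketched with a scikit-learn `ColumnTransformer`. The README does not say which encoder, imputation strategy, or scaler was used, so the choices here (ordinal encoding, most-frequent and median imputation, `StandardScaler`) are assumptions, and the toy data frame holds made-up values for a subset of the columns.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Assumed column split; extend both lists to cover the full feature set.
categorical_cols = ["Sex", "Drug", "Edema"]
numeric_cols = ["Age", "Bilirubin", "Albumin"]

preprocess = ColumnTransformer([
    # Categorical: fill gaps with the most frequent value, then map to integer codes.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OrdinalEncoder()),
    ]), categorical_cols),
    # Numeric: fill gaps with the median, then scale to zero mean and unit variance.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
])

# Made-up rows with some missing entries, for illustration only.
toy = pd.DataFrame({
    "Sex": ["F", "M", "F", np.nan],
    "Drug": ["Placebo", "D-penicillamine", np.nan, "Placebo"],
    "Edema": ["N", "S", "Y", "N"],
    "Age": [18000, 21000, np.nan, 15000],
    "Bilirubin": [1.1, 3.4, 0.8, np.nan],
    "Albumin": [3.5, np.nan, 4.0, 2.9],
})

X = preprocess.fit_transform(toy)
print(X.shape)  # (4, 6): three encoded columns followed by three scaled columns
```

Wrapping the steps in one transformer means the same fitted imputation values and scaling parameters are reused at prediction time, which avoids leaking test-set statistics into training.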
A range of classification models was evaluated individually, and a `StackingClassifier` was then tested with different meta (final) estimators:
- Models Tested:
  - Logistic Regression → 59.64% accuracy
  - SGD Classifier → 58.38% accuracy
  - LDA → 59.40% accuracy
  - Random Forest → 94.76% accuracy
  - XGBoost → 95.54% accuracy
  - SVM → 84.16% accuracy
  - KNN → 88.42% accuracy
  - Gaussian → 50.54% accuracy
- Base Learners:
  - `RandomForestClassifier`
  - `XGBClassifier`
- Final Estimators Tested:
  - `SVC` → 95.90% accuracy
  - `LogisticRegression` → 95.74% accuracy
  - `SGDClassifier` → 95.64% accuracy
  - `LinearDiscriminantAnalysis` → 95.80% accuracy
  - `RandomForestClassifier` → 95.86% accuracy
  - `XGBClassifier` → 96.14% accuracy
- Final Estimator: `XGBClassifier` was chosen as the final meta-model because it achieved the highest accuracy (96.14%) among the final estimators tested.
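The stacking setup described above can be sketched as follows. This is an illustration under stated assumptions, not the project's training script: the data is synthetic (`make_classification`), `GradientBoostingClassifier` stands in for `XGBClassifier` so the example depends only on scikit-learn, only two of the listed final estimators are compared, and the hyperparameters and `cv=3` are arbitrary choices. The accuracies it prints will not match the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic 3-class stand-in for the cirrhosis data (18 features, like the list above).
X, y = make_classification(
    n_samples=600, n_features=18, n_informative=8, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Base learners: a random forest plus a gradient-boosting model
# (swapped in for XGBClassifier; use xgboost.XGBClassifier if it is installed).
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]

# Candidate meta (final) estimators to compare.
final_estimators = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, final in final_estimators.items():
    # The final estimator is trained on cross-validated predictions of the base learners.
    stack = StackingClassifier(estimators=base_learners, final_estimator=final, cv=3)
    stack.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, stack.predict(X_test))

best = max(scores, key=scores.get)
print(scores)
print("Best final estimator on this synthetic split:", best)
```

Comparing final estimators on a single held-out split, as here, is the simplest option; repeated cross-validation would give a more stable ranking when the accuracies are as close as those listed above.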
- Test 1:
  - Input: Synthetic random patient data
  - Output:
    - Predicted Cirrhosis Stage: `1` → Stage 1 (Mild)
  - ✔️ Shows model is capable of detecting low-risk cases.
- Test 2:
  - Input: Real sample from original dataset
  - Output:
    - Predicted Cirrhosis Stage: `3` → Stage 3 (Severe)
  - ✔️ Confirms model can detect advanced liver damage from clinical data.
- Input patient features (age, lab results, symptoms, etc.) in the required format.
- The model will output the predicted stage of cirrhosis with a corresponding severity label.
- Can assist medical professionals in early detection and prioritization of treatment.
- Python 3.12.5