Diabetic Retinopathy (DR) is a leading cause of blindness. Early and accurate diagnosis is crucial for intervention.
This project presents five progressively advanced deep learning models for classifying DR severity from retinal fundus images:
- EfficientNet-based CNNs
- Vision Transformer (ViT) architectures
- Multi-task learning (classification + regression + ordinal)
- Two-stage cascaded classification pipelines
- Focal Loss to handle class imbalance
Run or explore the notebook:
Google Colab Notebook
- Source: Kaggle (Diabetic Retinopathy Classification #3)
- Training Images: 2,197
- Test Images: 1,465
- Classes:
  - 0: No DR
  - 1: Mild DR
  - 2: Moderate DR
  - 3: Severe DR
  - 4: Proliferative DR
- The class distribution was strongly imbalanced, motivating mitigation strategies such as Focal Loss and the two-stage pipeline.
- Visual inspection and class-wise EDA guided the preprocessing choices.
Technique | Purpose |
---|---|
CLAHE | Enhance micro-lesions and vessels |
Green Channel Extraction | Highlight vascular contrast |
Gaussian Blur (Ben Graham method) | Subtract the blurred image to sharpen lesion and edge features |
Resizing & Normalization | Prepare inputs for model compatibility |
Data Augmentation | Flip, rotate, affine for generalization |
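The preprocessing steps above can be sketched in plain numpy. This is a simplified illustration, not the project's actual code: a real pipeline would use OpenCV's CLAHE and `cv2.GaussianBlur` (here a box blur stands in for the Gaussian, and CLAHE is omitted), and all function names are illustrative.

```python
import numpy as np

def green_channel(img: np.ndarray) -> np.ndarray:
    """Extract the green channel (index 1) of an RGB fundus image,
    where vascular contrast is strongest."""
    return img[..., 1]

def box_blur(img: np.ndarray, k: int = 5) -> np.ndarray:
    """Simple box blur as a stand-in for the Gaussian blur used in
    Ben Graham's preprocessing."""
    pad = k // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def ben_graham(img: np.ndarray, alpha: float = 4.0) -> np.ndarray:
    """Blend the image with its blurred version to emphasise local
    structure: out = alpha*img - alpha*blur + 128, clipped to [0, 255]."""
    blurred = box_blur(img)
    out = alpha * img.astype(np.float64) - alpha * blurred + 128.0
    return np.clip(out, 0, 255).astype(np.uint8)

def normalize(img: np.ndarray) -> np.ndarray:
    """Scale pixel values to [0, 1] for model input."""
    return img.astype(np.float32) / 255.0
```

Note that on a perfectly flat region the Ben Graham transform maps every pixel to the neutral value 128, which is why it suppresses illumination differences while keeping edges.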
Model 1
- Simple 5-class classifier
- Green channel preprocessing
- CrossEntropy loss
Model 2
- Multi-task learning with classification, regression, and ordinal heads
- Combiner model for final prediction
- Weighted CrossEntropy, MSE, and BCEWithLogits
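The three heads train against differently shaped targets. A minimal numpy sketch of the target encoding (function names are illustrative, not from the project's code; the ordinal head uses cumulative binary targets, which suit BCEWithLogits):

```python
import numpy as np

NUM_CLASSES = 5  # DR grades 0..4

def ordinal_targets(grade: int, num_classes: int = NUM_CLASSES) -> np.ndarray:
    """Cumulative binary encoding for the ordinal head:
    grade 3 -> [1, 1, 1, 0] (one binary per threshold 'grade > k')."""
    return (np.arange(num_classes - 1) < grade).astype(np.float32)

def multitask_targets(grade: int):
    """Targets for the three heads:
    - classification head: integer class (CrossEntropy)
    - regression head: grade as a float (MSE)
    - ordinal head: cumulative binaries (BCEWithLogits)."""
    return grade, float(grade), ordinal_targets(grade)

def decode_ordinal(probs: np.ndarray, threshold: float = 0.5) -> int:
    """Predicted grade = number of cumulative probabilities above threshold."""
    return int((probs > threshold).sum())
```

The cumulative encoding is what makes the head ordinal: mispredicting grade 4 as 3 flips one binary, while mispredicting it as 0 flips four, so distant errors are penalised more.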
Model 3
- Same architecture as Model 2, but focused only on classification
- Improved rare class prediction
Model 4
- Stage 1: Binary ViT model (No DR vs DR)
- Stage 2: 4-class ViT model for DR severity
- Ben Graham method + CLAHE
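At inference time the cascade's decision logic reduces to a small function. A hedged sketch, not the project's actual inference code (names and the 0.5 threshold are assumptions):

```python
import numpy as np

def cascade_predict(stage1_prob_dr: float, stage2_probs: np.ndarray,
                    threshold: float = 0.5) -> int:
    """Combine the two stages into a final 5-class grade.

    stage1_prob_dr: P(any DR) from the binary ViT.
    stage2_probs:   4-class severity probabilities (Mild..Proliferative).
    """
    if stage1_prob_dr < threshold:
        return 0  # No DR: stage 2 is never consulted
    # Shift the 4-class argmax (0..3) into grades 1..4
    return int(np.argmax(stage2_probs)) + 1
```

Decoupling the stages means the severity model only ever sees diseased images, so the dominant "No DR" class no longer drowns out the rare severe grades.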
Model 5
- Final & best-performing model
- Multi-task heads + Focal Loss in Stage 2
- Trimmed mean for regression output smoothing
Metric / Technique | Description |
---|---|
Accuracy | Used only for Model 1 |
Cohen’s Kappa | Primary metric for Models 2–5 (robust to class imbalance) |
Soft-Voting | Ensemble of K-Fold predictions |
Trimmed Mean | Smooths regression-head outputs (Models 2 and 5) |
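Soft-voting and the trimmed mean can be sketched as follows (illustrative numpy; it assumes per-fold class probabilities and fold-level regression outputs are already collected, and the 20% trim fraction is an example value, not the project's setting):

```python
import numpy as np

def soft_vote(fold_probs: np.ndarray) -> np.ndarray:
    """fold_probs: array of shape (n_folds, n_samples, n_classes).
    Average probabilities over folds, then take the argmax per sample."""
    return np.argmax(np.mean(fold_probs, axis=0), axis=-1)

def trimmed_mean(values, trim_frac: float = 0.2) -> float:
    """Drop the lowest and highest trim_frac of fold-level regression
    outputs before averaging, damping outlier folds."""
    v = np.sort(np.asarray(values, dtype=np.float64))
    k = int(len(v) * trim_frac)
    core = v[k:len(v) - k] if k > 0 else v
    return float(core.mean())
```

Averaging probabilities (soft voting) rather than hard labels lets a confident minority fold outvote several uncertain ones, which matters when some folds see few examples of the rare grades.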
Model | Kaggle Accuracy | Best Validation Kappa |
---|---|---|
1 | 0.78020 | Kappa not used (validation accuracy ≈ 0.99) |
2 | 0.83412 | 0.8948 – 0.9268 |
3 | 0.84163 | 0.8882 – 0.9167 |
4 | 0.85870 | Stage 1: 0.9499 – 0.9773 |
5 | 0.86075 | Stage 1: 0.9590 – 0.9818, Stage 2: 0.5841 – 0.7438 |
- Focal Loss effectively addresses class imbalance
- ViT models outperform CNNs at capturing global retinal features
- Two-stage pipelines allow DR detection and grading to be decoupled
- Multi-task learning enriches feature representations