This repository contains the complete pipeline for generating synthetic dermatology images, training classifiers on them, and evaluating the results using deep learning and statistical methods. Our project, DermSynth3D, addresses the scarcity of dermatology data by balancing classes through synthetic image generation.
Dermatology Synthetic 3D: a pipeline that synthesizes realistic dermatology images for weak (under-represented) classes and evaluates their utility through classification and quality metrics. The project combines synthetic image generation, classical ML training, PCA visualization, and FID-based evaluation.
- Motivation
- Important Links
- Demo Video
- Directory Structure
- Installation
- Reproducing the Project
- Outputs Brief Summary
- Future Enhancements
- Status: Complete
- Cite
- Important References
Medical image datasets often suffer from:
- Limited data for rare skin conditions
- Severe class imbalance in disease types
- Privacy constraints making real data hard to collect
This project uses synthetic generation to:
- Expand weak class samples
- Maintain realistic features
- Improve classifier generalization
- Enable experimentation without data privacy concerns
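As a rough illustration of the class-balancing idea above, the sketch below counts images per class and reports how many synthetic samples each under-represented class would need to match the largest class. The function name `synthetic_quota` and the toy label counts are hypothetical; the repository's `generate_synthetic.py` may choose weak classes and targets differently.

```python
from collections import Counter


def synthetic_quota(labels):
    """Return {class: extra samples needed to match the largest class}."""
    counts = Counter(labels)
    if not counts:
        return {}
    target = max(counts.values())
    # Only classes below the target need synthetic samples.
    return {cls: target - n for cls, n in counts.items() if n < target}


# Toy labels (hypothetical counts, not taken from the dataset).
labels = ["Gcl"] * 5 + ["Gdl"] * 5 + ["Fml"] * 2 + ["NHL"] * 1
print(synthetic_quota(labels))
```

Matching the largest class is only one possible target; a fixed per-class count would work the same way by replacing `target`.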
Resource | Link |
---|---|
Timesheet | Timesheet |
Slack Channel | Slack |
Project Report | Overleaf Report |
A walkthrough demo video of the full DermSynth3D pipeline including synthetic generation, training, and evaluation (shared separately with submission).
2025_1_PROJECT_20/
├── augmented_dataset/ # Holds post-processed augmented images
├── augmented_images/ # Holds image augmentations applied
├── data/blending/data/DermSynth3D/ # Blending-related data (unused in current pipeline)
├── datasets/ # (Optional) legacy data holder
├── dermDatabaseOfficial/ # Original derm database
│   └── release_v0/images/ # Raw derm image folders by class
├── dermsynth3dimages/ # Older synthesis experiments
├── synthetic_dataset/ # Output of generate_synthetic.py (new synthetic imgs)
├── outputs/ # Results, logs, eval images, PCA plots
├── src/
│   ├── amazing/ # Placeholder module (template/example)
│   ├── train_feature/
│   │   ├── generate_synthetic.py # Generates synthetic images for weak classes
│   │   ├── train_features.py # Trains classifier on extracted features
│   │   ├── train_combined_features.py # Trains classifier on real+synthetic features
│   │   └── pca_visualization.py # Visualizes PCA clusters
│   ├── augment.py # Legacy script for image augmentation
│   ├── check_images.py # Utility to preview loaded images
│   ├── evaluate_quality.py # Orchestrator to compute FID/SSIM
│   ├── evaluate.py # Implements FID + SSIM logic
│   ├── feature_extraction.py # Extracts ResNet18 features, saves CSV
│   ├── process.py # Loads and previews raw images
│   ├── visual_feature.py # (Optional) legacy visualizer
│   ├── run.py # Runs entire pipeline from generation to evaluation
│   ├── synthetic_features.csv # Feature CSV from synthetic images
│   └── extracted_features_with_labels.csv # Feature CSV from real images
├── requirements.yml # Conda env definition
├── LICENSE
└── README.md # Current markdown project summary
git clone https://github.com/sfu-cmpt340/2025_1_project_20.git
cd 2025_1_project_20
# create a python virtual environment
python -m venv .venv
# activate the virtual environment
source .venv/bin/activate
All the required packages are listed in requirements.txt. You can install them using pip:
# install the requirements
pip install -r requirements.txt
Ensure Python ≥ 3.10 and PyTorch ≥ 2.0.
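The version requirements can be checked before running anything. The following is a minimal sketch, not part of the repository: the helper names `major_minor` and `check_environment` are hypothetical, and the installed PyTorch version is read through `importlib.metadata` so that `torch` itself does not need to be imported.

```python
import re
import sys
from importlib import metadata


def major_minor(version):
    """Parse the leading 'X.Y' of a version string such as '2.1.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment():
    """Return a list of human-readable problems (empty if requirements are met)."""
    problems = []
    if sys.version_info[:2] < (3, 10):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python {found} found, 3.10 or newer required")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if major_minor(torch_version) < (2, 0):
            problems.append(f"PyTorch {torch_version} found, 2.0 or newer required")
    return problems


for problem in check_environment():
    print(problem)
```

If nothing is printed, both requirements are satisfied in the active environment.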
cd src/
# [1] Generate synthetic images
python train_feature/generate_synthetic.py
# [2] Extract features
python feature_extraction.py
# [3] Train classifier
python train_feature/train_combined_features.py
# [4] Visualize PCA
python train_feature/pca_visualization.py
# [5] Evaluate image quality
python evaluate_quality.py
Or run everything at once:
python run.py
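How `run.py` chains the steps is not shown here. One plausible structure, sketched below under that assumption, is to call each script in order with the current interpreter and stop at the first failure. The names `PIPELINE_STEPS` and `run_pipeline` are hypothetical; the script paths are the ones listed above, relative to `src/`.

```python
import subprocess
import sys

# Script paths relative to src/, in the order of steps [1] to [5].
PIPELINE_STEPS = [
    "train_feature/generate_synthetic.py",
    "feature_extraction.py",
    "train_feature/train_combined_features.py",
    "train_feature/pca_visualization.py",
    "evaluate_quality.py",
]


def run_pipeline(steps=PIPELINE_STEPS, runner=subprocess.run):
    """Run each step in order; check=True raises on a non-zero exit code."""
    for script in steps:
        print(f"Running {script}")
        runner([sys.executable, script], check=True)
```

Calling `run_pipeline()` from inside `src/` would execute all five steps. The `runner` parameter exists only so the sequence can be exercised without launching the real scripts.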
Experiment | Accuracy | FID Score |
---|---|---|
Real Only | ~76% | - |
Synthetic Only | ~90% | 131.19 |
Combined | 94.12% | - |
Observation: Synthetic images improved weak-class performance, and RandomForestClassifier performed well on the extracted feature vectors.
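The "Combined" setting can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the code in `train_combined_features.py`: Gaussian blobs stand in for the ResNet-18 feature vectors, the helper `toy_features` is hypothetical, and holding out real samples only for testing is a design choice made here so the score reflects performance on real data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)


def toy_features(n_per_class, n_classes=3, dim=16, shift=3.0):
    """Gaussian blobs standing in for extracted feature vectors (hypothetical data)."""
    X = np.vstack(
        [rng.normal(loc=c * shift, size=(n_per_class, dim)) for c in range(n_classes)]
    )
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y


X_real, y_real = toy_features(40)
X_syn, y_syn = toy_features(60)

# Hold out real samples only; synthetic samples are used for training alone.
X_train, X_test, y_train, y_test = train_test_split(
    X_real, y_real, test_size=0.25, stratify=y_real, random_state=0
)
X_combined = np.vstack([X_train, X_syn])
y_combined = np.concatenate([y_train, y_syn])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_combined, y_combined)
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"accuracy on held-out real features: {accuracy:.2f}")
```

With the real CSV files, the arrays would instead be loaded from `extracted_features_with_labels.csv` and `synthetic_features.csv`; their column layout is not documented here, so it is left out of the sketch.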
Class | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
Fml | 1.00 | 0.94 | 0.97 | 31 |
Gbl | 0.97 | 1.00 | 0.98 | 28 |
Gcl | 0.95 | 0.97 | 0.96 | 38 |
Gdl | 0.90 | 0.92 | 0.91 | 38 |
NHL | 0.97 | 0.94 | 0.96 | 35 |
Accuracy | | | 0.95 | 170 |
Macro Avg | 0.96 | 0.95 | 0.96 | 170 |
Weighted Avg | 0.95 | 0.95 | 0.95 | 170 |
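The two average rows follow directly from the per-class rows: the macro average is the plain mean over classes, and the weighted average weights each class by its support. The sketch below recomputes both from the rounded values in the table, so results can differ from the reported averages by about 0.01 because of rounding. The helper name `averages` is hypothetical.

```python
# Per-class rows from the table above: (precision, recall, f1, support).
report = {
    "Fml": (1.00, 0.94, 0.97, 31),
    "Gbl": (0.97, 1.00, 0.98, 28),
    "Gcl": (0.95, 0.97, 0.96, 38),
    "Gdl": (0.90, 0.92, 0.91, 38),
    "NHL": (0.97, 0.94, 0.96, 35),
}


def averages(rows):
    """Return (macro, weighted, total_support) for precision, recall, and F1."""
    total = sum(r[3] for r in rows)
    macro = tuple(sum(r[i] for r in rows) / len(rows) for i in range(3))
    weighted = tuple(sum(r[i] * r[3] for r in rows) / total for i in range(3))
    return macro, weighted, total


macro, weighted, total = averages(list(report.values()))
print("macro   ", [round(v, 3) for v in macro])
print("weighted", [round(v, 3) for v in weighted])
print("support ", total)
```

The support-weighted recall is the same quantity as overall accuracy (correct predictions divided by total samples), which is why it lines up with the Accuracy row.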
Observation: Every class reaches an F1-score of at least 0.91, with Gdl the weakest (0.91) and Gbl the strongest (0.98); accuracy was especially high for synthetic images.
Observation: The confusion matrix highlights high classification accuracy across all classes, with minimal confusion between similar categories, confirming that synthetic data effectively improved model performance and addressed class imbalance.
Observation: The PCA plot reveals clear clustering for classes like "Fml", while overlap between classes such as "Gbl" and "Gdl" suggests shared visual features or embeddings in the feature space.
Observation: The PCA plot shows strong overlap between real and synthetic data distributions, confirming that the synthetic images successfully mimic real feature characteristics, validating the augmentation strategy.
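A real-versus-synthetic comparison of this kind can be sketched with plain NumPy: fit one PCA on the stacked features so both sets share the same axes, then split the projected points again. The random arrays below are stand-ins for the feature vectors, and `pca_project` is a hypothetical helper, not the code in `pca_visualization.py`.

```python
import numpy as np


def pca_project(features, n_components=2):
    """Project rows of `features` onto their top principal components via SVD."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return centered @ vt[:n_components].T, explained[:n_components]


rng = np.random.default_rng(0)
real = rng.normal(size=(100, 32))             # stand-in for real-image features
synthetic = rng.normal(size=(100, 32)) + 0.1  # stand-in for synthetic features

# Fit a single projection on both sets so they are plotted on shared axes.
points, explained = pca_project(np.vstack([real, synthetic]))
real_2d, synthetic_2d = points[:100], points[100:]
print(points.shape, explained)
```

Scatter-plotting `real_2d` and `synthetic_2d` in two colors would then show how much the two distributions overlap in the leading components.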
FID score: 113.19
Observation: An FID score of ≈120–130 indicated high-quality synthesis.
SSIM was skipped due to dimensional mismatch errors. FID was the main quality metric.
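For reference, the Fréchet distance behind FID compares the mean and covariance of two feature sets: `||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^(1/2))`. The NumPy sketch below implements that formula on arbitrary feature arrays; it is not the code in `evaluate.py`. It assumes features have already been extracted (standard FID uses Inception-v3 features, which is not shown), and the function name `frechet_distance` is hypothetical.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two (n_samples, dim) arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) is the sum of square roots of the eigenvalues of
    # cov_a @ cov_b; clip tiny negative values caused by floating-point error.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


rng = np.random.default_rng(0)
x = rng.normal(size=(500, 8))
same = frechet_distance(x, x)           # identical sets: distance near zero
shifted = frechet_distance(x, x + 2.0)  # pure mean shift: 8 dims * 2.0**2
print(same, shifted)
```

A pure mean shift leaves the covariances equal, so only the squared-mean term remains; lower values mean the two feature distributions are closer.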
- Try GANs or diffusion models for more realistic generation
- Use smart or conditional augmentations
- Extend to other medical imaging modalities
- Build a web-based tool for interactive class augmentation
- Fully working synthesis + training pipeline
- High accuracy with synthetic data
- Evaluated with PCA + FID
- One-click automation via run.py
If using our work, please cite:
@misc{DermSynth3D,
author = {Aryaman Bahuguna and Gursewak Singh and Agraj Vuppula and Mohammed Ashraful Islam Bhuiyan and Mouryan Puri},
title = {DermSynth3D: Interactive Generation of Dermatology Images for Class Imbalance Mitigation},
year = {2025},
publisher = {GitHub},
url = {https://github.com/sfu-cmpt340/2025_1_project_20},
}
@misc{originalDermSynth3D,
author = {SFU Medical Image Analysis Lab (MIAL)},
title = {DermSynth3D: Synthetic Dermatology Image Generation},
year = {2023},
publisher = {GitHub},
url = {https://github.com/sfu-mial/DermSynth3D},
}
(For a complete list of references, please refer to the project report.)
- Derm7pt Dataset
  Kawahara, J., & Hamarneh, G. (2018). Derm7pt: A dermatology dataset for 7-point checklist disease detection. https://derm.cs.sfu.ca/
- Synthetic Image Generation
  Ghorbani, A. et al. (2020). DermGAN: Synthetic Generation of Clinical Skin Images with Pathology. NeurIPS ML4H Workshop.
- Fréchet Inception Distance (FID)
  Heusel, M. et al. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. NeurIPS.
- Feature Extraction: ResNet-18
  He, K. et al. (2016). Deep Residual Learning for Image Recognition. CVPR.