This repository contains code for training and distilling YOLOv8 models (using a response-based knowledge distillation approach) on a Tuberculosis image dataset in Google Colab.
- Fork/clone this repository to your GitHub account
- Upload your dataset and use `data_utils.py` to prepare it in YOLO format
- Open a notebook in Google Colab
- Follow the steps in the notebook to:
  - Clone your repository
  - Install dependencies
  - Train the model
  - Save results
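A typical setup cell in Colab looks like the sketch below; the repository URL is a placeholder for your own fork, and training is then launched with `train.py` as described further down:

```
# Run inside a Google Colab cell ("!" executes shell commands, "%cd" changes directory)
!git clone https://github.com/<your-username>/kd_yolo.git   # placeholder URL: point this at your fork
%cd kd_yolo
!pip install -r requirements.txt
```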
```
kd_yolo/
├── notebooks/
│   ├── data_exploration.ipynb
│   └── model_performance.ipynb
├── train.py            # Training script with argparse
├── utils/              # Utility functions
│   ├── __init__.py
│   ├── data_utils.py   # Dataset preparation utilities
│   └── distill.py      # Knowledge distillation functionality
├── requirements.txt    # Dependencies
└── README.md           # Documentation
```
The `train.py` script accepts the following arguments:
- `--data-path`: Path to the dataset directory (required)
- `--model`: Model name or path (default: `yolov8n.pt`)
- `--epochs`: Number of training epochs (default: 100)
- `--batch-size`: Batch size (default: 16)
- `--img-size`: Image size (default: 640)
- `--device`: Device to use, either a CUDA device ID or `"cpu"` (default: 0)
- `--project`: Project directory (default: `runs/train`)
- `--name`: Experiment name (default: `exp`)
- `--resume`: Resume training from the last checkpoint
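For example, a baseline training run (no distillation) only needs the arguments above:

```
python train.py --data-path /path/to/dataset --model yolov8n.pt \
    --epochs 100 --batch-size 16 --img-size 640 --device 0
```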
For training with knowledge distillation:
- `--distill`: Enable knowledge distillation (flag)
- `--teacher-model`: Path to the teacher model (required for distillation)
- `--temperature`: Temperature parameter for distillation (default: 4.0)
- `--alpha`: Weight for balancing distillation and task loss (default: 0.5)
The dataset should be in YOLO format:
- `images/` - Contains all images
- `labels/` - Contains the corresponding `.txt` label files
- `data.yaml` - Dataset configuration (will be created automatically if missing)
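A minimal `data.yaml` might look like the example below; the paths, split layout, and class name are illustrative assumptions and should match how your dataset is actually organized:

```yaml
# Illustrative data.yaml -- paths, splits, and class name are assumptions; adjust to your dataset
path: /path/to/dataset     # dataset root
train: images/train        # training images (relative to `path`)
val: images/val            # validation images (relative to `path`)

nc: 1                      # number of classes (assuming a single TB class)
names: ["tuberculosis"]    # hypothetical class name
```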
Knowledge distillation helps a smaller student model learn from a larger teacher model. This implementation:
- Uses response-based distillation focused on the output layer
- Applies temperature scaling to soften probability distributions
- Balances distillation loss with the original detection loss
Our knowledge distillation approach combines:
- Soft target loss: KL divergence between the teacher and student predictions
  - Temperature scaling (T = 4.0) softens the probability distributions
  - A higher temperature reveals more "dark knowledge" from the teacher
- Hard target loss: the original task loss from the labeled data
- Combined loss: `total_loss = α * soft_target_loss + (1 - α) * hard_target_loss`, where `α` controls the balance between mimicking the teacher and learning from the ground truth
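A minimal sketch of this combined loss in PyTorch is shown below; it assumes `student_logits` and `teacher_logits` are class predictions for the same locations and that `hard_loss` is the usual detection loss already computed by the trainer. The repository's `utils/distill.py` is the reference implementation and may differ in detail:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_loss,
                      temperature: float = 4.0, alpha: float = 0.5):
    # Soften both distributions with the temperature before comparing them.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)

    # KL divergence between teacher and student; the T^2 factor keeps the soft-target
    # gradients on the same scale as the hard-target loss (Hinton et al., 2015).
    soft_loss = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

    # alpha balances mimicking the teacher against fitting the ground-truth labels.
    return alpha * soft_loss + (1 - alpha) * hard_loss
```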
For YOLOv8, we apply distillation to the detection outputs, helping the student model learn the nuanced prediction patterns of the larger teacher model.
```
python train.py --data-path /path/to/dataset --model yolov8s.pt \
    --distill --teacher-model yolov8x.pt --temperature 4.0 --alpha 0.5 \
    --epochs 100 --batch-size 16
```
This trains a YOLOv8-Small model (student) with knowledge from a YOLOv8-XLarge model (teacher).