- Purpose: Reusing models pre-trained on large datasets (e.g., ImageNet) to accelerate training and improve performance on smaller target datasets (a minimal fine-tuning sketch follows the list below).
- Models: VGG, Inception, ResNet, MobileNet, DenseNet
- Benefits:
- Faster training
- Better performance with limited data
- Reduced computational overhead
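A hedged illustration of this workflow: the sketch below fine-tunes a torchvision ResNet-50 pre-trained on ImageNet by freezing the backbone and replacing the classification head. The target class count (`num_target_classes = 10`) and the learning rate are placeholder values for illustration, not taken from the original setup.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet-50 pre-trained on ImageNet (weights API assumes torchvision >= 0.13).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Freeze the pre-trained backbone so only the new head is updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with one sized for the target dataset
# (num_target_classes is a placeholder for your dataset's class count).
num_target_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_target_classes)

# Only the parameters of the new head are passed to the optimizer.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```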
- Purpose: Reducing model size and improving inference speed by converting high-precision floating-point weights to lower-precision formats (e.g., 16-bit or 8-bit); a minimal post-training quantization sketch follows the list below.
- Techniques:
- Post-training quantization (PTQ)
- Quantization-aware training (QAT)
- Challenges: Potential loss of accuracy due to reduced dynamic range.
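Below is a minimal sketch of post-training dynamic quantization in PyTorch. The small fully connected model is a stand-in for the real network; `torch.quantization.quantize_dynamic` converts the weights of the selected layer types to 8-bit integers, while activations are quantized on the fly at inference time.

```python
import torch
import torch.nn as nn

# A small example network; in practice this would be your trained model.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
model.eval()

# Post-training dynamic quantization: Linear weights become int8,
# activations are quantized dynamically during inference.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The quantized model is used exactly like the original one.
with torch.no_grad():
    out = quantized_model(torch.randn(1, 128))
```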
Knowledge distillation is a machine learning technique in which a larger, more complex model (the teacher) transfers its knowledge to a smaller, simpler model (the student). The process typically involves:
- Training the Teacher Model: A large model is trained on a dataset to achieve high accuracy.
- Generating Soft Targets: The teacher produces probability distributions over classes instead of hard labels, reflecting its confidence in each class.
- Training the Student Model: The student is trained with a combined loss function that adds a distillation loss, which encourages it to replicate the teacher's soft targets, to the traditional cross-entropy loss on the hard labels (a minimal sketch of this loss follows below).
This approach allows the student model to maintain performance while being more efficient and easier to deploy.
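To make the combined loss concrete, here is a minimal sketch of a distillation loss in PyTorch. The temperature and alpha values are illustrative hyperparameters, not values from the original experiments.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Combined loss: hard-label cross-entropy plus soft-target KL divergence."""
    # Standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # KL divergence between the softened teacher and student distributions.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # alpha balances the hard-label and soft-target terms.
    return alpha * hard_loss + (1.0 - alpha) * soft_loss
```

During student training, the teacher runs in evaluation mode (no gradients) to produce `teacher_logits` for each batch, and this loss replaces the plain cross-entropy term.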
- Purpose: Removing unnecessary parameters from the model to improve efficiency and reduce computational complexity.
- Methods: Structured pruning, e.g., removing entire filters/feature maps from convolutional layers (see the sketch after this list).
- Benefits:
- Faster inference
- Smaller model size
- Reduced memory footprint
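The sketch below illustrates structured pruning of a single convolutional layer with PyTorch's `torch.nn.utils.prune`; the layer sizes and the 30% pruning ratio are placeholder choices for illustration.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)

# Structured pruning: zero out 30% of output channels (entire feature maps),
# ranked by their L2 norm along dim=0 (the output-channel dimension).
prune.ln_structured(conv, name="weight", amount=0.3, n=2, dim=0)

# Make the pruning permanent by removing the reparameterization mask.
prune.remove(conv, "weight")
```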
- Framework: [PyTorch]
- Hardware: [Intel Xeon 8480, NVIDIA A6000]
- Optimization Tools: [PyTorch, ONNX] (an example export sketch follows below)
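As one possible use of these tools, the following sketch exports a torchvision MobileNetV2 to ONNX with `torch.onnx.export`; the model choice, input shape, output file name, and opset version are assumptions for illustration rather than details from the original setup.

```python
import torch
from torchvision import models

# Export a trained PyTorch model to ONNX so it can run through ONNX Runtime
# or downstream hardware toolchains; the input shape here is illustrative.
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    "mobilenet_v2.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=17,
)
```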
You can see the real-time working implementation on an FPGA here.