
Different Techniques To Reduce Inference Time of Deep Learning Models

Techniques Implemented

Transfer Learning

  • Purpose: Reusing models pre-trained on large datasets (e.g., ImageNet) to accelerate training and improve performance on smaller target datasets (see the sketch after this list).
  • Models: VGG, Inception, ResNet, MobileNet, DenseNet
  • Benefits:
    • Faster training
    • Better performance with limited data
    • Reduced computational overhead
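
The snippet below is a minimal transfer-learning sketch in PyTorch, assuming torchvision is available; the choice of ResNet-18 and the NUM_CLASSES placeholder are illustrative rather than taken from this repository.

```python
# Minimal transfer-learning sketch: reuse an ImageNet backbone, train a new head.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # hypothetical number of classes in the target dataset

# Load a ResNet-18 pre-trained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the pre-trained backbone so only the new head is updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer for the target dataset.
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Only the new head's parameters are passed to the optimizer.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```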

Quantization

  • Purpose: Reducing model size and improving inference speed by converting high-precision floating-point weights to lower-precision formats (e.g., 16-bit, 8-bit).
  • Techniques:
    • Post-training quantization (PTQ)
    • Quantization-aware training (QAT)
  • Challenges: Potential loss of accuracy due to the reduced dynamic range; a post-training quantization sketch follows.
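
As an illustration of post-training quantization, this sketch applies PyTorch's dynamic quantization to the Linear layers of a stand-in model; the layer sizes are placeholders, not this repository's actual models.

```python
# Post-training dynamic quantization sketch: weights stored in int8,
# activations quantized on the fly at inference time.
import torch
import torch.nn as nn

# Stand-in for a trained float32 model.
model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
model.eval()

quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The quantized model runs the same forward pass with smaller weights.
x = torch.randn(1, 512)
with torch.no_grad():
    out = quantized_model(x)
```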

Knowledge Distillation: Enhancing Model Efficiency

Knowledge distillation is a machine learning technique in which a larger, more complex model (the teacher) transfers its knowledge to a smaller, simpler model (the student). The process typically involves three steps:

  • Training the Teacher Model: A large model is trained on the dataset to achieve high accuracy.
  • Generating Soft Targets: The teacher produces probability distributions over classes instead of hard labels, reflecting its confidence in each class.
  • Training the Student Model: The student is trained with a combined loss function: the traditional cross-entropy loss on hard labels plus a distillation loss that encourages it to replicate the teacher's soft targets.

This approach allows the student model to retain much of the teacher's accuracy while being smaller, faster, and easier to deploy.
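
A sketch of such a combined loss, assuming the standard temperature-scaled formulation; the temperature T and weighting alpha are illustrative hyperparameters.

```python
# Distillation loss sketch: soft targets from the teacher (softened with a
# temperature T) combined with the usual cross-entropy on hard labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft-target term: KL divergence between softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: standard cross-entropy on the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```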

Pruning

  • Purpose: Removing unnecessary parameters from the model to improve efficiency and reduce computational complexity.
  • Methods: Structured pruning that removes entire feature maps (output channels) from convolutional layers, as sketched after this list.
  • Benefits:
    • Faster inference
    • Smaller model size
    • Reduced memory footprint
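
The sketch below shows one way feature-map (channel) pruning can be expressed with PyTorch's built-in pruning utilities; the layer shape and the 30% pruning amount are assumptions for illustration.

```python
# Structured pruning sketch: zero out whole output channels (feature maps)
# of a convolutional layer based on their L2 norm.
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)

# Prune the 30% of output channels (dim=0) with the smallest L2 norm.
prune.ln_structured(conv, name="weight", amount=0.3, n=2, dim=0)

# Make the pruning permanent by removing the re-parameterization.
prune.remove(conv, "weight")
```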

Implementation Details

  • Framework: PyTorch
  • Hardware: Intel Xeon 8480 CPU, NVIDIA A6000 GPU
  • Optimization Tools: PyTorch, ONNX
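
As a sketch of the PyTorch-to-ONNX path mentioned above, the snippet below exports a model for optimized inference; the MobileNetV2 architecture and the input shape are illustrative assumptions, not the repository's actual configuration.

```python
# ONNX export sketch: convert a PyTorch model so it can be served by an
# ONNX runtime. Input shape (1, 3, 224, 224) is a typical ImageNet-style guess.
import torch
from torchvision import models

model = models.mobilenet_v2(weights=None)
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
)
```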

Working Video

You can see the real-time working implementation on an FPGA here.
