Conflict With "On the Efficacy of Knowledge Distillation" Results #150
                  
                    
AhmedHussKhalifa started this conversation in General
Replies: 1 comment · 6 replies
-
Hi @AhmedHussKhalifa, thank you for your interest in and questions about the torchdistill work! From the description, I think the choice of temperature and alpha matters in their setting and produced the different trend. Another possible factor is the number of GPUs (i.e., the effective batch size and, with distributed training, the linear scaling rule for the learning rate).
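As a rough illustration of the second point: under the linear scaling rule, the base learning rate is scaled in proportion to the effective (global) batch size. This is only a sketch of the rule; the batch sizes, GPU count, and learning rate below are hypothetical placeholders, not the actual training configurations.

```python
def scaled_lr(base_lr, base_batch_size, per_gpu_batch_size, num_gpus):
    """Linear scaling rule: scale the learning rate in proportion to the
    effective (global) batch size relative to a reference configuration."""
    effective_batch_size = per_gpu_batch_size * num_gpus
    return base_lr * effective_batch_size / base_batch_size

# Hypothetical numbers: a recipe tuned for batch size 256 at lr 0.1,
# run on 8 GPUs with 64 images per GPU (effective batch size 512).
print(scaled_lr(base_lr=0.1, base_batch_size=256, per_gpu_batch_size=64, num_gpus=8))  # 0.2
```

If the learning rate is not rescaled when the effective batch size changes, the two runs are effectively trained under different optimization settings even when every other hyperparameter matches.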
  
-
Hey,
I want to thank you for this great work.
I went through your ImageNet models trained with KD. The ResNet-18 trained with a ResNet-34 teacher reaches 71.34% accuracy, which is really amazing. The "On the Efficacy of Knowledge Distillation" paper reports 69.21% for the same experiment, but with different hyperparameters, as mentioned below. To the best of my knowledge, you used a different alpha (0.5) and temperature (1).
Do you think this is the only reason for this large difference?
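For context, here is a minimal sketch of the standard Hinton-style KD objective, showing where alpha and the temperature T enter. The weighting convention (whether alpha scales the hard-label or the soft-label term) and the reduction may differ from what torchdistill or the paper actually implements, so treat it as illustrative only.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, alpha=0.5, temperature=1.0):
    """Hinton-style KD: weighted sum of hard-label cross-entropy and a
    temperature-softened KL divergence against the teacher."""
    # Hard-label term: standard cross-entropy with the ground-truth labels.
    ce = F.cross_entropy(student_logits, targets)
    # Soft-label term: KL between temperature-scaled distributions,
    # multiplied by T^2 to keep gradient magnitudes comparable across T.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * ce + (1.0 - alpha) * kl
```

With alpha = 0.5 and T = 1, the two terms are weighted equally and the teacher's distribution is not softened at all, so a different alpha/temperature pair changes how strongly the student is pushed toward the teacher's soft targets and could plausibly account for part of the accuracy gap.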