Classifier with KL divergence (for Knowledge Distillation)
This code is used to train a classifier via knowledge distillation. If you have the output scores (logits) of one or more larger classifiers, you can use those scores as soft teacher targets and train a smaller student classifier that achieves similar performance.
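A minimal sketch of such a distillation loss, assuming PyTorch; the function name, temperature `T`, and weighting `alpha` are illustrative choices, not necessarily the values used in this repo:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soften both distributions with temperature T and compare them with KL divergence.
    # F.kl_div expects log-probabilities as input and probabilities as target.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients to compensate for the temperature softening
    # Standard supervised loss on the true labels keeps the student grounded.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Example: distill precomputed teacher scores into a student on one batch
# (hypothetical shapes: batch of 8 examples, 10 classes)
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)  # stands in for the saved teacher scores
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

The `T * T` factor follows the common convention of rescaling the soft-target loss so its gradient magnitude stays comparable to the hard-label loss as the temperature changes.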