Description
Many of the proofs assume that the label y is a one-hot vector. However, in settings such as Virtual Adversarial Training, we want to find a perturbation that maximizes the KL-divergence between the class distributions of the input and the perturbed input. In that case, y can be viewed as a non-one-hot (soft) label. Can we still derive an upper bound on the loss in such cases?
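For concreteness, this is roughly the perturbation objective I have in mind (a minimal PyTorch sketch of the standard VAT power-iteration step, not code from this repo; `model`, `x`, `xi`, and `eps` are placeholders):

```python
import torch
import torch.nn.functional as F

def _l2_normalize(d):
    # Normalize each sample's perturbation to unit L2 norm.
    norm = d.flatten(1).norm(dim=1).reshape([-1] + [1] * (d.dim() - 1))
    return d / (norm + 1e-12)

def vat_perturbation(model, x, xi=1e-6, eps=1.0):
    """One power-iteration step of the VAT objective: find r with
    ||r||_2 <= eps that (approximately) maximizes KL(p(y|x) || p(y|x + r)),
    where p(y|x) plays the role of a soft, non-one-hot y."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)                   # soft "label" for the clean input
    d = _l2_normalize(torch.randn_like(x)).requires_grad_(True)
    log_q = F.log_softmax(model(x + xi * d), dim=1)      # prediction on the slightly perturbed input
    kl = F.kl_div(log_q, p, reduction="batchmean")       # KL(p || q)
    grad, = torch.autograd.grad(kl, d)
    return eps * _l2_normalize(grad).detach()
```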
I personally think we could indeed encode the KL-divergence as in Eq. (4) by choosing an appropriate c. However, I find it hard to prove an equivalent of Theorem 2 under the assumption that y may be non-one-hot. I am wondering whether anyone has already proved or disproved this?
Thanks!