Description
Many of the proofs assume that the label y is a one-hot vector. However, in settings such as Virtual Adversarial Training, we want to find a perturbation that maximizes the KL-divergence between the class distributions of the input and the perturbed input. In that case, y can be viewed as a non-one-hot (soft) label. Can we still derive an upper bound on the loss in such cases?
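For concreteness, this is roughly the perturbation objective I have in mind (a minimal PyTorch sketch of the standard VAT power-iteration step, not code from this repo; `model`, `x`, `xi`, and `eps` are placeholders):

```python
import torch
import torch.nn.functional as F

def _l2_normalize(d):
    # Normalize each sample's perturbation to unit L2 norm.
    norm = d.flatten(1).norm(dim=1).reshape([-1] + [1] * (d.dim() - 1))
    return d / (norm + 1e-12)

def vat_perturbation(model, x, xi=1e-6, eps=1.0):
    """One power-iteration step of the VAT objective: find r with
    ||r||_2 <= eps that (approximately) maximizes KL(p(y|x) || p(y|x + r)),
    where p(y|x) plays the role of a soft, non-one-hot y."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)                   # soft "label" for the clean input
    d = _l2_normalize(torch.randn_like(x)).requires_grad_(True)
    log_q = F.log_softmax(model(x + xi * d), dim=1)      # prediction on the slightly perturbed input
    kl = F.kl_div(log_q, p, reduction="batchmean")       # KL(p || q)
    grad, = torch.autograd.grad(kl, d)
    return eps * _l2_normalize(grad).detach()
```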
I personally think we could indeed encode the KL-divergence as in Eq. (4) by choosing an appropriate c. However, I find it hard to prove an equivalent of Theorem 2 under the assumption that y may be non-one-hot. I am wondering whether anyone has already proved or disproved this?
Thanks!