Documentation of Loss.get_grad confusingly describes independent variable #901

@connorbrinton

How to reproduce the behaviour

Hi there! I was recently working on implementing a custom loss function (binary focal loss) and found some of the documentation to be a bit confusing. The documentation of Loss.get_grad states that it should:

Calculate the gradient of the loss with respect with the model outputs.

However, looking at the implementation of some of Thinc's built-in loss functions, Loss.get_grad actually calculates the gradient of the loss with respect to the logits used as input to the preceding softmax/sigmoid layer.

For example, the CategoricalCrossentropy loss class computes the gradient as guesses - target. In the binary (sigmoid) case, the derivative of the loss wrt the model outputs (probabilities p) is (p - target) / (p * (1 - p)), so guesses - target is off from it by a factor of 1 / (p * (1 - p)). This factor is cancelled out by the derivative of the logistic function, p * (1 - p), when the chain rule is applied, giving the derivative wrt the logits as guesses - target.
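To make the factor concrete, here's a small NumPy check for the binary case. It's only a sketch: `sigmoid`, `bce` and the sample values are made up for illustration and none of it is Thinc code. It compares finite-difference gradients against the two closed forms discussed above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, t):
    # Binary cross-entropy written as a function of the probability p
    return -(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

z = np.array([-2.0, -0.5, 0.3, 1.7])   # illustrative logits
t = np.array([0.0, 1.0, 1.0, 0.0])     # illustrative targets
p = sigmoid(z)
eps = 1e-6

# Central finite differences: once wrt the probabilities, once wrt the logits
grad_p_fd = (bce(p + eps, t) - bce(p - eps, t)) / (2 * eps)
grad_z_fd = (bce(sigmoid(z + eps), t) - bce(sigmoid(z - eps), t)) / (2 * eps)

# Closed forms: the two differ by exactly the factor 1 / (p * (1 - p))
grad_p = (p - t) / (p * (1.0 - p))
grad_z = p - t

print(np.allclose(grad_p_fd, grad_p, atol=1e-5))
print(np.allclose(grad_z_fd, grad_z, atol=1e-5))
```

So `p - t` matches the gradient wrt the logits, not wrt the probabilities.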

This whole setup works because the softmax part of the softmax layer uses the identity function as its backwards pass. This ends up making the forwards and backwards passes of the softmax layer inconsistent, but in theory everything balances out.
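The "balancing out" can also be checked for the softmax case. The sketch below (again plain NumPy with made-up values, not Thinc's actual layer code) pushes the true gradient wrt the probabilities through the true softmax Jacobian and compares the result with the shortcut `p - t`, which is what an identity backward pass would hand upstream. It assumes a one-hot target.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift by the max for stability
    return e / e.sum()

z = np.array([1.0, -0.5, 2.0])   # illustrative logits
t = np.array([0.0, 0.0, 1.0])    # one-hot target
p = softmax(z)

# True gradient of cross-entropy -sum(t * log(p)) wrt the probabilities
grad_p = -t / p

# True softmax Jacobian: jac[i, j] = dp_i / dz_j = p_i * (delta_ij - p_j)
jac = np.diag(p) - np.outer(p, p)

# Consistent route: true dL/dp pushed back through the true Jacobian
grad_z_chain = jac.T @ grad_p

# Shortcut route: "guesses - target" passed through an identity backward pass
grad_z_shortcut = p - t

print(np.allclose(grad_z_chain, grad_z_shortcut))
```

Both routes give the same gradient wrt the logits, but only the first one has a `get_grad`-style step that is really "wrt the model outputs".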

I assume that this setup was selected to help improve numerical stability. The focal loss paper actually mentions this explicitly:

we note that the implementation of the loss layer combines the sigmoid operation for computing p with the loss computation, resulting in greater numerical stability.
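As a rough illustration of the stability point, here's a sketch using plain binary cross-entropy as a simpler stand-in for focal loss. The function names are hypothetical and this is not how Thinc or the paper's code is written. Computing the sigmoid first and then taking the log overflows to `inf` for a confidently wrong logit, while the combined form stays finite.

```python
import numpy as np

def bce_naive(z, t):
    # Sigmoid first, then the log: p rounds to exactly 1.0 for large z
    p = 1.0 / (1.0 + np.exp(-z))
    with np.errstate(divide="ignore"):
        return -(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

def bce_from_logits(z, t):
    # Combined form: -log(sigmoid(z)) = logaddexp(0, -z)
    # and -log(1 - sigmoid(z)) = logaddexp(0, z)
    return t * np.logaddexp(0.0, -z) + (1.0 - t) * np.logaddexp(0.0, z)

z = np.array([40.0])   # very confident positive logit
t = np.array([0.0])    # but the target is negative

naive = bce_naive(z, t)
stable = bce_from_logits(z, t)
print(naive, stable)

# For moderate logits the two forms agree
z_small = np.array([-1.5, 0.2, 3.0])
t_small = np.array([1.0, 0.0, 1.0])
print(np.allclose(bce_naive(z_small, t_small), bce_from_logits(z_small, t_small)))
```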

Anyways, the point of this issue is that the current documentation of Loss.get_grad is confusing, since the gradient is actually computed with respect to the logits, and not the model outputs, even though the model outputs are what is provided to the method. It would be great to have this clarified in the documentation 🙂

Thanks for maintaining Thinc! 😄

Your Environment

  • Operating System: macOS 13.5
  • Python Version Used: 3.9.16
  • Thinc Version Used: 8.1.10
  • Environment Information: Poetry virtual environment, M1 mac
