Taking Derivatives of Gradients w.r.t. intermediate variables? #18383
karan-dalal asked this question in Q&A · Unanswered
Replies: 1 comment 5 replies
- Reply:
You can only take derivatives with respect to function arguments. One way to make this work is the approach in #5336 (comment); i.e., track a perturbation to your original variable. For example:

    from jax import grad

    # MSE, A, x, and W are assumed to be defined elsewhere.
    def f(x, W, dx):
        x_hat = (x @ W) @ A + dx   # dx perturbs the intermediate x_hat
        loss = MSE(x, x_hat)
        return loss

    # The gradient w.r.t. the perturbation, evaluated at dx = 0, is the sensitivity
    # of the loss to x_hat (pass dx with the same shape as x_hat to get an
    # element-wise gradient).
    loss_fn = grad(f, argnums=2)
    G = loss_fn(x, W, 0.0)

The answer in your example is trivial, though, because …
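One way to extend that approach to the dG/dx_hat asked about in the question below is to differentiate the W-gradient itself with respect to the perturbation. The following is only a hedged, self-contained sketch: the toy shapes, the mse helper, and the choice of (x @ W) @ A as the perturbed intermediate are assumptions, not code from the thread.

    import jax
    import jax.numpy as jnp

    # Toy stand-ins for the real data (shapes are assumptions).
    x = jax.random.normal(jax.random.PRNGKey(0), (4, 3))
    W = jax.random.normal(jax.random.PRNGKey(1), (3, 3))
    A = jax.random.normal(jax.random.PRNGKey(2), (3, 3))

    def mse(a, b):
        return jnp.mean((a - b) ** 2)

    def loss(x, W, dx):
        # dx perturbs the intermediate; at dx = 0 this is the unmodified loss.
        x_hat = (x @ W) @ A + dx
        return mse(x, x_hat)

    # G(dx): gradient of the loss w.r.t. W, viewed as a function of the perturbation.
    grad_W = jax.grad(loss, argnums=1)

    # dG/dx_hat: Jacobian of G w.r.t. dx, evaluated at zero perturbation.
    dx0 = jnp.zeros_like((x @ W) @ A)
    dG_dxhat = jax.jacobian(grad_W, argnums=2)(x, W, dx0)
    print(dG_dxhat.shape)  # (3, 3, 4, 3): one (4, 3) sensitivity map per entry of G

Each slice dG_dxhat[i, j] is the gradient of G[i, j] with respect to the intermediate. jax.jacobian materializes the full tensor, so for larger problems a VJP of grad_W against a chosen cotangent is the cheaper alternative.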
- Original question (karan-dalal):
I have the following function, and I want to somehow get dG/dx_hat, where G is the gradient of the loss with respect to W and x_hat is an intermediate variable. Assume A is some fixed linear transformation matrix. (Note: this is a simplified version of the actual project.) I've tried reorganizing the computation as a VJP of a VJP (#18276), but that doesn't seem to yield the correct result. Could I write a function that returns G and then perturb x_hat? Any alternative suggestions?
Thank you!
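For reference, a rough sketch of the kind of function being described, using the same assumed shapes and mse helper as the sketch above; none of this is the asker's actual code.

    import jax
    import jax.numpy as jnp

    x = jax.random.normal(jax.random.PRNGKey(0), (4, 3))   # input batch (assumed shape)
    W = jax.random.normal(jax.random.PRNGKey(1), (3, 3))   # learned weights
    A = jax.random.normal(jax.random.PRNGKey(2), (3, 3))   # fixed linear transformation

    def mse(a, b):
        return jnp.mean((a - b) ** 2)

    def loss_fn(x, W):
        x_hat = (x @ W) @ A    # intermediate variable of interest
        return mse(x, x_hat)

    # G = dLoss/dW. The question asks for dG/dx_hat, but x_hat is not an argument
    # of loss_fn, which is why it cannot be differentiated against directly.
    G = jax.grad(loss_fn, argnums=1)(x, W)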