Handling random keys with higher order derivatives and a stochastic custom_vjp #18085
-
I have a stochastic function, and I can get this working nicely with a first-order derivative using a custom VJP implementation. I have something like this, which works for first-order derivatives:

```python
@jax.custom_vjp
def f(fwd_key, ...):
    return foo

def grad_f(grad_key, ...):
    return bar

def f_fwd(key, ...):
    fwd_key, grad_key = random.split(key)
    return f(fwd_key, ...), grad_f(grad_key, ...)

def f_bwd(res, g):
    return baz

f.defvjp(f_fwd, f_bwd)

key = random.PRNGKey(0)
jax.grad(f)(key, ...)  # works like a dream
```

**The problem**

Using this,

```python
key = random.PRNGKey(0)
jax.grad(jax.grad(f))(key, ...)
```

will reuse the same key in each backward pass.

**A working example**

```python
import jax
from jax import random
@jax.custom_vjp
def f(fwd_key, x):
    """A sample from a distribution of quadratic functions."""
    noise = random.uniform(fwd_key, (1,))[0]
    noisy_x = x * noise
    return noisy_x + noisy_x ** 2

def grad_f(grad_key, x):
    """A sample of a noisy gradient from the noisy quadratic functions.

    Note, in particular, that the derivative of the function is desired to be
    from a different sample, not the same as the one used in the forward
    pass."""
    noise = random.uniform(grad_key, (1,))[0]
    noisy_x = x * noise
    return 1.0 + 2 * noisy_x

def f_fwd(key, x):
    fwd_key, grad_key = random.split(key)
    return f(fwd_key, x), grad_f(grad_key, x)

def f_bwd(res, g):
    return None, res * g

f.defvjp(f_fwd, f_bwd)

key = random.PRNGKey(0)
jax.value_and_grad(f, argnums=1)(key, 1.0)
# 0.86768 and 1.2107: Good! These are uncorrelated :)

jax.value_and_grad(jax.grad(f, argnums=1), argnums=1)(key, 1.0)
# 1.2107 and 0.2107: Bad, these are correlated :(
```

Note that the forward-pass value and the first-order gradient are uncorrelated (penultimate line of the code above): this is what I want to achieve for gradients of all orders. But the second-order and first-order gradients are directly correlated (last line of the code above): I do not want this; I want another uncorrelated sample for the second-order gradient. This happens because the same key is reused in the calculation of the first-order and second-order gradients. So I'd love help finding a way for nested grad calls to use different random keys.

**An undesirable workaround**

Maintain random keys globally and access these from within the functions (see the sketch after the question below).

**Question**

Is there any way to deal with custom_vjps needing random keys for higher-order derivatives, in the manner above, without resorting to global random keys?
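For concreteness, the global-key workaround might look something like the minimal sketch below (the names `_GLOBAL_KEY` and `next_key` are illustrative, not an established pattern). Its impurity is part of why it's undesirable: under `jit` the global would be mutated with tracers.

```python
from jax import random

# A module-level key that is split on every access, so each nested
# backward pass can draw a fresh subkey.
_GLOBAL_KEY = random.PRNGKey(0)

def next_key():
    """Return a fresh subkey, advancing the global key as a side effect."""
    global _GLOBAL_KEY
    _GLOBAL_KEY, subkey = random.split(_GLOBAL_KEY)
    return subkey
```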
-
Could you edit your example code to something that is executable, rather than incomplete pseudocode, and then show the output of the second order function and how it's different than what you expect it to be? I'm having trouble filling in the missing pieces between your description and your pseudocode.
-
I think what you'll need to do to accomplish this is put another `custom_vjp` inside your existing `custom_vjp`, giving the desired behavior (pass in a key and use this to generate uncorrelated second-order gradients). For arbitrary-order gradients, you'd want to set up some kind of recursive procedure (a sketch of this follows below).

If it is easier for you to work through the mathematics, note that you can use a `custom_jvp` instead of a `custom_vjp` here. JAX will automatically synthesise the VJP from the JVP (deterministically and via transposition); this is also sketched below.

This aside, I'm quite curious what you're up to, that needs uncorrelated gradients? Essentially every modern use-case I know of (GANs, differentiating through SDE solves, ...) prefers correlated gradients. Off the top of my head the only case where I've seen uncorrelated gradients is Malliavin calculus, which AFAIK is largely superseded by the correlated-gradient approach wherever that's possible.
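For concreteness, here is a minimal sketch of the nested-`custom_vjp` suggestion, applied to the noisy quadratic from the question. The helpers `_sample_grad` and `_sample_grad2` (and the particular second-derivative model) are illustrative assumptions, not from the original post:

```python
import jax
from jax import random

def _sample_grad(key, x):
    """One sample of the noisy first derivative."""
    noise = random.uniform(key, (1,))[0]
    return 1.0 + 2 * x * noise

def _sample_grad2(key, x):
    """An independent sample of the noisy second derivative."""
    noise = random.uniform(key, (1,))[0]
    return 2.0 * noise

# Inner custom_vjp: grad_f now carries its own VJP rule, so differentiating
# through it (i.e. the second-order gradient of f) draws a fresh sample.
@jax.custom_vjp
def grad_f(key, x):
    key1, _ = random.split(key)  # split as in grad_f_fwd so primal and fwd agree
    return _sample_grad(key1, x)

def grad_f_fwd(key, x):
    key1, key2 = random.split(key)
    return _sample_grad(key1, x), _sample_grad2(key2, x)

def grad_f_bwd(res, g):
    return None, res * g

grad_f.defvjp(grad_f_fwd, grad_f_bwd)

# Outer custom_vjp, unchanged from the question.
@jax.custom_vjp
def f(fwd_key, x):
    noise = random.uniform(fwd_key, (1,))[0]
    noisy_x = x * noise
    return noisy_x + noisy_x ** 2

def f_fwd(key, x):
    fwd_key, grad_key = random.split(key)
    return f(fwd_key, x), grad_f(grad_key, x)

def f_bwd(res, g):
    return None, res * g

f.defvjp(f_fwd, f_bwd)

key = random.PRNGKey(0)
# The second-order sample now comes from an independent split of the key,
# so it is no longer correlated with the first-order sample.
jax.grad(jax.grad(f, argnums=1), argnums=1)(key, 1.0)
```

For third and higher orders, `_sample_grad2` would itself become a `custom_vjp` that splits its key again; hence the recursive procedure mentioned above.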
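And a minimal sketch of the `custom_jvp` alternative for the same noisy quadratic (again with illustrative helper logic; JAX transposes this JVP to obtain the VJP used by `jax.grad`). As above, uncorrelated gradients beyond first order would need further nested rules:

```python
import jax
from jax import random

@jax.custom_jvp
def f(key, x):
    key1, _ = random.split(key)  # split as in the JVP rule so the two agree
    noise = random.uniform(key1, (1,))[0]
    noisy_x = x * noise
    return noisy_x + noisy_x ** 2

@f.defjvp
def f_jvp(primals, tangents):
    key, x = primals
    _, x_dot = tangents  # the key carries no meaningful tangent
    key1, key2 = random.split(key)
    noise = random.uniform(key1, (1,))[0]
    noisy_x = x * noise
    primal_out = noisy_x + noisy_x ** 2
    # Directional derivative built from an independent noise sample; it is
    # linear in x_dot, so JAX can transpose it into a VJP for reverse mode.
    grad_noise = random.uniform(key2, (1,))[0]
    tangent_out = (1.0 + 2 * x * grad_noise) * x_dot
    return primal_out, tangent_out

key = random.PRNGKey(0)
jax.grad(f, argnums=1)(key, 1.0)  # gradient sample drawn from key2
```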