-
Consider this example function, inspired by Huggingface's implementation of Dreambooth:

```python
def train_step(model1, model2, model3, model1_params, model2_params, model3_params):
    def loss_fn(params):
        ...
    ...
```
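To make the issue concrete, here is roughly what goes wrong when jitting this directly (a sketch: `model1`, `model2`, and `model3` are assumed to be Flax `nn.Module` instances as in the Flax Dreambooth script, and the exact error text varies by JAX version):

```python
import jax

jitted_train_step = jax.jit(train_step)
# Raises a TypeError: every traced argument must be an array (or a pytree of
# arrays), but an unbound Flax Module is treated as a single non-array leaf,
# so jit rejects it as "not a valid JAX type".
jitted_train_step(model1, model2, model3, model1_params, model2_params, model3_params)
```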
Here are a couple of things I've considered. In the Huggingface implementation, they solve this issue by making `train_step` a closure over the models, defining everything inside `main`:

```python
def main():
    model1 = ...
    model2 = ...
    model3 = ...
    ...

    def train_step(model1_params, model2_params, model3_params):
        ...
```

But this results in a pretty long function definition which is hard to read and modify. Flax's Quick Start suggests a more structured approach using `flax.training.train_state.TrainState` (sketched below for reference). Are there any other strategies for jit-compiling functions with non-pytree arguments?
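For reference, a minimal sketch of that `TrainState` pattern, assuming a toy `MLP` model and an `optax` optimizer rather than the actual Dreambooth models:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn
import optax
from flax.training import train_state

class MLP(nn.Module):
    @nn.compact
    def __call__(self, x):
        return nn.Dense(features=10)(x)

def create_state(rng):
    model = MLP()
    params = model.init(rng, jnp.ones([1, 64]))["params"]
    # TrainState bundles apply_fn, params, and the optimizer; apply_fn and tx
    # are non-pytree fields, so the whole state can be passed through jit.
    return train_state.TrainState.create(
        apply_fn=model.apply, params=params, tx=optax.adam(1e-3)
    )

@jax.jit
def train_step(state, x, y):
    def loss_fn(params):
        pred = state.apply_fn({"params": params}, x)
        return jnp.mean((pred - y) ** 2)

    grads = jax.grad(loss_fn)(state.params)
    return state.apply_gradients(grads=grads)

state = create_state(jax.random.PRNGKey(0))
state = train_step(state, jnp.ones([1, 64]), jnp.ones([1, 10]))
```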
-
Thanks for the question! Two ideas:

1. `jit`'s `static_argnums` (and `static_argnames` if you prefer).
2. `functools.partial`.
The latter has the same behavior as the lexical closure approach used by HF. The difference with the former is that it requires the model arguments to be hashable (with hashability providing more opportunities for cache hits when models compare equal).

In more detail, the first option might look like

```python
import jax

def train_step(model1, model2, model3, model1_params, model2_params, model3_params):
    def loss_fn(params):
        ...
    ...

# the first three arguments are treated as static: they must be hashable, and
# values that don't compare equal to a previous call trigger a recompile
train_step = jax.jit(train_step, static_argnums=(0, 1, 2))

# a call site might look something like this
train_step(model1, model2, model3, model1_params, model2_params, model3_params)
```

The second option might look like

```python
import jax
from functools import partial

def train_step(model1, model2, model3, model1_params, model2_params, model3_params):
    def loss_fn(params):
        ...
    ...

jit_train_step = jax.jit(partial(train_step, model1, model2, model3))
# or even, without needing to import `partial` (bound to a new name so the
# lambda keeps calling the original `train_step` rather than recursing into
# the jitted wrapper)
jit_train_step = jax.jit(lambda *args: train_step(model1, model2, model3, *args))

# a call site might look something like this
jit_train_step(model1_params, model2_params, model3_params)
```

What do you think?
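To illustrate the cache-hit point, a small sketch (the names `apply_model`, `model_a`, and `model_b` are made up, and it assumes unbound Flax modules hash and compare by value, as the reply above suggests):

```python
from functools import partial

import jax
import jax.numpy as jnp
import flax.linen as nn

@partial(jax.jit, static_argnums=0)
def apply_model(model, params, x):
    print("tracing")  # only runs while jit traces, not on cache hits
    return model.apply({"params": params}, x)

model_a = nn.Dense(features=10)
model_b = nn.Dense(features=10)  # distinct instance, same hyperparameters

params = model_a.init(jax.random.PRNGKey(0), jnp.ones([1, 64]))["params"]
x = jnp.ones([1, 64])

apply_model(model_a, params, x)  # prints "tracing": first compilation
apply_model(model_b, params, x)  # no print: model_b == model_a, so the cached trace is reused
```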
-
Indeed, the `functools.partial` approach works:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn
from functools import partial

class MLP(nn.Module):
    @nn.compact
    def __call__(self, x):
        x = nn.Dense(features=10)(x)
        return x

def train_step(model, model_params, x):
    return model.apply({"params": model_params}, x)

def main():
    rng = jax.random.PRNGKey(0)
    model = MLP()
    model_params = model.init(rng, jnp.ones([1, 64]))['params']
    x = jax.random.normal(rng, (1, 64))
    train_step(model, model_params, x)

    partial_train_step = jax.jit(partial(train_step, model))
    partial_train_step(model_params, x)

if __name__ == "__main__":
    main()
```

Thank you for the help!
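For completeness, the `static_argnums` option from the reply would need only a two-line change to the snippet above, replacing the `partial` lines inside `main` (`static_train_step` is a hypothetical name):

```python
    # Mark argument 0 (the Flax module) as static: jit hashes it instead of
    # tracing it, and modules that compare equal reuse the compiled function.
    static_train_step = jax.jit(train_step, static_argnums=0)
    static_train_step(model, model_params, x)
```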