Help with `pjit` merge: `jit does not support using the mesh context manager`? #15374

kavorite · 2023-04-03T21:27:51Z

kavorite
Apr 3, 2023

Hi :) I'm really excited to try sharding weights across devices! This will help me make much better use of access to TPUv2-8 VMs. I found this issue that seems really similar to the one that I'm having, but it's been closed. I've been struggling to port this code to work with stable diffusion. I'll try to illustrate what happens when I try to use the pjit merge to factor out the pmap decorating my train_step. If I strip out some initialization and optimization code, what I end up with is:

# shard.py
import jax
import numpy as np


params = np.zeros([1, jax.local_device_count()])
mesh = jax.sharding.Mesh(
    np.array(jax.devices()).reshape(1, -1),
    axis_names=("batch", "model"),
)
param_spec = jax.sharding.NamedSharding(mesh, jax.sharding.PartitionSpec(None, "model"))
jax.device_put(params, param_spec)  # works !!!
jax.jit(lambda: params, in_shardings=None, out_shardings=param_spec)()  # crashes ..?

The code fails with a RuntimeError: jit does not support using the mesh context manager and passing PartitionSpecs to in_shardings or out_shardings. Please pass in the Sharding explicitly via in_shardings or out_shardings. This is puzzling, considering that I created a Mesh object, but didn't use it as a context manager, and did exactly as instructed by the error, explicitly specifying NamedShardings for each leaf of the parameter tree. It seems to work when I use it explicitly for device placement. What's actually going on here?

Edit: made the 'minimal example' more minimal 😅
Update: Tried with a PositionalSharding instead, that crashed with an identical trace

kavorite · 2023-04-04T04:00:48Z

kavorite
Apr 4, 2023
Author

I'm going to use jax.lax.with_sharding_constraint on my parameters under the jit after placing them on the devices explicitly, a la the tutorial here. Maybe that'll work

0 replies

skye · 2023-04-05T18:34:14Z

skye
Apr 5, 2023
Maintainer

You work around the original jit does not support using the mesh context manager error by removing the in_shardings=None argument altogether:

jax.jit(lambda: params, out_shardings=param_spec)()  # no crash

We're working on improving the error message and possibly changing the API to accept None here. In the long run, we suggest not using in_shardings though, since the input arguments to the jit'd function should be sufficient (or in this case there are no input arguments at all so still no need for in_shardings). We may remove in_shardings altogether in the future; it's mostly there to help transition towards the new Array + jit-based workflows from prior pjit workflows.

cc @yashk2810

1 reply

kavorite Apr 6, 2023
Author

For some reason leaving in_shardings unspecified did not work in my actual application case, I'm having trouble isolating the problem. Explicit device placement is fine for initialization purposes because the model easily fits in host memory, so I can just bounce it there and then shard it as necessary-- I'm also using it as a diagnostic to make sure my partition specs are actually valid. Jit/no jit is pretty inconsequential here, since I'm just loading a checkpoint: This script is intended as a mock-up to try to feel out the new API surface and see if I can get something with a signature like train_state -> train_state to compile properly before attempting it in my actual training harness, since I was having difficulties during integration of pjit functionality. I was unsuccessful in getting it to run. Full code:

"Adapted from https://github.com/huggingface/transformers/blob/main/examples/research_projects/jax-projects/model_parallel/partitions.py"
import jax
import jax.tree_util as jtu
import numpy as np
import optax
from typing import Callable
from functools import partial


def major_axis(w, min_width=1):
    rank = sum(1 for d in w.shape if d > min_width)
    if rank > 1:
        axis = max(range(w.ndim), key=w.shape.__getitem__)
        return axis


def spec(
    mesh: jax.sharding.Mesh,
    params: optax.Params,
    axis_fn: Callable[[jax.Array], int] = partial(major_axis, min_width=768),
):
    def pspec(w):
        if isinstance(w, jax.Array):
            axis = axis_fn(w)
            if axis is not None:
                spec = [
                    mesh.axis_names[-1] if i == axis else None for i in range(w.ndim)
                ]
                spec = jax.sharding.PartitionSpec(*spec)
            else:
                # spec = [None] * w.ndim
                spec = None
            # spec = jax.sharding.PartitionSpec(*spec)
            return jax.sharding.NamedSharding(mesh, spec)

    return jtu.tree_map(pspec, params)


if __name__ == "__main__":
    jax.devices()  # TPUv2 VMs crash if I don't call this before doing computations
    from typing import NamedTuple
    import numpy as np
    import functools as ft
    from diffusers import FlaxStableDiffusionPipeline

    module, params = FlaxStableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", safety_checker=None, from_pt=True
    )  # loads the scheduler, the textual encoder, UNet, and VAE
    # params, param_structure = jtu.tree_flatten(jax.device_get(params))
    shards = np.array(jax.devices()).reshape(2, -1)
    mesh = jax.sharding.Mesh(shards, axis_names=("batch", "model"))
    param_spec = spec(mesh, params, axis_fn=ft.partial(major_axis, min_width=768))

    def optimizer():
        return optax.trace(0.0)  # stateless placeholder

    class TrainState(NamedTuple):
        params: optax.Params
        opt_st: optax.OptState

    def train_init(params):
        opt_st = optimizer().init(params)
        return TrainState(params, opt_st)

    param_shape = jtu.tree_map(np.shape, params)
    state_shape = jax.eval_shape(train_init, param_shape)
    state_spec = jtu.tree_map(
        lambda x: param_spec if isinstance(x, dict) else None,
        state_shape,
        is_leaf=lambda x: isinstance(x, (dict, optax.EmptyState)),
    )
    tstate = jax.device_get(train_init(params))
    tstate = jax.device_put(tstate, state_spec)
    # TODO: figure out why we can't initialize parameters under pjit like this
    p_train_init = jax.jit(
        train_init,
        out_shardings=state_spec,
    )
    p_train_init(params)
    pass

Quasar-Kim · 2023-09-14T06:12:53Z

Quasar-Kim
Sep 14, 2023

Any update on this? The documentation says jax.jit is equivalent to pjit but it's misleading because jit lacks support for mesh context manager. Being able to separate mesh from the jitted function is (IMHO) a big deal - passing the mesh object all the way down to the code where jax.lax.with_sharding_constraint is used is cumbursome.

1 reply

yashk2810 Sep 14, 2023
Collaborator

jax.jit is not going to support the mesh context manager.

passing the mesh object all the way down to the code where jax.lax.with_sharding_constraint is used is cumbursome.

Yes, we are working on fixing it. But as another solution, you can create your own context manager to thread mesh around :)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Help with `pjit` merge: `jit does not support using the mesh context manager`? #15374

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 3 comments 2 replies

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Help with pjit merge: jit does not support using the mesh context manager? #15374

Uh oh!

Uh oh!

kavorite Apr 3, 2023

Replies: 3 comments · 2 replies

Uh oh!

kavorite Apr 4, 2023 Author

Uh oh!

skye Apr 5, 2023 Maintainer

Uh oh!

Uh oh!

kavorite Apr 6, 2023 Author

Uh oh!

Uh oh!

Quasar-Kim Sep 14, 2023

Uh oh!

yashk2810 Sep 14, 2023 Collaborator

Help with `pjit` merge: `jit does not support using the mesh context manager`? #15374

kavorite
Apr 3, 2023

Replies: 3 comments 2 replies

kavorite
Apr 4, 2023
Author

skye
Apr 5, 2023
Maintainer

kavorite Apr 6, 2023
Author

Quasar-Kim
Sep 14, 2023

yashk2810 Sep 14, 2023
Collaborator