[Models] handle initialization of new layers in a partially pre-trained model better #11279

@sayakpaul

Description

If we do:

from diffusers import AutoModel
import torch

model = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", num_single_layers=40, torch_dtype=torch.bfloat16
).to("cuda")

it results in:

Traceback (most recent call last):
  File "/fsx/sayak/diffusers/check_sharded_model.py", line 6, in <module>
    ).to("cuda")
  File "/fsx/sayak/diffusers/src/diffusers/models/modeling_utils.py", line 1353, in to
    return super().to(*args, **kwargs)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1343, in to
    return self._apply(convert)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 930, in _apply
    param_applied = fn(param)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1336, in convert
    raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

cc @SunMarc, since we discussed this in person.
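For context, the underlying failure can be reproduced with plain PyTorch, independent of diffusers: parameters left on the "meta" device carry no storage, so `.to()` cannot copy them, and `to_empty()` is the escape hatch the error message points at. A minimal sketch (the `reset_parameters()` step is illustrative of what a proper fix would need to do for the newly added layers, not the diffusers code path):

```python
import torch
import torch.nn as nn

# A layer created on the "meta" device has shape/dtype but no data.
layer = nn.Linear(4, 4, device="meta")

try:
    layer.to("cpu")  # fails: there is no storage to copy out of
except NotImplementedError as e:
    print(type(e).__name__)  # NotImplementedError

# to_empty() allocates *uninitialized* storage on the target device instead.
layer = layer.to_empty(device="cpu")

# The fresh weights are garbage until explicitly initialized -- this is the
# step that must happen for layers absent from the checkpoint.
layer.reset_parameters()
print(layer.weight.device)  # cpu
```

So when `from_pretrained` is given a config with more layers than the checkpoint provides, the extra layers stay on meta and the later `.to("cuda")` hits exactly this path.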
