[Models] handle initialization of new layers in a partially pre-trained model better #11279

@sayakpaul

Description

If we do:

from diffusers import AutoModel
import torch

model = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", num_single_layers=40, torch_dtype=torch.bfloat16
).to("cuda")

it results in:

Traceback (most recent call last):
  File "/fsx/sayak/diffusers/check_sharded_model.py", line 6, in <module>
    ).to("cuda")
  File "/fsx/sayak/diffusers/src/diffusers/models/modeling_utils.py", line 1353, in to
    return super().to(*args, **kwargs)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1343, in to
    return self._apply(convert)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 930, in _apply
    param_applied = fn(param)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1336, in convert
    raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

cc @SunMarc, since we discussed this in person.
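For context, the underlying failure can be reproduced with plain PyTorch, independent of diffusers: parameters left on the "meta" device carry no storage, so `.to()` cannot copy them, and `to_empty()` is the escape hatch the error message points at. A minimal sketch (the `reset_parameters()` step is illustrative of what a proper fix would need to do for the newly added layers, not the diffusers code path):

```python
import torch
import torch.nn as nn

# A layer created on the "meta" device has shape/dtype but no data.
layer = nn.Linear(4, 4, device="meta")

try:
    layer.to("cpu")  # fails: there is no storage to copy out of
except NotImplementedError as e:
    print(type(e).__name__)  # NotImplementedError

# to_empty() allocates *uninitialized* storage on the target device instead.
layer = layer.to_empty(device="cpu")

# The fresh weights are garbage until explicitly initialized -- this is the
# step that must happen for layers absent from the checkpoint.
layer.reset_parameters()
print(layer.weight.device)  # cpu
```

So when `from_pretrained` is given a config with more layers than the checkpoint provides, the extra layers stay on meta and the later `.to("cuda")` hits exactly this path.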
