> Device placement is an experimental feature and the API may change. Only the `balanced` strategy is supported at the moment. We plan to support additional mapping strategies in the future.
The `device_map` parameter controls how the model components in a pipeline or the layers in an individual model are distributed across devices.

<hfoptions id="device-map">
<hfoption id="pipeline level">

The `balanced` device placement strategy evenly splits the pipeline across all available devices.
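
As a minimal sketch, loading a pipeline with `device_map="balanced"` spreads its components (UNet, VAE, text encoders, and so on) across the available GPUs; the Stable Diffusion XL checkpoint here is only an example:

```py
import torch
from diffusers import DiffusionPipeline

# Example checkpoint; any pipeline checkpoint works the same way
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    device_map="balanced",
)

# Inspect where each component ended up,
# e.g. {'unet': 1, 'vae': 1, 'text_encoder': 0, ...}
print(pipeline.hf_device_map)
```
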
</hfoption>
<hfoption id="model level">

The `device_map` is useful for loading large models, such as the Flux diffusion transformer which has 12.5B parameters. Set it to `"auto"` to automatically distribute a model across the fastest device first before moving to slower devices. Refer to the [Model sharding](../training/distributed_inference#model-sharding) docs for more details.
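
As a sketch, the Flux transformer can be loaded on its own with `device_map="auto"`; the checkpoint name and `bfloat16` dtype are assumptions based on the Flux release:

```py
import torch
from diffusers import FluxTransformer2DModel

# "auto" fills the fastest device first, then spills over to slower ones
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```
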
You can inspect a model's device map with `hf_device_map`.

```py
print(transformer.hf_device_map)
```

</hfoption>
</hfoptions>
When designing your own `device_map`, it should be a dictionary mapping a model's specific module names or layers to a device identifier (an integer for a GPU, `cpu` for the CPU, and `disk` for disk offload).
Call `hf_device_map` on a model to see how its layers are distributed, and then design your own map.
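
For instance, a hand-written map might look like the sketch below; the module names are hypothetical and should be replaced with the ones `hf_device_map` reports for your model:

```py
# Hypothetical module names copied from a model's hf_device_map output
device_map = {
    "pos_embed": 0,           # GPU 0
    "transformer_blocks": 0,  # GPU 0
    "norm_out": "cpu",        # CPU
    "proj_out": "disk",       # offloaded to disk
}
```

The dictionary is then passed through the same `device_map` argument of `from_pretrained`.
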
To enforce a memory limit, pass a dictionary mapping each device to the maximum memory it may use. Devices not included in `max_memory` are ignored, and pipeline components won't be distributed to them.
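
A sketch of capping memory per device; the checkpoint and the 1GB limits are placeholders:

```py
import torch
from diffusers import DiffusionPipeline

# Placeholder limits; any device absent from max_memory receives no components
max_memory = {0: "1GB", 1: "1GB"}
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    device_map="balanced",
    max_memory=max_memory,
)
```
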